Statistics are measurements that describe groups of numbers. The most common averaging statistics are the mean, median and mode. The mean, a term often used interchangeably with "average," is the sum of all the numbers divided by how many numbers there are. The median is the "middle" value in a population of numbers, with equal counts of values larger and smaller than it. The mode is the most frequently occurring number in a population. Other common statistics, including the range and standard deviation, describe variation. While no single statistic can adequately describe a population of numbers, different statistics have value in different situations.
It is not hard to find an article today in which a writer makes a case based on a comparison of averages. Averages are easy to calculate, and most people understand what an average is. The problem is that averages can be misleading. Averaging a billionaire’s net worth with that of nine homeless people suggests that the “average” person in this population is worth $100 million. A single inordinately large number in a population creates this misleading message. No wonder the saying goes, “Do figures lie or do liars figure?”
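The billionaire example can be checked in a few lines. The sketch below uses Python's standard `statistics` module; the net worth figures are illustrative assumptions (exactly $1 billion for one person and zero for the other nine), not real data.

```python
from statistics import mean, median, mode

# Assumed, illustrative net worths in dollars:
# one billionaire and nine people with no net worth.
net_worths = [1_000_000_000] + [0] * 9

average = mean(net_worths)      # sum of the values divided by their count
middle = median(net_worths)     # middle value of the sorted data
most_common = mode(net_worths)  # most frequently occurring value

print(f"mean:   ${average:,.0f}")
print(f"median: ${middle:,.0f}")
print(f"mode:   ${most_common:,.0f}")
```

The mean comes out at $100 million, while the median and mode are both zero, which is a far better description of nine of the ten people.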
In the world of fabricating printed circuits, statistics also need to be viewed with a jaundiced eye. Consider an example where two different processing methods are evaluated by looking at one parameter. Dozens of trials are run with each method, and the averages for the two methods are shown in the image below (Figure 1).
The average of one method was 77 (red x) and the average of the second was 83 (black x). These are theoretical numbers, but let’s assume the larger the number, the better. If one were going to make a decision based on averages alone, one would probably choose the 83. Let’s imagine that the current process is the one in red (77) and an engineer is making the case to switch to the process represented by the black x (83). In many cases, changing process methods can be very expensive; there is the cost of internal testing as well as the cost of getting approval from the customer. Process changes can also involve new equipment and training. In most cases, management would want to make sure that changing to a different method can be cost-justified.
In reality, this average only gives a partial view of the situation and in some cases it may not be providing useful information. If we were to take the data that generated the two averages in question and create a histogram, we might end up with a graph looking like the one below in Figure 2, where the red results and black results are plotted together.
The data in Figure 2 generated the averages in Figure 1, but looking at the distributions, the case for switching to the new process does not look nearly as convincing. The variation in the numbers suggests that the calculated “averages” are an incomplete description of the populations.
There is a statistical way to determine whether the two methods actually produce different outcomes or whether we are just seeing a lot of noise. The method is known as Student’s t-test, and it uses a formula that includes the standard deviation, average and number of data points for each group. A first step in a statistical test is to state a “null hypothesis”: we assume the two sets of numbers ARE from the same population, so that any observed difference is not statistically significant and is simply random error. The test then tells us how strongly the data contradict that assumption, usually reported as a confidence level (e.g., 95% confidence that the two groups actually are different). In the above case, it is doubtful that the test would show strong confidence that the two groups are statistically different.
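To make the formula concrete, here is a minimal sketch of one common form of the test, Welch's unequal-variance t-test, written with only the Python standard library. The four samples are made up for illustration (five points each instead of dozens) and were chosen so the averages are 77 and 83; the function name `welch_t` is likewise just a label for this example.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test of b versus a."""
    na, nb = len(a), len(b)
    # Squared standard error of each group's mean.
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(b) - mean(a)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Made-up samples: same averages (77 and 83), very different spread.
red_wide = [57, 67, 77, 87, 97]
black_wide = [63, 73, 83, 93, 103]
red_tight = [75, 76, 77, 78, 79]
black_tight = [81, 82, 83, 84, 85]

t_wide, df_wide = welch_t(red_wide, black_wide)
t_tight, df_tight = welch_t(red_tight, black_tight)
print(f"wide spread:  t = {t_wide:.2f}, df = {df_wide:.1f}")
print(f"tight spread: t = {t_tight:.2f}, df = {df_tight:.1f}")
```

With 8 degrees of freedom, the two-sided critical value for 95% confidence is about 2.306. The widely spread samples give a t of 0.6, well short of that threshold, while the tightly grouped samples give a t of 6.0 for the same six-point difference in averages. A statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) would report the corresponding probability directly.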
The above is a simple example to show that the variability of data is a critical component of any statistical evaluation. The larger the variation, the less precisely outcomes can be predicted.
Consider two hypothetical groups that had the same averages as above, but distributions as in Figure 3 below. If the higher result is desirable, then the black x with an average of 83 looks more desirable. But as shown below the red group has a significantly tighter distribution (less variation). So which would be the best choice if one had to pick one?
In most cases, the tighter distribution would be the better choice. Consider a system with a number of process steps (similar to the sequence of processes required to produce a printed circuit). The output of each step is the input to the next. The receiving process must deal with the variability of its input, so even if the average is closer to ideal, the next process may be required to compensate for many parts from across the input distribution. It does not make sense for a process to be set up strictly based on the average of its inputs; it must be able to handle the entire range. The variability of the inputs into a process will affect the variability of the outputs. In a sense, variability gets “amplified” as it goes through the system, creating issues for subsequent process steps downstream.
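One rough way to picture this is with a simple variance-propagation model. The sketch below assumes (this is a modeling assumption, not something measured on a real line) that each step scales incoming deviations by a factor `gain` and adds its own independent variation `sigma_step`, so the variances add. The function name and all numbers are hypothetical.

```python
from math import sqrt

def propagate_sigma(sigma_in, steps):
    """Return the standard deviation entering the line and after each step.

    Assumed model: each step is described by (gain, sigma_step). It scales
    incoming deviations by `gain` and adds independent variation with
    standard deviation `sigma_step`, so the variances add.
    """
    sigma = sigma_in
    history = [sigma]
    for gain, sigma_step in steps:
        sigma = sqrt((gain * sigma) ** 2 + sigma_step ** 2)
        history.append(sigma)
    return history

# Four hypothetical steps, each passing variation through unchanged
# (gain 1.0) and adding one unit of its own.
steps = [(1.0, 1.0)] * 4

tight = propagate_sigma(1.0, steps)  # tight incoming distribution
wide = propagate_sigma(5.0, steps)   # wide incoming distribution

print("tight input:", [round(s, 2) for s in tight])
print("wide input: ", [round(s, 2) for s in wide])
```

Under these assumptions the spread never shrinks on its own: every step leaves the output at least as variable as its input, and a wide incoming distribution stays wide all the way down the line.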
Most of the costs of process variability are hidden, as each process subtly compensates for the variability it receives while generating more variation of its own. Traditional methods of costing variation account only for the percentage of product that theoretically falls outside the specification limits. Genichi Taguchi helped Japan rebuild its industrial base after World War II and is known for contributing to quality improvement through variability reduction. His contributions fall into three major categories:
- The Loss Function: Taguchi developed a mathematical equation to quantify the loss in a product's value caused by a decline in quality. This helps quantify the revenue lost to variability in a production process, and it can be used to project the benefits of quality improvement. Taguchi is widely credited with being the first to quantify the relationship between quality and cost.
- Orthogonal Arrays: Taguchi used orthogonal arrays to analyze and isolate “noise factors” from significant effects in designed experiments. These methods remain popular in optimizing the performance of industrial and manufacturing processes.
- Robustness: A process is “robust” when it performs as intended despite influences that cannot be controlled. Again, Taguchi methods are used to create processes that are predictable and controlled across a wide set of input variables.
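The loss function in the first item above is usually written in its quadratic "nominal is best" form, L(y) = k(y - m)^2, where m is the target value and k is a cost constant. The sketch below compares that loss with the traditional out-of-specification count. The target, the specification limits, the value of k and both samples are hypothetical numbers chosen for illustration.

```python
def taguchi_loss(y, target, k):
    """Quadratic loss for one part: grows with the squared distance from target."""
    return k * (y - target) ** 2

def average_loss(values, target, k):
    """Average quadratic loss per part over a sample."""
    return sum(taguchi_loss(y, target, k) for y in values) / len(values)

def fraction_out_of_spec(values, lower, upper):
    """Traditional view: only parts outside the limits count as a cost."""
    return sum(1 for y in values if y < lower or y > upper) / len(values)

# Hypothetical target, specification limits and cost constant.
target, lower, upper, k = 80.0, 65.0, 95.0, 0.5

on_target_wide = [70, 75, 80, 85, 90]    # average 80, wide spread
off_target_tight = [80, 81, 82, 83, 84]  # average 82, tight spread

for name, sample in [("on target, wide", on_target_wide),
                     ("off target, tight", off_target_tight)]:
    print(f"{name}: out of spec = {fraction_out_of_spec(sample, lower, upper):.0%}, "
          f"average loss = {average_loss(sample, target, k):.2f}")
```

Both samples are entirely within the specification limits, so the traditional costing method sees no loss in either. The quadratic loss tells a different story: in this made-up case the widely spread process carries an average loss of 25 per part against 3 for the tight one, even though the tight process is centered slightly off target.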
Statistical techniques combined with designed experiments and sprinkled with common sense can go a long way in revealing sources of variation. Controlling variation can be costly, but it isn’t as costly as not controlling it. I am quite certain Mr. Taguchi would agree!
Dave Becker is vice president of sales and marketing at All Flex Flexible Circuits LLC.