## Tuesday, October 18, 2016

### Statistical process control of color difference data, part 1

Statistical process control (SPC) of color data -- specifically of color difference (ΔE) data -- can be done, but there is a bit of a twist. Color difference data doesn't behave like your garden-variety process control data. Since ΔE doesn't follow the rules, the classical method for computing control limits will no longer work.

In this blog post, I review classical process control to provide a footing for next week's blog post, where I pull the rug out from under the footings of the classical approach and explain why it won't work for color difference measurements. Hopefully, by the time I get around to the third blog post in this trilogy, I will have thought of some new footings on which to erect a new SPC specifically designed for ΔE.

Process control - Do we have an outlier?

Review of process control

The premise of statistical process control is "more or less simple". I say that in the sense that it's not really that simple at all. And I say that because I want to make sure that you understand that what I do is really pretty freaking awesome. But really, the basic idea behind SPC is not all that tough to comprehend: You only investigate your widget-making machine when it starts to produce weird stuff, and you shouldn't sweat it when the product isn't weird.

The complicated part lies in your algorithm for deciding where to draw the line between "normal" and "weird". The red dress on the far left?  Elegant, chic, and attractive, and pretty much in line with what all the women at my widget factory are wearing. The next one over? Yeah... I see her in the cafeteria once in a while. But I'm just not getting into the outfit on the far right. Sorry. I'm just not a fan of horizontal stripes. But in between... how do you decide where to draw the line?

Where to draw the line????

Statistical process control has an answer. You start by characterizing your process. As you manufacture widgets, you pull out samples and measure something about them. Hopefully you measure something that is relevant, like the distance between the threads of a bolt, or the weight of the cereal in the box. Since you are (apparently) reading this blog post, it would seem that the widget's color might be the attribute that interests you.

Next, you sadistically characterize this big pile of data. Open up a spreadsheet, and open up a bottle of Black and Tan, a Killian's Red, a Pale or Brown Ale, a Blue Moon, or an Amber Lager. And unleash the sarcastical analysis.

The goal for your spreadsheet is to come out with two numbers, which we call the upper control limit and the lower control limit. Then when you saunter into work the following day, after recovering from a colorful hangover, you can start using these two numbers on brand new production data. Measure the next widget off the production line. If it falls between the lower control limit and the upper control limit, then relax and pull another Black and Tan out of your toolbox. You can relax cuz you know your process is under control.

The yellow crayon is just a few nanometers short of a full deck

When a part falls outside the control limits, the camera doesn't automatically cut to Tom Hanks saying "Houston, we have a problem". We're not sure just yet whether this is a real problem or a shell-fish-stick anomaly. The important thing is, we start looking for Jim the SOP Guy, since he is the only one in the plant who knows where to find the standard operating procedure for troubleshooting the widget making machine.

Note that I was careful not to start the previous paragraph with "when a part is bad..." Being outside of control limits does not necessarily mean that the part is unacceptable for the person writing out a check for the widgets. Hopefully, the control limits are well within the tolerances that are written into the contract. And hopefully, the control limits that are used on the manufacturing floor were based entirely off data from the process, and the SPC code of ethics has not been sullied by allowing the customer tolerances to be used in place of control limits. That would be icky.

Identifying control limits

But how do we decide what the appropriate control limits are? If we set the control limits too tight, then Jim the SOP Guy never gets time to finish the Blue Moon he opened up for breakfast. And we all know that Jim gets really ornery if his beer gets warm.

You don't want to get Jim the SOP Guy angry!

If, on the other hand, we humor Jim the SOP Guy and widen the control limits to the point where Tom Hanks can fly a lunar lander through them, then we will potentially fail to react when the poor little widget making machine is desperately in need of a little TLC.

So, every time we encounter another measurement of a widget, we are faced with a judgement call. Setting control limits is inherently a judgement call where we balance the risk of wasted time troubleshooting versus missing a machine that's out of whack.

Deming

Why is it so bad to spend a little extra time troubleshooting?  It is, of course, a business expense, but there is an insidious hidden cost to excessive knob gerfiddling. It makes for more variation in the product. If we try to control a process to tighter than it wants to go, we just wind up chasing our tail.

Well, lemme tell ya about when I worked with Deming. This was back in the late 1940s, just after the Great War to End All Wars. Oh wait. That was WW I. Deming did his stuff just after WW II - the Great War After the Great War to End All Wars. I was about negative thirteen years old at the time. A very precocious young lad of negative thirteen, I was. Deming learned me about the difference between normal variation and special cause. Normal variation (Deming called it common cause variation) is the stuff you can expect with your current process. You can't get rid of this without changing your process. Special cause means that something is broke and needs attending to.

Try this joke at home with Riesling and with Kipling!

Deming traveled to Japan after the war to help rebuild their manufacturing system. He did that very well. I mean, very well. Deming became a super-hero for the Japanese in much the same way that I have become a super-hero for my dogs. Except, of course, that the Japanese came to revere Deming.

In a nutshell, Deming preached that all manufacturing processes have a natural random variation. We should seek, over the long run, to minimize this by improving our process. This is important, but it is not the topic of this blog series. I want to concentrate on the day-to-day. In the short run, we need to understand the magnitude of our variation. We do this by collecting data and applying statistics to it to establish the range of normal variation. That range is then used to identify subsequent parts that fall outside it. When this happens, there is a call for identifying the special cause, and correcting the issue.

A part is identified as being potentially bad if it is so far from the norm that it is unlikely to have come from the same process. This is important enough to repeat. A part is identified as being potentially bad if the probability of it falling within the established statistical distribution of the process is very small. So, it's all about probabilities.

Enter normality
If we assume that the underlying distribution is "normal" (AKA a Gaussian or bell curve), then we can readily characterize the likelihood of a part being bad based on the mean and standard deviation of the process. In a normal distribution, 68% of all samples fall within 1 standard deviation of the mean, 95.45% fall within 2 standard deviations of the mean, and 99.73% fall within 3 standard deviations of the mean.
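If you'd like to check the percentages quoted above rather than take my word for it, here is a minimal Python sketch. It relies on the standard identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2). The function name `coverage_within` is just something I made up for illustration.

```python
import math


def coverage_within(k):
    """Fraction of a normal distribution lying within k standard
    deviations of the mean, via the identity P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


# Print the coverage for 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.2%}")
```

Note that this only tells you about data that really is normal. It says nothing about data that isn't.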

Folks who have taken credit for DeMoivre's invention
So...

The characterizing of our process is pretty simple. You know, when you opened up the spreadsheet and took a long drink of the Amber Lager?  You don't have to tell your boss how simple it is, but here it is for you: Compute the average of the data. That goes in one cell of a spreadsheet. Compute the standard deviation. That goes in a second cell. Then, multiply the standard deviation by the magic number 3. Subtract this product from the mean (third cell in the spreadsheet), and add this product to the mean (fourth cell). The third and fourth cells are the lower and upper control limits, respectively.
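For those who prefer code to spreadsheet cells, the four-cell recipe above might be sketched in Python like this. The function names (`control_limits`, `in_control`) and the baseline numbers are invented for illustration, and I am assuming the sample standard deviation (the n − 1 flavor, which is what a spreadsheet's STDEV function typically gives you).

```python
import statistics


def control_limits(measurements, k=3.0):
    """Return (lower, upper) control limits as mean -/+ k standard deviations.

    Assumption: uses the sample standard deviation (n - 1 in the denominator).
    """
    mean = statistics.mean(measurements)   # cell 1: the average
    sd = statistics.stdev(measurements)    # cell 2: the standard deviation
    return mean - k * sd, mean + k * sd    # cells 3 and 4: the limits


def in_control(value, lcl, ucl):
    """True if a new measurement falls between the control limits."""
    return lcl <= value <= ucl


# Made-up baseline data (say, box weights in ounces), for illustration only.
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
lcl, ucl = control_limits(baseline)
print(f"lower control limit: {lcl:.2f}, upper control limit: {ucl:.2f}")

# Tomorrow's parts get checked against yesterday's limits.
for new_part in (10.4, 10.9):
    verdict = "relax" if in_control(new_part, lcl, ucl) else "go find Jim"
    print(f"{new_part}: {verdict}")
```

The limits are computed once from the characterization data and then applied to new production data. They come from the process alone, not from the customer's tolerances.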

If the process produces normal data, and if nothing changes in our process, then 99.73% of the time, the part will be within those control limits. And about once every 370 parts, we will find a part that is nothing more than an unavoidable tansistical anomaly.
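The false-alarm odds can be worked out in a couple of lines. This is a sketch under the same big assumption as the text (normal data, unchanged process), and the function names are hypothetical. The chance of landing outside k standard deviations is erfc(k/√2), and the reciprocal is the average number of parts between false alarms, which for k = 3 comes out near one in 370.

```python
import math


def false_alarm_rate(k=3.0):
    """Probability that a part from an unchanged, normal process
    falls outside mean -/+ k standard deviations."""
    return math.erfc(k / math.sqrt(2))


def parts_per_false_alarm(k=3.0):
    """Average number of parts between false alarms (reciprocal of the rate)."""
    return 1.0 / false_alarm_rate(k)


print(f"false alarm rate at 3 sigma: {false_alarm_rate():.4%}")
print(f"about one false alarm every {parts_per_false_alarm():.0f} parts")
```

Tighten the limits to 2 standard deviations and the same functions show the false alarms arriving far more often, which is how Jim's beer gets warm.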

The big IF

Note the condition attached to those false-alarm probabilities: If the underlying distribution is normal...

Spoiler alert for next week's blog post. Color difference data is not normal. And by that I mean, it doesn't fit the normal distribution. This messes up the whole probability thing.

Sadly, differences of color don't live in this city!

Here is a scenario that suggests there may be a difficulty. Let's just say for example, that the average of our color difference data is 5 ΔE, and that the standard deviation is 1 ΔE. That puts our lower control limit at 2 ΔE.

Let's say that we happen to pull out a part and the difference between its color and the target color is 1 ΔE. What should we do? Classical control theory says that we need to start an investigation into why this part is outside of the control limits. Something must be wrong with our process! The sky is falling!

But stop and think about it. If the part is within 1 ΔE of the target color, then it's pretty darn good. Everyone should be happy. Classical control theory would lead us to the conclusion that something must be wrong with our process because the part was closer to the target color than is typical!
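Here is that scenario run through the classical recipe, as a small Python sketch. The mean of 5 ΔE and standard deviation of 1 ΔE come from the example above; the function name `classical_verdict` and its return strings are made up for illustration.

```python
# Numbers from the scenario in the text.
mean_de = 5.0   # average color difference, in ΔE
sd_de = 1.0     # standard deviation, in ΔE

lcl = mean_de - 3 * sd_de   # lower control limit: 2 ΔE
ucl = mean_de + 3 * sd_de   # upper control limit: 8 ΔE


def classical_verdict(delta_e):
    """Apply the two-sided classical rule to one ΔE measurement."""
    if delta_e < lcl or delta_e > ucl:
        return "investigate"
    return "in control"


# A typical part, a part very close to the target color, and a part far off.
for delta_e in (5.0, 1.0, 9.0):
    print(f"{delta_e} ΔE from target: {classical_verdict(delta_e)}")
```

The part at 1 ΔE gets the same "investigate" verdict as the part at 9 ΔE, even though it is the best match to the target color of the three.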

The obvious solution to this is that we simply ignore the lower control limit. That will avoid our embarrassment when we realize that we fired that incompetent operator for doing too good a job. But, this simple example is a clue that something larger might be amiss. Stay tuned for next week's exciting blog post, where I explain how it is that color difference values are really far from being normally distributed!
