Wednesday, November 2, 2016

Statistical process control of color difference data, part 3

By now, I suspect that pretty much everyone has heard about the first two blog posts in this series. Everyone has been talking about them. E! News, Jimmy Kimmel...  There is eager anticipation for the next installment.

The first blog post alluded to some strange goings-on when it comes to statistical process control (SPC) of color difference data. But it was mainly an overview of traditional process control.

The second blog post had some actual data, showing that color difference data does not follow the rules of the game for traditional process control. Color difference data is not normally distributed. And if you don't follow the rules... Well. You kinda shouldn't oughta play the game. Or at least be prepared to be yelled at by an angry lemur.

Lemurs are a bit sensitive about mathematical statistics

In this blog post, I give a somewhat intuitive answer to the big question on everyone's mind. In the next post, I will go way off the deep end and explain it with some <egad> math.

Simple explanation

First, I offer a simple observation. While this doesn't qualify as being a full-fledged theoretical dissertation, it does make one pause and say "hmmmm... there is something strange in this neighborhood." When the time comes around, I will cue you to say that.

For a true normal distribution, there is always a chance - perhaps an incredibly tiny chance - that the values could be negative. But color difference data never goes negative. I think we know that, but the implication is rather broad. Color difference data simply cannot be truly normally distributed. (Cue the Ghostbusters theme: "If there's something strange, in your ΔE data, who you gonna call? John the Math Guy!")

Groucho demonstrates the absurdity of negative ΔE

One could certainly argue that in some cases, color difference data might be indistinguishable from a normal distribution. Take for example the case where you are determining the color difference between a set of white parts and a set of black parts. The color difference might always be more than 50 or 80 ΔE, so the fact that the color differences can't go negative is inconsequential.

But when we are doing process control of color, one would hope that the color measurements of the products are all kinda clustered around the target color. So the lack of negative color difference values is significant.

Another simple explanation

There is a common spec for spectrophotometers called "repeatability". A more accurate name for the spec might be "the degree to which an instrument agrees with itself in the short term on measurements of the same spot." I have been lobbying the standards committees to change the name, but so far, I haven't gotten much traction. I dunno, there was some wimpy objection about making tech writers' work too hard. I'll keep pushing the issue.

Repeatability for color measurements is determined by taking ten or maybe twenty measurements of the exact same spot. If someone else is doing the measuring, I prefer twenty. These are averaged to provide a reference color. Then the difference is computed between each of the ten or twenty measurements and the reference color. The reported repeatability is the average of the ΔE values. This is known as the Mean Color Difference from the Mean (MCDM) method. It is described in the venerable color science book by the venerables Billmeyer and Saltzman.
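For the code-inclined, here is a minimal sketch of the MCDM recipe just described. As the name says, it averages the color differences from the mean color. It assumes the 1976 ΔE formula (plain Euclidean distance in L*a*b*), and the function names and the four readings are made up purely for illustration.

```python
import math
from statistics import mean


def delta_e_76(lab1, lab2):
    """1976 color difference: Euclidean distance between two (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)


def mcdm(measurements):
    """Mean Color Difference from the Mean for a list of (L*, a*, b*) triples."""
    # Step 1: average the measurements channel by channel to get the reference color.
    reference = tuple(mean(channel) for channel in zip(*measurements))
    # Step 2: color difference between each measurement and that reference.
    differences = [delta_e_76(m, reference) for m in measurements]
    # Step 3: report the average of those differences.
    return mean(differences)


# Hypothetical readings of one spot (not real instrument data).
readings = [(50.0, 0.1, 0.0), (50.0, -0.1, 0.0), (50.1, 0.0, 0.0), (49.9, 0.0, 0.0)]
print(round(mcdm(readings), 3))
```

Each of these made-up readings sits about 0.1 ΔE from their average, so the reported repeatability comes out at about 0.1 ΔE.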

I have a bit of a reservation that has to do with the utility of the spec. Years ago, back in the dark ages, photons were scarce and light bulbs varied a lot from shot to shot. As a result, repeatability was a useful measure of the performance of the instrument. But with today's instruments, this number is normally very tiny. The much more meaningful specs are intra-family agreement (how well two instruments in the same family agree with each other) and inter-instrument agreement (how well any two instruments agree). I have written about this before.

In the Dark Ages, the Knights of the Round Table
were reduced mainly to scotopic vision

But the big bugaboo that's bugging me is the boogers caused by one of the spectro manufacturers who has chosen to define their own version of repeatability. I mean, first off, there is a standard out there. How about just following it? But the big bugaboo is that they decided to use the standard deviation of the color difference values, rather than the average.

At first blush, this seems like a reasonable way to look at the variation in a bunch of numbers. I mean, the standard deviation is the tool that I was given in Stats 301 to quantify variation.

But, lemme provide an example where this approach fails miserably. I admit that my example is extremely contrived, and is extremely unlikely to happen in the real world, but still, I think it should make you stop and scratch your head and say to yourself "gosh, is this really the kind of behavior that I want out of a spec for repeatability?"

Here is the long-awaited and extremely contrived example. Let's say we were determining repeatability and received the following four measurements. Yes, I know... you should take ten or twenty, but this is an extremely contrived example, ok? Here are the four measurements:

{50, -10, 0},      {50, 10, 0},      {50, 0, -10},      {50, 0, 10}

The average of the four is {50, 0, 0}. If we use the 1976 formula for ΔE, we see that each one is exactly 10 ΔE from this average. Remembering that the final step in this manufacturer's version of repeatability is taking the standard deviation... ummm, let's see... 10, 10, 10, 10... the standard deviation is exactly zero! We have a perfect instrument, with no variation whatsoever!! Woo-hoo! Of course, any pair of measurements is 14 to 20 ΔE apart, but the spec says this spectro is giving readings that are perfectly repeatable!!
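If you would rather not take my arithmetic on faith, here is a short sketch that redoes it with the four measurements above and the 1976 ΔE formula (Euclidean distance in L*a*b*). The variable names are mine; the numbers are the ones from the contrived example.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# The four extremely contrived measurements, as (L*, a*, b*).
measurements = [(50, -10, 0), (50, 10, 0), (50, 0, -10), (50, 0, 10)]

# Reference color: the channel-by-channel average.
reference = tuple(mean(channel) for channel in zip(*measurements))

# 1976 ΔE of each measurement from the reference.
delta_es = [math.dist(m, reference) for m in measurements]

# 1976 ΔE between every pair of measurements.
pairwise = [math.dist(p, q) for p, q in combinations(measurements, 2)]

print("reference color:   ", reference)
print("ΔE from reference: ", delta_es)
print("std dev of ΔE:     ", pstdev(delta_es))
print("average of ΔE:     ", mean(delta_es))
print("pairwise ΔE range: ", min(pairwise), "to", max(pairwise))
```

The standard deviation of the four ΔE values reports no variation at all, while the average of those same values (the MCDM number) and the pairwise differences both show that the readings are all over the map.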

The standard deviation of ΔE data can be horribly misleading!  And I mean "midnight in a graveyard on Halloween" kinda horrible.

I don't know if I mentioned this before, but this is an extremely contrived example. On the other hand, it will hopefully give one pause to consider -- ΔE measurements, and especially the standard deviation thereof, don't follow the normal rules for SPC that we all know and love.

What went wrong?

The problem is illustrated (in general) in the diagram below. If the color values all lie on a circle (or sphere) centered on the reference color value, then the standard deviation of the ΔE values is absolutely worthless as a measure of the dispersion of the color points.

With the standard deviation method for determining repeatability, the reference color value is computed from the color values themselves, so this contrived situation will only occur if the color values are scattered uniformly on the surface (or near the surface) of that sphere. But if we venture out to situations where the reference color does not come from the data itself...

A less contrived example, in story form

I may have alluded to this before, but that last example was a bit contrived. It would never happen in real life. How about an example that is more realistic? Cuddle up with your favorite blankie and listen to a little story.

Once upon a time, there was a QC person at a print shop who was really into classical SPC, and a press operator at that same print shop who was particularly fond of run-time charts. Below is an example of one of the charts that they ran. The chart shows the adherence to a target color for the yellow solid of thirty samples from a production run. When I say "adherence to target color", I mean the ΔE between the measured color of the solid and the target color. The yellow lines are the upper and lower control limits, and the red line is the customer tolerance. Note that for the sake of simplicity, I am using the 1976 ΔE formula, so a customer tolerance of 4.0 ΔE is typical.

(Full disclosure... this is fabricated data. A very good facsimile of real data, but still fabricated.)

The view of the production run for the press operator and QC person

Everything is hunky-dory. All the samples are well within the control limits, so these partners in print conclude that the process is under control. The CpK is 2.44, which is a clear indication that the process is fully capable of providing product that meets the customer's requirements. Everyone is happy. The crew manager is bringing in champagne to celebrate, and the boss is talking about big fat bonus checks for everyone. (Did I mention that this is a fairy tale?)

Now, the ink company regularly receives this SPC data from the printer, since they are genuinely interested in seeing the customer succeed. (You can believe that, can't you?) The inkie at the ink company has decided to have a little different look at that same data. (An inkie is a technician at an ink company. Generally affable. Good at Sudoku, but not necessarily inclined to play often.) You will note that there is a similarity between this and the picture in the section "What went wrong?"

(Same disclaimer... this data is fabricated. Good facsimile, blah blah blah. But note that the chart above and the chart below are plots of the same fabricated data set.)

View of the production run of the ink QC person

Unbeknownst to the printer, the QC person at the ink company (who only has the best interest of the printer in mind) notices a very clear issue on this press run. Can you see it? All the little bees are lining up to the left of the target color. The pattern (a scattering which is mostly up and down) is typical. It shows that the variation on press is largely caused by a variation in the amount of ink on the substrate. There is an offset caused by a hue shift that is equally large.

Before making any changes, the inkie checks if this is a fluke or a trend. Previous runs all had the same basic problem. The course of action? The inkie decides that henceforth the formulation of the ink will add two drops of beet juice to every kilogram of yellow ink so as to move the color of the ink closer to the target.

Everyone is happy now, right?

Happy? Not quite. The printing plant is in Wisconsin, isn't it?!?!? Happiness is just not part of the culture in Wisconsin.

Every good tale (fairy or otherwise) must have some tension. There must be a plot point where the protagonists engage in conflict. This blog post, by the way, is a good tale, so here comes the conflict. The press operator was never told about the big change to the ink. (Can't you just imagine Jennifer Aniston doing that?) I realize that this never happens in real life, but go along with me here for the moment. The press is fired up and by gosh and by golly, the beet juice hits the fan. Here is the run chart for the first press run with the new ink.

Press operator's view of the press run after the fateful plot point

Put yourself in the place of the press operator. He had already hinted to his wife about the possibility of an all-expense paid trip to Hawaii if things continued going so well. And there is this press run with virtually all of the samples outside of the two yellow lines! The customer's spec is still being met, but jeepers creepers! Someone's gonna get ticked off.

The QC person sees a different problem. The CpK for this press run is 0.57. That's not good, by the way. One would like that number to be larger than maybe 1.3.

Naturally, they decided to blame the ink manufacturer. This is, as we all know, item #2 on the standard operating procedure in almost every printing plant I have ever been in. They called up to complain. The inkie agreed that, yes, a change was made to the formulation. After being thoroughly beaten about the ears for not telling anyone, he pulled the data from the new product run into his spreadsheet.

The plot below came up. The line of bees had gone from being 2 or 2.5 a* units to the left to being about 0.5 a* units to the left. Perhaps a bit of overshoot, or perhaps that's just normal variation. Certainly a welcome improvement. The press operator and QC person thanked the inkie, and agreed to continue with the new formula.

Inkie's view of the production run of the new ink

The press operator and QC person were both a bit flummoxed, but had to admit that the inkie was right. The change seemed to improve things.

Finally, the two badgered enough people that they found someone who knew a bit about color science and could counsel them on what went "wrong". The color science person said,

"What?? You should complain if your color error gets better than it used to be??!?! My advice... Ignore the lower control limit for ΔE. Ignore the lower limit in the run time chart, or better yet, tell your stinking software not to plot it when you are dealing with color differences. And when you do the CpK thing... same thing. The formula uses both the upper and lower limits, and takes the minimum. Just do the computation based on the upper limit - the one-sided CpK."

That was what the first color scientist (the one with the beard) said. A second color scientist (who naturally also has a beard) pointed out a use for the lower control limit. Brian said that while the lower control limit may not be a trigger point for deciding when to get all up in arms about the process being broken, it is an indication that something changed for the better, so it might not be a bad idea to poke around and figger out what went right. You might want to continue doing whatever that was!

So everyone went home that evening feeling that the crisis had been averted. And we have two rules that will keep us from making this analysis mistake in the future. (Take note of this. I guarantee these will be on the final exam.) Do not use the lower control limit when doing SPC on ΔE color difference data. Modify the CpK computation so that it only considers the upper control limit.
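Here is a rough sketch of what the second rule might look like in practice. The two-sided version takes the minimum of the upper-side and lower-side ratios, as the bearded one described; the one-sided version keeps only the upper side. The function names, the lower limit of 0, and the ΔE values are all assumptions for illustration (the 4.0 ΔE upper limit is borrowed from the customer tolerance in the story). Textbook CpK is normally computed against specification limits, so which limits your software plugs in is something to check. And note that even the one-sided version still leans on the standard deviation of ΔE.

```python
from statistics import mean, stdev


def cpk_two_sided(values, lower_limit, upper_limit):
    """Classic CpK: the smaller of the upper-side and lower-side capability ratios."""
    mu, sigma = mean(values), stdev(values)
    upper_side = (upper_limit - mu) / (3 * sigma)
    lower_side = (mu - lower_limit) / (3 * sigma)
    return min(upper_side, lower_side)


def cpk_upper_only(values, upper_limit):
    """One-sided CpK: only the upper limit counts, since a small ΔE is good news."""
    mu, sigma = mean(values), stdev(values)
    return (upper_limit - mu) / (3 * sigma)


# Hypothetical ΔE values from a run that sits close to the target color.
delta_es = [0.4, 1.0, 1.6, 0.7, 1.3, 2.0, 0.9, 1.1]

print("two-sided CpK:", round(cpk_two_sided(delta_es, 0.0, 4.0), 2))
print("one-sided CpK:", round(cpk_upper_only(delta_es, 4.0), 2))
```

With these made-up numbers the two-sided value is dragged down by the lower side, which amounts to penalizing the run for being close to the target. The one-sided value does not have that problem.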

But...

Even with the satisfying conflict resolution in the plot, the QC person didn't sleep well that night. Finally, at 3:00 AM, he/she got out of bed and sat down at his/her laptop to look at the data. After a lot of futzing around in Excel, he/she came to the realization that the variation in ΔE was much lower before the change was made to the ink. The original data had a standard deviation of 0.24 ΔE. The standard deviation of the color difference of the new data was 0.64 ΔE. This is a problem! Why did the variation in the process jump up like that?

Lemme see... what was item #2 in the SOP? Obviously, the new ink formulation was to blame. So, the inkie got a call at 8:01 AM. "I don't think you stirred the ink enough after you added those two drops of beet juice!" Imagine this said by a person who got like 17 minutes of sleep the night before.

To make a long story just a tad longer, the color scientist kinda person - the first one, you know, the one with the beard - got consulted again. He stroked his beard and then made the following drawing on his chalkboard. (Note that all color scientists have beards, if they're male, anyway. If not, they like Thai food. All color scientists, male or female, have a chalkboard instead of a whiteboard.)

The bearded color scientist demonstrates an uncanny ability to
connect color science with football plays

So, to paraphrase what the bearded color scientist said: standard deviation of color difference values does not work like we expect.
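One way to see this for yourself is the sketch below. It assumes a simplified version of the story: the samples scatter only up and down (along b*), the scatter is identical in both runs, L* is ignored, and the only thing that changes is how far to the left (along a*) the line of bees sits. The offsets of 2.25 and 0.5 are loosely taken from the story; the b* values and function name are hypothetical.

```python
import math
from statistics import pstdev


def delta_e_spread(a_offset, b_values):
    """Std dev of ΔE for points at (a_offset, b) relative to the target color."""
    # With L* ignored, the 1976 ΔE is just the distance in the a*b* plane.
    delta_es = [math.hypot(a_offset, b) for b in b_values]
    return pstdev(delta_es)


# Hypothetical up-and-down scatter, the same for both press runs: -1.0 to +1.0 in b*.
b_values = [i / 10 for i in range(-10, 11)]

before = delta_e_spread(2.25, b_values)  # old ink: bees well to the left of target
after = delta_e_spread(0.5, b_values)    # new ink: bees close to target

print("std dev of ΔE, old ink:", round(before, 3))
print("std dev of ΔE, new ink:", round(after, 3))
print("new ink looks 'noisier':", after > before)
```

The process scatter is exactly the same in both cases, yet the standard deviation of ΔE goes up when the color gets closer to the target. Far from the target, up-and-down wiggles barely change the distance; close to the target, they change it a lot.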

Personally, I can think of exactly one application of the use of standard deviation of ΔE, and that's as an example of why no sane person would ever use the standard deviation of ΔE.

I may as well mention...

I hate to go on about my disdain for the use of standard deviation on ΔE values, but I have one more, slightly related thing to get off my chest. Even if this value were reliable, its use can be misleading. SPC is all about using "three sigma" to set control limits. That works for normally distributed data, but as we saw in the previous blog post, ΔE does not follow a normal disturb-ution.

Three rules for SPC of color difference data

1. The standard deviation of ΔE is an unreliable measure of the distribution.
2. Ignore the lower control limit when doing SPC on ΔE color difference data.
3. Friends don't let friends compute the standard deviation of ΔE.
4. Modify the CpK computation so that it only considers the upper control limit.
5. Computation of the standard deviation of ΔE? Come on. Just say no.
6. Always proofread for number mismatches.

The saga continues...

The last blog post was all about color difference data not following a normal distribution. I completely ignored (almost) all of that discussion in this blog post. I could have made all kinds of comments about how, since color difference data is not normally distributed, the traditional computation of upper control limit is no longer applicable, and similarly, the whole CpK thing must be rethought.

I ignored the discussion of normality, since the issues I point out today are fundamental, and not directly related to the statistical distribution.

In the next blog post, I want to address the question, "if the data is not normally distributed, then what is the distribution?" At least that's what I want to write about. Who knows what I actually will write about. I am certainly not smart enough to predict my actions!

This blog post was updated on Nov 7, 2016, thanks to Dave Wyble's comments about two bonehead mistakes I made in the original version. First (I can't believe I actually did this) I used the term "reproducibility" instead of "repeatability". I should have known better. I apologize to all who were hurt, psychologically or otherwise, by my blatant misuse of the English language.

Second, I misquoted the definition of repeatability that is given in the standards. And I shoulda known better. I just got it confused with the silly definition used by one of the spectro manufacturers.