
Wednesday, November 9, 2016

Statistical process control of color difference data, part 4


Warning: If you were considering whether to jump off the diving board into this blog post because this series is getting too deep mathematically, then I would suggest you get off at this stop and wait for the next bus. On the other hand, if you are one of the hardcore color geeks who has been chafing at the bit for me to get to the meat of the matter, then read on, read on. If you get all the way to the end of this blog post and find yourself saying "yeah... that all makes sense", or even "John is completely wrong on this!" then I commend you. I hope to get a chance to have a beer with you.



There have been a number of comments on the previous blog posts (primer on SPC, deltaE is not normally distributed, and anomalies with standard deviation) from hardcore color geeks. All of these comments are from smart color scientists. Please note that smart color scientists either have beards, or are named Dave. Unless, of course, they are female, in which case they like Thai food.

David MacAdam, Albert Munsell, and Deane Judd
All sporting beards!

Dave Wyble suggested looking at the variance on the individual components, ΔL*, Δa*, and Δb*. "Conventional wisdom says [component differences] will be normally distributed..." Come on, Dave. When was wisdom considered "conventional?" 

(Color scientist of the world today trivia fact number seventeen: Dave does not have a beard at this time. He has, however, had one at various times in his life.)

Max Derhak anticipated this blog post in one of his comments: "Isn't this a Chi-squared distribution?" 

Steve Viggiano commented: "Having advocated Hotelling's t-squared for this application for decades, I am interested in where this is going." He also commented on the distinction between deltaE and deltaE squared, and the unrealistic assumptions baked into the chi-squared distribution. I will address these directly.

These folks are all hardcore color geeks. Max and Steve both have beards, by the way. Come to think of it, I also have a beard. Therefore I must be a smart color scientist... a smart color scientist who is not so good with syllogisms.

The question on the chalkboard today is this:

Do color difference values, as measured by the square of the ΔEab values, have a chi-squared distribution? If not, then what is it?

Literature review 

Several sources have suggested that ΔE has a chi-squared distribution. The earliest I have found is from Friedrich Dolezalek [1]. He looked at measurements from 19 print runs and came to the following conclusion:

"[The ΔE values] followed a chi-squared statistic, characterisable by a single parameter, which could be linked to the standard deviations in L*, a*, and b* space."

In the Appendix of his paper, he referred to a previous paper, by H. G. Volz, which demonstrated this. I took a cursory look for the paper, and didn't find it. I admit to not trying real hard. It's in German. Unlike most color scientists, I don't read German.

Dolezalek's paper included the following graph, supporting his claim. Note that this is technically known as a funky-graph because of the nonlinear scaling on the y axis. This crazy scaling is designed so that data from a chi-squared distribution will plot out as a straight line.
One set of data, from Dolezalek's paper

All in all, this is not a bad fit to the data. But note that it can be seen to fall off at the right end. The four points at the end (the four points with highest ΔE) are all below the line. This is a sign that perhaps the curve fit is not ideal. The fact that the curve fit starts falling off at the 80th percentile is bad juju for SPC. For SPC, the upper control limit is conventionally set to the 99.75th percentile.

I will amend Dolezalek's statement accordingly: "[The ΔE values] followed a chi-squared statistic, but not in the region where SPC needs it!"

The next reference I found is from an unpublished document by a highly respected friend of mine, Dave McDowell [2].  Kodak produced a test target which may be familiar to my readers - around 700,000 of these were manufactured. Kodak wanted these to be produced to tight tolerances, so they were very rigorous about their process control. They found the chi-squared statistic suitable for their needs.

Kodak Q-60 target

Here is what Dave said:

"The quantity 'deltaE/2 - avg' when squared follows the chi-squared distribution...

"Evaluation of a large number of samples of the Kodak Q60 transmission and reflection targets showed that the deltaE characteristics of individual samples compared to the batch mean followed this same statistic." 

Before I go on, I want to highlight the "when squared" part that Dolezalek had inadvertently missed. The metric that theoretically has chi-squared distribution is not ΔE, but rather ΔE squared. I am sure that Dolezalek was aware of the fact, since his results were at least reasonable. It was probably an unintentional omission on his part. Steve Viggiano reiterated this in a comment to part 2 of this series.

Unfortunately, McDowell did not provide any data or plots by which we can assess the strength of his statement. He has provided me with the data, however. Winter is coming in Wisconsin. Plenty of time for me to curl up with my laptop and savor the 150 files. (It goes without saying that Dave does not have a beard.)

(The text that was quoted above later appeared in a white paper from Kodak [3]. This makes sense, since McDowell worked for Kodak at the time.)

Steve Viggiano is another highly respected friend of mine who has weighed in on this topic, again in an unpublished document [4]. Viggiano was the first (to my knowledge) to articulate the precise conditions that must be met in order for the chi-squared function to be applicable for ΔE. I don't mean to say that Viggiano invented these criteria - they go with whoever invented the chi-squared function. I mean to say that Viggiano was the first to assert these criteria for the distribution of color difference values. Following Stigler's law of eponymy, I will refer to these as Viggiano's criteria.

More on these criteria in the next section. A lot more. I might add that this blog post would not have been possible without Steve's prolific pursuit of pedantic pleasures. He knows this stuff better than I ever will.

Moving along to additional chi-squared sightings, the ASTM standard E 2214-02 [6] has a brief mention of the distribution. It states the following: "As observed in Fig. 2, the mode, median, and mean of a set of color difference (ΔE) determinations do not follow a bell curve but a curve related to the Chi-squared or F statistical distributions..." There is unfortunately no further explanation. What is the relationship? When is F applicable?

I know of one additional reference that discusses the statistical distribution of ΔE values, a paper by Maria Nadal et al. [5]. In their paper, they compared various methods for determining the 95th percentile of color difference data. 

Why the 95th percentile? Two of the authors of this paper are from NIST. For a fee, NIST makes official color measurements, usually of colorimetrically stable objects such as Lucideon tiles (previously known as Ceram tiles, which were enshrined in the literature under the name BCRA tiles). As part of their service, they assign confidence intervals to the measurements. They are required to report the 95% confidence interval.

This is a very useful analysis and a well-thought-out paper, since there is a dearth of technical information on real-world evaluation of the distribution of color difference data. But the analysis in Nadal et al. is not necessarily directly applicable to our goal. For SPC, one needs the 99th, or preferably the 99.75th, percentile. Our problem is more difficult, since we are interested in the shape of the distribution way out in the tail.

Putting our quest in perspective

What is the chi-squared distribution?

Let's say that you take a bunch of random data - values sampled from a distribution - square each one, and add up the squares. That sum of squares has a chi-squared distribution, provided the Viggiano criteria have been met. (Yes, I will get to those criteria. Don't rush me!) If you go one step further and take the square root of the sum - that is, if you add the values in quadrature, which is a fancy phrase meaning that you combine them with the Pythagorean theorem - you get the closely related chi distribution.

The chi-squared distribution is not just a single distribution; it is a family of distributions. The members of the family are distinguished by the number of random variables that were added together to get that distribution. We refer to any of the family as a chi-squared distribution with n degrees of freedom, where n is the number of random variables that were summed in quadrature.

Once the number of degrees of freedom has been decided upon, there is only one parameter left. This parameter accounts for a scaling of the distribution along the ΔE axis, and is dependent on the standard deviation of the distribution from which the random variables have been taken.

Note that the ΔEab formula is equal to the sum in quadrature of the differences in each of the colorimetric components: ΔL*, Δa*, and Δb*. So ΔEab squared is at least a potential candidate for the chi-squared distribution with 3 degrees of freedom.
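To put the question in symbols (my notation, not anything from the references), here it is as a math block:

    \Delta E_{ab} = \sqrt{(\Delta L^*)^2 + (\Delta a^*)^2 + (\Delta b^*)^2},
    \qquad
    \frac{\Delta E_{ab}^2}{\sigma^2} \sim \chi^2_3
    \quad \text{(provided the criteria below hold)}

where σ is the common standard deviation of the three components.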

For the distribution of ΔEab squared to be chi-squared, the distributions of the three components must satisfy all four of the following criteria (as recited by Viggiano):

    1. They must all have zero mean. 

    2. They must all be normally distributed.

    3. They must all have the same standard deviation. (ΔL* can't dominate, for example.)

    4. They must be independent. 

Important plot point: If we can ascertain that ΔE squared follows the chi-squared distribution, then finding the 99.75th percentile would be a simple matter of arithmetic applied to the mean.
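Just to show how simple that arithmetic would be, here is a sketch in Python, assuming (just for illustration) that the chi-squared model held with three degrees of freedom. Under that assumption, ΔE itself follows scipy's chi distribution, and the ratio of the 99.75th percentile to the mean is a fixed constant:

    from scipy import stats

    dof = 3
    # ratio of the 99.75th percentile of deltaE to the mean deltaE, under the model
    ratio = stats.chi.ppf(0.9975, dof) / stats.chi.mean(dof)
    print(ratio)   # about 2.4 -- the UCL would simply be 2.4 times the mean deltaE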

Does the TR 002 data fit the chi-squared distribution?

I tested the TR 002 data to see if the chi-squared distribution thing worked. The data is measurements of 928 different CMYK patches as printed on 102 newspaper presses throughout the USA. (The data set is further explained in a previous post in this series.)

I first computed the average L*a*b* values for each of the 928 patches. Each of these averages was thus representative of what would be printed on the 102 different newspaper presses. For each of the 928 patches, I then computed the color difference, in ΔEab, between this average and each of the 102 measurements. This gave me a collection of 102 X 928 color difference values. More than enough to wallpaper my kitchen.
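For anyone who wants to wallpaper their own kitchen, here is a minimal numpy sketch of that computation. The array is a random stand-in; the real TR 002 files obviously have their own layout:

    import numpy as np

    rng = np.random.default_rng(0)
    # stand-in for the measurements: (928 patches, 102 presses, 3) L*a*b* values
    lab = rng.normal(50.0, 2.0, size=(928, 102, 3))
    mean_lab = lab.mean(axis=1, keepdims=True)          # average color of each patch
    dE = np.sqrt(((lab - mean_lab) ** 2).sum(axis=-1))  # 928 x 102 deltaE_ab values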

A comment here: My dog asked me why I didn't use ΔE 2000, rather than ΔEab. He felt that it would be more useful to use the latest and greatest color difference formula. Presumably this is the formula that will get the most air time in the future. I agreed with him (he is pretty smart, as dogs go), but explained that, for purposes of the initial investigation in this blog post, I prefer to use ΔEab. If all of the Viggiano criteria hold true, then the square of ΔEab would precisely fit the chi-squared distribution with three degrees of freedom. If this part holds true, then the next step would be to see if ΔE 2000 was reasonably close. (A bit of foreshadowing: My conclusion, as we shall see, is that even ΔEab did not fit this model.)

I generated cumulative probability density functions (CPDFs) for each of the 928 patches. I scaled each of the CPDFs by their mean. In this way, they all had a mean of 1.0 ΔEab. Bear in mind that if the variations truly follow a chi-squared distribution, then all of these would have the same shape, but just different scaling along the x axis. So assuming that all four criteria are met, these color difference values should all be from the same distribution. Therefore I can combine them into one CPDF. This gives me the advantage of having 928 X 102 = 94,656 data points, so the resulting CPDF will be relatively smooth.

To check the assumption of chi-squaredness, I used a Monte Carlo method to generate the CPDF of the hypothetical distribution if all the Viggiano criteria hold. I generated 94,656 sets of hypothetical variations (in ΔL*, Δa*, and Δb*), each of which was drawn from a normal distribution with mean of 0 and standard deviation of 1.0. I then ran these through the ΔEab formula to generate hypothetical color differences. Finally, this collection of values was normalized to have a mean of 1.0 so as to best match the distribution computed from the real TR 002 data.
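Here is a sketch of that Monte Carlo step (a stand-in for my actual code, but it captures the idea):

    import numpy as np

    rng = np.random.default_rng(1)
    d = rng.standard_normal((94656, 3))      # dL*, da*, db*, each drawn from N(0, 1)
    dE_hypo = np.sqrt((d ** 2).sum(axis=1))  # hypothetical color differences
    dE_hypo = dE_hypo / dE_hypo.mean()       # normalize so that the mean is 1.0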

Generating a plot of a CPDF from a collection of values is a rather easy task, by the way. (Thank you for asking how it is done.) First, the data is sorted from smallest to largest. Next, a second array of the same length is generated with values incrementing in fractional steps from 0 to 1. This incremental array is then plotted as a function of the color difference array. Note that this is kinda the reverse of the way we would normally plot.
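In code, the recipe is three lines (any 1-D array of color differences can stand in for dE_hypo from the sketch above):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.sort(dE_hypo)                   # step 1: sort smallest to largest
    y = np.arange(1, x.size + 1) / x.size  # step 2: fractional steps from 0 to 1
    plt.plot(x, y)                         # step 3: cumulative fraction vs. deltaE
    plt.show()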

The plot below shows a comparison of the two CPDFs. The blue line is the actual data, and the red line is the chi-squared distribution with three degrees of freedom. Both have a mean of 1.0. Gosh. They look different.

Do real color difference values follow a chi-squared distribution?

The above plots can be differentiated to create a probability density function, as shown below. As expected, they are not as smooth as the CPDF curves, but it is still clear that the two distributions are dissimilar. Real color difference data is more skewed to the left and has a longer tail to the right than the distribution based on the chi-squared function with three degrees of freedom. I put that in italics, since it kinda sounded like an important conclusion.



Clearly the distribution of real color difference values does not follow a chi-squared function with three degrees of freedom. What went wrong? One or more of the Viggiano criteria must have been missed. Let's look at the criteria one at a time.

Zero mean

Do the differences ΔL*, Δa*, and Δb* all have zero mean?  In general, this depends on how a given data set is compiled. I see three general cases:

In the first case, the data itself is used to define the target color. In a Mean Color Difference from Mean (MCDM) scenario, the target L* value is the average of the L* values of the data set, and similarly for a* and b*. In this case, the ΔL*, Δa*, and Δb* have zero mean by design. This is the official formula for determining the repeatability of a spectrophotometer. This was also the case in the analysis of the TR 002 data set that was presented in the second post of this series, and in the previous section. 

A second case is described in Nadal, et al. They talk about looking at all pairwise color differences in the data set. For example, in a set with ten color values, the color difference would be determined between the first sample and each of the nine other samples, between the second sample and each of the eight others, and so on. For ten color values, there are thus 45 different unique combinations.

It is not immediately obvious, but this case also has zero mean for ΔL*, Δa*, and Δb*. Consider the case where all pairings are considered, not just unique pairings. That is to say, one computes ΔE (sample n, sample m) as well as ΔE (sample m, sample n). Noting that ΔL* (sample n, sample m) = - ΔL* (sample m, sample n), it is easily seen that for every pairing, there is the reverse pairing which balances it out. Thus, if one considers all non-unique pairings, the three colorimetric components all have zero mean.

Since ΔEab is one of the commutative color difference formulas, the color difference for a pairing is the same as that of the reverse pairing. Therefore, the distribution of ΔEab is the same whether one considers only the unique pairings or all of the non-unique pairings. I would argue then that the first criterion is met in the case presented by Nadal, et al.
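A quick numerical check of that argument, using ten made-up L* values (the same reasoning applies to a* and b*):

    import numpy as np
    from itertools import combinations, permutations

    rng = np.random.default_rng(2)
    L = rng.normal(50.0, 2.0, size=10)                 # ten hypothetical L* values
    unique = [a - b for a, b in combinations(L, 2)]    # the 45 unique pairings
    ordered = [a - b for a, b in permutations(L, 2)]   # all 90 ordered pairings
    print(len(unique), np.mean(ordered))               # 45, and a mean of zero (to rounding)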

Both of these are (somewhat) unnatural cases, which is to say, not generally encountered in the SPC of color. Generally speaking, the target color has been externally specified, typically by the customer of the product. Although the process has been adjusted to come close to this color on average, this will never be exactly the case. It is not atypical for the average color of a print run to be a few ΔEab from the target color. Therefore, at least in the SPC of color in printing, Viggiano's first criterion is rarely met. Remember that the whole gist of this series is SPC?

With SPC, we would therefore expect that the violation of the "zero mean" criterion will have a major effect on the distribution of color difference values. Viggiano anticipated that I would write that in my blog one day, so he mentioned that if the zero mean criterion is violated, then color difference will follow a non-central chi-squared distribution.

Thus, when SPC is performed on color difference data, the non-central chi-squared distribution is the appropriate choice (assuming any of the chi-squared distributions are appropriate). Once again, I used italics cuz this sounded important.
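scipy has the non-central chi-squared built in, so a quick simulation (with a made-up off-target offset) can show what this looks like out at the percentile we care about:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    offset = np.array([1.0, 0.5, 0.0])   # hypothetical off-target process mean
    d = rng.standard_normal((100000, 3)) + offset
    dE2 = (d ** 2).sum(axis=1)           # squared color differences
    nc = (offset ** 2).sum()             # non-centrality parameter
    print(np.quantile(dE2, 0.9975))             # empirical 99.75th percentile
    print(stats.ncx2.ppf(0.9975, df=3, nc=nc))  # the non-central chi-squared prediction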

Normal distribution of components

Are the distributions of ΔL*, Δa*, and Δb* normal? Gosh. This is not a simple question. On the one hand, it seems like a reasonable assumption, at least as a starting point.

On the other hand, the values L*, a*, and b* are computed through a nonlinear transform of X, Y, and Z, so strictly speaking, either XYZ or L*a*b* could be normal, but never both. But then again, the nonlinearity is small when compared to the typical variation, so the effect is probably not of practical importance. (This statement relies on a simple rule - every smooth function looks like a line if you look at a small enough piece of it.)

But, on the third hand, there are certainly conditions where the variation of L*a*b* is distinctly not normal. For example, if the process has a pronounced drift, the variation could resemble a uniform distribution. If the process has a sinusoidal fluctuation then the distribution will be a U shaped distribution with asymptotes corresponding to the two extremes of travel. I once figgered out what the formula for that distribution is. I fergit just now what my answer was.

I will argue that, with good SPC, care is taken so that the process is in known, good working condition before initial characterization. This, of course, is not always the case. The idealist side of my brain would put my foot down and say "WHAT!!?!?!? Gosh darn it to heck! If the process has an oscillation or drift, then in the name of peanut butter and jelly sandwiches you better fix it before you do any characterizing!!" But the realist side of my brain is willing to admit the possibility that some (if not all) processes have drift and/or oscillation that cannot be eliminated.

I generally avoid having the idealist and the realist sides of my brain in the same room together. It avoids a lot of arguments. But in this particular case, the tiny tiny portion of my brain that has some modicum of mediation skills was able to come to a compromise that was suitable for both of the other sides of my brain. If the process is capable (in the SPC sense of the word) of providing product that is within the customer tolerance, then does it matter if there is oscillation or drift? If it costs money in the long run to get rid of that anomaly, then one has to weigh this against another customer requirement - price.

Getting back to the matter at hand, I am going to start with the bold assertion that when the process is in good working condition, the variation will likely be close to normal. But I will follow up with a little investigation to back this up.

I tested the TR 002 data to see if the variations in the colorimetric components had a normal distribution. My testing amounted to analysis of skewness and kurtosis.

-- Skewness

The skewness was computed for each of the 928 patches and each of the 3 colorimetric components. This gave 2,784 values for skew. Each value of skew represented the distribution of a different set of 102 samples.

The graph below shows all the values of skew.

Skew of the ΔL*, Δa*, and Δb* values

What to make of this? There are various rules of thumb about how much skew is of practical importance. Perhaps a skewness with absolute value greater than 1 or 2 is considered to be significant? By that rule of thumb, skew is not a problem.

A more precise test comes from a table in my favorite statistics book [7]. According to the table, if I were pulling 100 samples from a perfectly normal distribution, I would expect that 2% of the time, I would have a skewness above 0.567 or below -0.567. In this data set, I saw that 16.6% of the skewness values were above this number.
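For the record, here is roughly how that tally goes. The data array is a normal stand-in, so the fraction should come out near the expected 2%; the real data gave 16.6%:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    lab = rng.normal(50.0, 2.0, size=(928, 102, 3))  # stand-in for the TR 002 data
    diffs = lab - lab.mean(axis=1, keepdims=True)    # deviations from the patch means
    skews = stats.skew(diffs, axis=1).ravel()        # 2,784 skewness values
    print(np.mean(np.abs(skews) > 0.567))            # fraction outside the 2% bounds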

So, I can say with statistical certainty that an inordinately large number of the data sets are skewed. Oddly, about two thirds of these are skewed positive and one third is skewed negative. I don't know why that is! Why would the variation of some colorimetric components of some of the patches go positive and others negative? Without careful examination of the data, I would guess that there are some outliers? I might even go so far as to guess (rather oxymoronically) that there are an appreciable number of outliers. Someday when I run out of beer in the fridge, and run out of reruns of Get Smart to watch, I will look at that.

Then the realist side of my brain kicks in. "Really? Did you happen to notice that there are only seven cases where the skew is greater than 1.5??!? What are you smokin'? That's not a lot of skew!!!"

-- Kurtosis

Kurtosis is the other popular measure of a data set that can be compared to that of the normal distribution to test for normality.

I compared the kurtosis values against the 1% ranges for kurtosis (also from [7]), and found a similar situation. 6% of the data sets showed a statistically significant deviation from the normal distribution, with some being leptokurtic and some being platykurtic. (Man! I love those words!!)

Without a whole lot of guidance, I am going to jump to the wild conclusion that the kurtosis is not of practical significance.

-- Conclusion of normality analysis

So, my conclusion is that a statistically significant number of color difference measurements have skew and/or kurtosis. Some go one way, and some the other. But from a practical standpoint, the skew and kurtosis are not appreciable. It is unlikely that this is the Viggiano criterion that has been violated.

Equal standard deviation and independent

In the beginning of this blog post, I stated that I would take the Viggiano criteria one at a time. A lot has happened since then, and I have changed my mind. I have lumped these two criteria together, since they are symptoms of the same illness. If the illness is present, it will manifest in one or the other of the criteria, or perhaps both. I believe this analysis is unique to the color science world, but perhaps not among chi-squared-ologists.

To help understand this, consider the two plots below. For the plot on the left, I generated pairs of x and y values from a random number generator that gave a mean of 3.0. For the x values, the standard deviation was 0.1, and for the y values, the standard deviation was 0.7. The plot on the right is that same data, but rotated about {3, 3} by 45 degrees.


The data set on the left violates Viggiano criterion #3, since the standard deviation in y is 7 times that in x. On the other hand, there is no correlation between the two axes. In other words, it passes criterion #4, but fails criterion #3.

In the data set on the right, the standard deviations of the x and of the y values are practically the same, so criterion #3 is satisfied. On the other hand, the correlation coefficient between the x and y values is 0.958. With such a strong correlation, the data on the right clearly fails criterion #4.

From the standpoint of the problem at hand, the two data sets are identical, so the conclusions should be identical. In the one case, a reasonable test is to look at the ratios of the standard deviations. In the other case, the correlation coefficient seems like a likely test. This raises the question of how to evaluate the two criteria equivalently, and how to catch the situation where the "illness" has been apportioned between the two criteria.

-- Interlude into ellipsoidification

I have developed a technique that is appropriate here. I mentioned the technique in a blog post about the color red back in January of 2013. I dubbed it ellipsoidification. I have not seen this described elsewhere. Certainly the word is novel, if not the technique.

Ellipsoidification is an extension of the standard deviation to multiple dimensions. 

[Note to those who have committed ASTM E 2214-02 [6] to memory. Section 6.1.1 describes an extension of standard deviation to multiple dimensions. While the method in E 2214 is similar - perhaps even very similar - it just misses a truly beautiful result.]

Ellipsoidification produces a vector with one standard deviation value for each dimension of the data. In the case of uncorrelated coordinates (as in the example above on the left), the multi-dimensional standard deviations are equivalent to the standard deviations of each of the individual dimensions (i.e. stdev (x), stdev (y), etc.). 

For data that is correlated in one or more dimensions, the results are the same as if you first rotated the coordinate system so as to make all the axes independent, and then took the one dimensional standard deviations of each component individually. In the example above, the plot on the right would first be rotated by 45 degrees, and then the standard deviation would be computed on each of the axes. Thus, the two plots would have identical multi-dimensional standard deviations, as we would hope.

That was one explanation of the technique.

Here is another explanation of the multi-dimensional standard deviation. Say we have a set of one-dimensional data. It is a set of points scattered along a line. The standard deviation is a line whose length is somehow representative of the amount of spread of the data points. In two dimensions, we do that same trick, but in this case, we are finding the ellipse which best describes the spread of data in two dimensions. In a sense, we "fit" an ellipse to the data. 

In the case of the example above at the left, the ellipse has the major axis straight up and down and the minor axis running right to left. The minor axis has a length of 0.1 and the major axis has a length of 0.7. In the example at the right, the ellipse has the same major and minor axes, but it has been tilted by 45 degrees.

Below I have an actual example of an ellipsoidification caught smiling for the camera. This is a two dimensional one, since my three dimensional display is at the cleaners getting its nails done. All those little black points that look like data points are data points, generated by a random number generator with normal distribution.

The red ellipses are the one, two, and three sigma ellipses generated from that random data set. I should point out that, although we are dealing with a normal distribution, the one, two, and three sigma probabilities that we memorized for the third grade stats class (68.3%, 95.4%, 99.7%) no longer apply. In the case below, we are looking at a two-dimensional normal distribution.

I think this is a pretty cool plot

If we take that to three dimensions, as for L*a*b* data, we are fitting an ellipsoid to the scattering of the data points. Hence the term "ellipsoidification". The multi-dimensional standard deviations are the lengths of the three axes of the ellipsoid. Note that a by-product of ellipsoidification is the orientation of the ellipsoid, which could be useful; I used it to properly tilt the ellipses above. On production data, the orientation of the ellipsoid can help point to the major cause of the variation.

That was the second explanation.

The first explanation was for the logophiles, that is, mainly in terms of words. The second explanation was for the pictophiles, that is, mainly in terms of a visualization in one awesome graph. The third explanation is for the folks who might actually have to do something with this mess. It is the most useful, because it is algorithmic. It is also, perhaps, the most difficult to understand. At least for me. I dunno, Viggiano probably stared at that plot above for a few seconds and figgered out the algorithm. 

(Color scientist of the world today trivia fact number twelve: Steve Viggiano and John the Math Guy went to different high schools together.)

Here's the third explanation, very terse and incredibly dense: ellipsoidification rotates the coordinate system so that the data becomes uncorrelated, and then takes the ordinary standard deviation along each of the rotated axes.
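In numpy, that rotate-to-uncorrelated-axes recipe boils down to an eigendecomposition of the covariance matrix (what the statisticians file under principal components). Here is a sketch, with a made-up cloud of data standing in for the real thing:

    import numpy as np

    rng = np.random.default_rng(5)
    cloud = rng.normal(0.0, 1.0, size=(1000, 3))  # stand-in for (dL*, da*, db*) data
    cov = np.cov(cloud, rowvar=False)     # the 3 x 3 covariance matrix
    evals, evecs = np.linalg.eigh(cov)    # rotate to uncorrelated axes
    axes = np.sqrt(evals)                 # the three multi-dimensional std devs
    # the columns of evecs give the orientation of the ellipsoid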

I think this is a novel finding. 

-- Now back to our data analysis, which is already in progress...

I applied ellipsoidification to the data in the TR 002 data set. For each of the 928 patches I got three values -- the lengths of the three axes, or in different words, the standard deviations in each of three directions.

The real question for us (remember back to criteria 3 and 4?) is not the magnitude of the axes of the ellipsoid, but rather the relative magnitudes of the three axes. If they are all close to the same size, then the appropriate number of degrees of freedom for the chi-squared function is three. If two are the same size and one of them is zero, then there are two degrees of freedom. But how big a difference in magnitude could be considered large enough to lose a degree of freedom?

As an example, let's say that the largest axis of the ellipsoid has a length of 1, and another axis has a magnitude of 0.333. By using the Pythagorean theorem on this example, the effect of that second dimension is only a 5% increase in ΔEab. Let's make a rule that this is considered negligible. By this somewhat arbitrary rule, if the shortest axis is less than a third of the length of the longest, then a three dimensional variation is effectively a two dimensional variation. If two of the axes are less than a third of the length of the longest, then the variation is effectively one dimensional, that is, along a line.
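Here is the arithmetic behind that 5%, along with the rule packaged as a function (my own encoding of it):

    import numpy as np

    print(np.sqrt(1.0 + 0.333 ** 2))   # 1.054: the second axis adds only about 5%

    def dimensionality(axes, cutoff=1.0 / 3.0):
        """Count the axes that are at least a third the length of the longest."""
        axes = np.sort(np.asarray(axes))[::-1]        # longest first
        return int(np.sum(axes >= cutoff * axes[0]))

    print(dimensionality([1.0, 0.5, 0.1]))   # 2: effectively two dimensional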

This gives us a method by which to quantify the dimensionality of the variation in the TR 002 data set. I determined by this rule that 

     46.6% of the patches have three dimensional variation,
     40.2% of the patches have two dimensional variation, and 
       6.4% of the patches have one dimensional variation.

Wow. I think this makes sense. I would guess that the variation in the color of a patch is largely due to variation in the amount of ink transferred to the substrate, or variation in the dot gain of the ink. If a patch has predominantly only one ink, then I would expect that the variation is largely in the direction of the trajectory of that ink, that is to say, the variation is largely one dimensional. Similarly, two inks would have two dimensional variation, and three or more would have three dimensional variation. It would be an interesting research project to look at individual patches and see if the dimensionality of the variation correlates with the number of inks present, but, that's a different question.

Today we are pondering what the statistical distribution of color difference values looks like. From this analysis, it would appear that the variation from the TR 002 data set should be chi-squared with something less than 3 degrees of freedom. The exact number of degrees of freedom depends on the number of inks, and their relative proportion. Note that, while the chi-squared distribution makes intuitive sense only when the degrees of freedom parameter is an integer, it is still defined for non-integer values. Thus, a C40, M40, Y10 patch may exhibit 2.3 degrees of freedom. 

This is a similar conclusion to that of Nadal et al. They conjectured that the deviation from chi-squared distribution with three degrees of freedom was due to a violation of criteria #4. They used this to justify the use of the number of degrees of freedom as a regression parameter. They report determining a value of 2.4 for the degrees of freedom in one of their data sets. They did not consider the possibility of a violation of criteria #3, but I showed that this would have led them to the same course of action.

So, failures of criteria #3 and/or #4 can be seen in the data. This means that a chi-squared distribution of degree less than or equal to 3 is appropriate.

Conclusion

There are a number of significant findings in this post.

1. I have demonstrated that, at least for one data set, the distribution of the square of color difference (ΔEab) values does not follow a chi-square distribution with three degrees of freedom. I have tracked the cause of the failure down to the fact that the variation in L*a*b* is somewhat less than three dimensional. 

2. I have conjectured that the dimensionality of the variation (in printing) is connected to the number of inks that are present in the overprint. The data is there, but I have not done the analysis to verify this. Ideally such an analysis would allow one to use the halftone percentages of a given color to predict the value for the degrees of freedom in the chi-squared distribution. Alternatively, it could be determined from the data when characterizing the process.

3. For SPC, it is normally the case that the target color is dictated from above. This means that ΔL*, Δa*, and Δb* will usually not have zero mean.

4. I have proposed that for SPC, color variation can be modeled with ΔL*, Δa*, and Δb* being normally distributed.

5. If you combine points 2, 3, and 4, you have the distribution of ΔEab squared. It is a non-central chi-squared distribution with three or fewer degrees of freedom.

6. I have described a multi-dimensional extension of the standard deviation, which I call ellipsoidification. This can be used to determine the shape of the cloud of variance in any dimensional space. While the immediate application to color is apparent, the method could apply equally well to 30-some dimensional spectra or to the analysis of the variation in the price of a collection of ten stocks.

7. This analysis was carried out with the ΔEab color difference formula. I am going to conjecture that all this will work with ΔE 2000, or any other color difference formula that you choose. After all, the other color difference formulas are all about ellipses. What could go wrong? Well, except for that pesky ΔE 2000 formula.

This is a work in progress. <Cue Man of La Mancha.> In my vision of SPC for ΔE, the appropriate distribution will be used - perhaps the non-central chi-squared distribution with something less than 3 degrees of freedom. (To dream, the impossible dream...) The amount of non-centrality and number of degrees of freedom will be discerned from the characterization of a process to arrive at the 99.75th percentile ΔE 2000. (To fight the unbeatable foe...) During production, this will be used as the upper control limit, but also - once enough production data has been received - the same techniques will be used to provide additional diagnostics. (This is my quest, to follow that star...)

But I have a lot of work to do before all these pieces fall into place. (No matter how hopeless, how near or how far...)

To Dream the Impossible Dream!

I am I, John the Math Guy,
the guy with the slide rule...

Ok... maybe I got a little melodramatic there. Sorry.

Addendum

I continue my search for SPCDE.

Bibliography

[1] Dolezalek, Friedrich, Appraisal of production run fluctuations from color measurements in the image, TAGA 1994

[2] McDowell, David, Statistical distribution of DeltaE, pdf dated Feb 20, 1997

[3] Anonymous, Kodak Q-60 color input targets, Kodak tech paper, June 2003

[4] Viggiano, J A Stephen, Statistical distribution of CIELAB color difference, June 1999

[5] Nadal, Maria, C. Cameron Miller, and Hugh Fairman, Statistical methods for analyzing color difference distributions

[6] ASTM E2214-02, Standard practice for specifying and verifying the performance of color-measuring instruments

[7] Snedecor, George, and William Cochran, Statistical Methods, 7th Ed., Iowa State Press, 1980, p. 492

Wednesday, November 2, 2016

Statistical process control of color difference data, part 3

By now, I suspect that pretty much everyone has heard about the first two blog posts in this series. Everyone has been talking about them. E! News, Jimmy Kimmel...  There is eager anticipation for the next installment.

The first blog post alluded to some strange goings-on when it comes to statistical process control (SPC) of color difference data. But it was mainly an overview of traditional process control.

The second blog post had some actual data, showing that color difference data does not follow the rules of the game for traditional process control. Color difference data is not normally distributed. And if you don't follow the rules... Well. You kinda shouldn't oughta play the game. Or at least be prepared to be yelled at by an angry lemur.

Lemurs are a bit sensitive about mathematical statistics

In this blog post, I give a somewhat intuitive answer to the big question on everyone's mind. In the next post, I will go way off the deep end and explain it with some <egad> math.

Simple explanation

First, I offer a simple observation. While this doesn't qualify as being a full-fledged theoretical dissertation, it does make one pause and say "hmmmm... there is something strange in this neighborhood." When the time comes around, I will cue you to say that.

For a true normal distribution, there is always a chance - perhaps an incredibly tiny chance - that the values could be negative. But color difference data never goes negative. I think we know that, but the implication is rather broad. Color difference data simply cannot be truly normally distributed. (Cue the Ghostbusters theme: "If there's something strange, in your ΔE data, who you gonna call? John the Math Guy!")

Groucho demonstrates the absurdity of negative ΔE

One could certainly argue that in some cases, color difference data might be indistinguishable from a normal distribution. Take for example the case where you are determining the color difference between a set of white parts and a set of black parts. The color difference might always be more than 50 or 80 ΔE, so the fact that the color differences can't go negative is inconsequential.

But when we are doing process control of color, one would hope that the color measurements of the products are all kinda clustered around the target color. So the lack of negative color difference values is significant.

Another simple explanation

There is a common spec for spectrophotometers called "repeatability". A more accurate name for the spec might be "the degree to which an instrument agrees with itself in the short term on measurements of the same spot." I have been lobbying the standards committees to change the name, but so far, I haven't got much traction. I dunno, there was some wimpy objection about making tech writer's work too hard. I'll keep pushing the issue.

Repeatability for color measurements is determined by taking ten or maybe twenty measurements of the exact same spot. If someone else is doing the measuring, I prefer twenty. These are averaged to provide a reference color. Then the difference is computed between each of the ten or twenty measurements and the reference color. The reported repeatability is the average of the ΔE values. This is known as the Mean Color Difference from the Mean (MCDM) method. It is described in the venerable color science book by the venerables Billmeyer and Saltzman.
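As a sketch (the readings are made up), the MCDM computation is just a few lines:

    import numpy as np

    rng = np.random.default_rng(6)
    # twenty hypothetical readings of one spot, clustered around L*a*b* = (50, 0, 0)
    readings = rng.normal([50.0, 0.0, 0.0], 0.05, size=(20, 3))
    ref = readings.mean(axis=0)                            # the reference color
    mcdm = np.linalg.norm(readings - ref, axis=1).mean()   # mean deltaE from the mean
    print(mcdm)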

I have a bit of a reservation that has to do with the utility of the spec. Years ago, back in the dark ages, photons were scarce and light bulbs varied a lot from shot to shot. As a result, repeatability was a useful measure of the performance of the instrument. But with today's instruments, this number is normally very tiny. The much more meaningful specs are intra-family agreement (how well two instruments in the same family agree with each other) and inter-instrument agreement (how well any two instruments agree). I have written about this before.

In the Dark Ages, the Knights of the Round Table
were reduced mainly to scotopic vision

But the big bugaboo that's bugging me is the boogers caused by one of the spectro manufacturers who has chosen to define their own version of repeatability. I mean, first off, there is a standard out there. How about just following it? But the big bugaboo is that they decided to use the standard deviation of the color difference values, rather than the average.

At first blush, this seems like a reasonable way to look at the variation in a bunch of numbers. I mean, the standard deviation is the tool that I was given in Stats 301 to quantify variation.

But, lemme provide an example where this approach fails miserably. I admit that my example is extremely contrived, and is extremely unlikely to happen in the real world, but still, I think it should make you stop and scratch your head and say to yourself "gosh, is this really the kind of behavior that I want out of a spec for repeatability?"

Here is the long-awaited and extremely contrived example. Let's say we were determining reproducibility and received the following four measurements. Yes, I know... you should take ten or twenty, but this is an extremely contrived example, ok? Here are the four measurements:

  {50, -10, 0},      {50, 10, 0},      {50, 0, -10},      {50, 0, 10}

The average of the four is {50, 0, 0}. If we use the 1976 formula for ΔE, we see that each one is exactly 10 ΔE from this average. Remembering that the final step in the manufacturer's version of repeatability is taking the standard deviation... ummm, let's see... 10, 10, 10, 10... the standard deviation is exactly zero! We have a perfect instrument, with no variation whatsoever!! Woo-hoo! Of course, any pair of measurements is 14 to 20 ΔE apart, but the spec says this spectro is giving readings that are perfectly repeatable!!
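You can check my arithmetic with a few lines of numpy:

    import numpy as np

    m = np.array([[50, -10, 0], [50, 10, 0], [50, 0, -10], [50, 0, 10]], dtype=float)
    dE = np.linalg.norm(m - m.mean(axis=0), axis=1)   # each one is exactly 10
    print(dE.mean(), dE.std())   # MCDM says 10 deltaE; the std dev says a perfect 0.0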

The standard deviation of ΔE data can be horribly misleading!  And I mean "midnight in a graveyard on Halloween" kinda horrible.

I don't know if I mentioned this before, but this is an extremely contrived example. On the other hand, it will hopefully give one pause to consider -- ΔE measurements, and especially the standard deviation thereof, don't follow the normal rules for SPC that we all know and love.

What went wrong?

The problem is illustrated (in general) in the diagram below. If the color values all lie on a circle (or sphere) centered on the reference color value, then the standard deviation of the ΔE values is absolutely worthless as a measure of the dispersion of the color points.


Since, with the standard deviation method for determining repeatability, the reference color value is computed from the color values themselves, this contrived situation will only occur if the color values are scattered uniformly on the surface (or near the surface) of that sphere. But if we venture out to situations where the reference color does not come from the data itself...

A less contrived example, in story form

I may have alluded to this before, but that last example was a bit contrived. It would never happen in real life. How about an example that is more realistic? Cuddle up with your favorite blankie and listen to a little story.

Once upon a time, there was a QC person at a print shop who was really into classical SPC, and a press operator at that same print shop who was particularly fond of run-time charts. Below is an example of one of the charts that they ran. The chart shows the adherence to a target color for the yellow solid of thirty samples from a production run. When I say "adherence to target color", I mean the ΔE between the measured color of the solid and the target color. The yellow lines are the upper and lower control limits, and the red line is the customer tolerance. Note that, for the sake of simplicity, I am using the 1976 ΔE formula, so a customer tolerance of 4.0 ΔE is typical.

(Full disclosure... this is fabricated data. A very good facsimile of real data, but still fabricated.)

The view of the production run for the press operator and QC person

Everything is hunky-dory. All the samples are well within the control limits, so these partners in print conclude that the process is under control. The CpK is 2.44, which is a clear indication that the process is fully capable of providing product that meets the customer's requirement. Everyone is happy. The crew manager is bringing in champagne to celebrate, and the boss is talking about big fat bonus checks for everyone. (Did I mention that this is a fairy tale?)

Now, the ink company regularly receives this SPC data from the printer, since they are genuinely interested in seeing the customer succeed. (You can believe that, can't you?) The inkie at the ink company has decided to have a little different look at that same data. (An inkie is a technician at an ink company. Generally affable. Good at Sudoku, but not necessarily inclined to play often.) You will note that there is a similarity between this and the picture in the section "What went wrong?"

(Same disclaimer... this data is fabricated. Good facsimile, blah blah blah. But note that the chart above and the chart below are plots of the same fabricated data set.)

View of the production run of the ink QC person

Unbeknownst to the printer, the QC person at the ink company (who only has the best interest of the printer in mind) notices a very clear issue on this press run. Can you see it? All the little bees are lining up to the left of the target color. The pattern (a scattering which is mostly up and down) is typical. It shows that the variation on press is largely caused by a variation in the amount of ink on the substrate. There is an offset caused by a hue shift that is equally large.

Before making any changes, the inkie checks if this is a fluke or a trend. Previous runs all had the same basic problem. The course of action? The inkie decides that henceforth the formulation of the ink will add two drops of beet juice to every kilogram of yellow ink so as to move the color of the ink closer to the target.

Everyone is happy now, right?

Happy? Not quite. The printing plant is in Wisconsin, isn't it?!?!? Happiness is just not part of the culture in Wisconsin.

Every good tale (fairy or otherwise) must have some tension. There must be a plot point where the protagonists engage in conflict. This blog post, by the way, is a good tale, so here comes the conflict. The press operator was never told about the big change to the ink. (Can't you just imagine Jennifer Aniston doing that?) I realize that this never happens in real life, but go along with me here for the moment. The press is fired up and by gosh and by golly, the beet juice hits the fan. Here is the run chart for the first press run with the new ink.

Press operator's view of the press run after the fateful plot point

Put yourself in the place of the press operator. He had already hinted to his wife about the possibility of an all-expense paid trip to Hawaii if things continued going so well. And there is this press run with virtually all of the samples outside of the two yellow lines! The customer's spec is still being met, but jeepers creepers! Someone's gonna get ticked off.

The QC person sees a different problem. The CpK for this press run is 0.57. That's not good, by the way. One would like that number to be larger than maybe 1.3.

Naturally, they decided to blame the ink manufacturer. This is, as we all know, item #2 on the standard operating procedure in almost every printing plant I have ever been in. They called up to complain. The inkie agreed that, yes, a change was made to the formulation. After being thoroughly beaten about the ears for not telling anyone, he pulled the data from the new product run into his spreadsheet.

The plot below came up. The line of bees had gone from about 2 or 2.5 a* units to the left to about 0.5 a* units to the left. Perhaps a bit of overshoot, or perhaps that's just normal variation. Certainly a welcome improvement. The press operator and QC person thanked the inkie, and agreed to continue with the new formula.

Inkie's view of the production run of the new ink

The press operator and QC person were both a bit flummoxed, but had to admit that the inkie was right. The change seemed to improve things.

Finally, the two badgered enough people that they found someone who knew a bit about color science and could counsel them on what went "wrong". The color science person said, 

"What?? You should complain if your color error gets better than it used to be??!?! My advice... Ignore the lower control limit for ΔE. ignore the lower limit in the run time chart, or better yet, tell your stinking software not to plot it when you are dealing with color differences. And when you do the CpK thing... same thing. The formula uses both the upper and lower limits, and takes the minimum. Just do the computation based on the upper limit - the one-sided CpK."

That was what the first color scientist (the one with the beard) said. A second color scientist (who naturally also has a beard) pointed out a use for the lower control limit. Brian said that while the lower control limit may not be a trigger point for deciding when to get all up in arms about the process being broken, it is an indication that something changed for the better, so it might not be a bad idea to poke around and figger out what went right. You might want to continue doing whatever that was! 

So everyone went home that evening feeling that the crisis had been averted. And we have two rules that will keep us from making this analysis mistake in the future. (Take note of this. I guarantee these will be on the final exam.) Do not use the lower control limit when doing SPC on ΔE color difference data. Modify the CpK computation so that it only considers the upper control limit. 

But...

Even with the satisfying conflict resolution in the plot, the QC person didn't sleep well that night. Finally, at 3:00 AM, he/she got out of bed and sat down at his/her laptop to look at the data. After a lot of futzing around in Excel, he/she came to the realization that the variation in ΔE was much lower before the change was made to the ink. The original data had a standard deviation of 0.24 ΔE. The standard deviation of the color difference of the new data was 0.64 ΔE. This is a problem! Why did the variation in the process jump up like that?

Lemme see... what was item #2 in the SOP? Obviously, the new ink formulation was to blame. So, the inkie got a call at 8:01 AM. "I don't think you stirred the ink enough after you added those two drops of beet juice!" Imagine this said by a person who got like 17 minutes of sleep the night before.

To make a long story just a tad longer, the color scientist kinda person - the first one, you know, the one with the beard - got consulted again. He stroked his beard and then made the following drawing on his chalkboard. (Note that all color scientists have beards, if they're male, anyway. If not, they like Thai food. All color scientists, male or female, have a chalkboard instead of a whiteboard.)

The bearded color scientist demonstrates an uncanny ability to
connect color science with football plays

So, to paraphrase what the bearded color scientist said: standard deviation of color difference values does not work like we expect. 

Personally, I can think of exactly one application of the standard deviation of ΔE, and that's an example of why no sane person would ever use the standard deviation of ΔE.

I may as well mention...

I hate to go on about my disdain for the use of standard deviation on ΔE values, but I have one more, slightly related thing to get off my chest. Even if this value were reliable, its use can be misleading. SPC is all about using "three sigma" to set control limits. That works for normally distributed data, but as we saw in the previous blog post, ΔE does not follow a normal disturb-ution.

Three rules for SPC of color difference data

1. The standard deviation of ΔE is an unreliable measure of the distribution.
2. Ignore the lower control limit when doing SPC on ΔE color difference data. 
3. Friends don't let friends compute the standard deviation of ΔE.
4. Modify the CpK computation so that it only considers the upper control limit. 
5. Computation of the standard deviation of ΔE? Come on. Just say no.
6. Always proofread for number mismatches.

Foreshadowing the next blog post

The saga continues... 

The last blog post was all about color difference data not following a normal distribution. I completely ignored (almost) all of that discussion in this blog post. I could have made all kinds of comments about how, since color difference data is not normally distributed, the traditional computation of the upper control limit is no longer applicable, and similarly, the whole CpK thing must be rethought.

I ignored the discussion of normality, since the issues I point out today are fundamental, and not directly related to the statistical distribution. 

In the next blog post, I want to address the question, "if the data is not normally distributed, then what is the distribution?" At least that's what I want to write about. Who knows what I actually will write about. I am certainly not smart enough to predict my actions!


This blog post was updated on Nov 7, 2016, thanks to Dave Wyble's comments about two bonehead mistakes I made in the original version. First (I can't believe I actually did this) I used the term "reproducibility" instead of "repeatability". I should have known better. I apologize to all who were hurt, psychologically or otherwise, by my blatant misuse of the English language.

Second, I misquoted the definition of repeatability that is given in the standards. And I shoulda known better. I just got it confused with the silly definition used by one of the spectro manufacturers.