Wednesday, November 9, 2016

Statistical process control of color difference data, part 4

Warning: If you were considering whether to jump off the diving board into this blog post because this series is getting too deep mathematically, then I would suggest that you get off at this stop and wait for the next bus. On the other hand, if you are one of the hardcore color geeks who has been champing at the bit for me to get to the meat of the matter, then read on, read on. If you get all the way to the end of this blog post and find yourself saying "yeah... that all makes sense", or even "John is completely wrong on this!" then I commend you. I hope to get a chance to have a beer with you.

There have been a number of comments on the previous blog posts (primer on SPC, deltaE is not normally distributed, and anomalies with standard deviation) from hardcore color geeks. All of these comments are from smart color scientists. Please note that smart color scientists either have beards, or are named Dave. Unless, of course, they are female, in which case they like Thai food.

David MacAdam, Albert Munsell, and Deane Judd
All sporting beards!

Dave Wyble suggested looking at the variance on the individual components, ΔL*, Δa*, and Δb*. "Conventional wisdom says [component differences] will be normally distributed..." Come on, Dave. When was wisdom considered "conventional?" 

(Color scientist of the world today trivia fact number seventeen: Dave does not have a beard at this time. He has, however, had one at various times in his life.)

Max Derhak anticipated this blog post in one of his comments: "Isn't this a Chi-squared distribution?" 

Steve Viggiano commented: "Having advocated Hotelling's t-squared for this application for decades, I am interested in where this is going." He also commented on the distinction between deltaE and deltaE squared, and the unrealistic assumptions baked into the chi-squared distribution. I will address these directly.

These folks are all hardcore color geeks. Max and Steve both have beards, by the way. Come to think of it, I also have a beard. Therefore I must be a smart color scientist... a smart color scientist who is not so good with syllogisms.

The question on the chalkboard today is this:

Do color difference values, as measured by the square of the ΔEab values, have a chi-squared distribution? If not, then what is it?

Literature review 

Several sources have suggested that ΔE has a chi-squared distribution. The earliest I have found is from Fred Dolezalek [1]. He looked at measurements from 19 print runs and came to the following conclusion:

"[The ΔE values] followed a chi-squared statistic, characterisable by a single parameter, which could be linked to the standard deviations in L*, a*, and b* space."

In the Appendix of his paper, he referred to a previous paper, by H. G. Volz, which demonstrated this. I took a cursory look for the paper, and didn't find it. I admit to not trying real hard. It's in German. Unlike most color scientists, I don't read German.

Dolezalek's paper included the following graph, supporting his claim. Note that this is technically known as a funky-graph because of the nonlinear scaling on the y axis. This crazy scaling is designed so that data from a chi-squared distribution will plot out as a straight line.
One set of data, from Dolezalek's paper

All in all, this is not a bad fit to the data. But note that it can be seen to fall off at the right end. The four points at the end (the four points with highest ΔE) are all below the line. This is a sign that perhaps the curve fit is not ideal.  The fact that the curve fit starts falling off at the 80th percentile is bad juju for SPC. For SPC, the upper control limit is conventionally set to the 99.75 percentile. 

I will amend Dolezalek's statement accordingly: "[The ΔE values] followed a chi-squared statistic, but not in the area where SPC needs!"

The next reference I found is from an unpublished document by a highly respected friend of mine, Dave McDowell [2].  Kodak produced a test target which may be familiar to my readers - around 700,000 of these were manufactured. Kodak wanted these to be produced to tight tolerances, so they were very rigorous about their process control. They found the chi-squared statistic suitable for their needs.

Kodak Q60 target

Here is what Dave said:

"The quantity 'deltaE/2 - avg' when squared follows the chi-squared distribution...

"Evaluation of a large number of samples of the Kodak Q60 transmission and reflection targets showed that the deltaE characteristics of individual samples compared to the batch mean followed this same statistic." 

Before I go on, I want to highlight the "when squared" part that Dolezalek had inadvertently missed. The metric that theoretically has chi-squared distribution is not ΔE, but rather ΔE squared. I am sure that Dolezalek was aware of the fact, since his results were at least reasonable. It was probably an unintentional omission on his part. Steve Viggiano reiterated this in a comment to part 2 of this series.

Unfortunately, McDowell did not provide any data or plots by which we can assess the strength of his statement. He has provided me with the data, however. Winter is coming in Wisconsin. Plenty of time for me to curl up with my laptop and savor the 150 files. (It goes without saying that Dave does not have a beard.)

(The text that was quoted above later appeared in a white paper from Kodak [3]. This makes sense, since McDowell worked for Kodak at the time.)

Steve Viggiano is another highly respected friend of mine who has weighed in on this topic, again in an unpublished document [4]. Viggiano was the first (to my knowledge) to articulate the precise conditions that must be met in order for the chi-squared function to be applicable for ΔE. I don't mean to say that Viggiano invented these criteria - they go with whoever invented the chi-squared function. I mean to say that Viggiano was the first to assert these criteria for the distribution of color difference values. Following Stigler's law of eponymy, I will refer to these as Viggiano's criteria.

More on these criteria in the next section. A lot more. I might add that this blog post would not have been possible without Steve's prolific pursuit of pedantic pleasures. He knows this stuff better than I ever will.

Moving along to additional chi-squared sightings, the ASTM standard E 2214-02 [6] has a brief mention of the distribution. It states the following: "As observed in Fig. 2, the mode, median, and mean of a set of color difference (ΔE) determinations do not follow a bell curve but a curve related to the Chi-squared or F statistical distributions..." There is unfortunately no further explanation. What is the relationship? When is F applicable?

I know of one additional reference that discusses the statistical distribution of ΔE values, a paper by Maria Nadal et al. [5]. In their paper, they compared various methods for determining the 95th percentile of color difference data. 

Why the 95th percentile? Two of the authors of this paper are from NIST. For a fee, NIST makes official color measurements, usually of colorimetrically stable objects such as Lucideon tiles (previously known as Ceram tiles, which were enshrined in the literature under the name BCRA tiles). As part of their service, they assign confidence intervals to the measurements. They are required to report the 95% confidence interval.

This is a very useful analysis and a well thought out paper, since there is a dearth of technical information on real-world evaluation of the distribution of color difference data. The analysis in Nadal et al. is not necessarily directly applicable for our goal, however. For SPC, one needs the 99th, or preferably, the 99.75th percentile. Our problem is more difficult, since we are interested in the shape of the distribution way out in the tail.

Putting our quest in perspective

What is the chi-squared distribution?

Let's say that you take a bunch of random data - values sampled from a distribution - and then add them in quadrature. (This is a fancy phrase meaning that you combine them with the Pythagorean theorem, which is to say, you square each one, add up the squares, and then take the square root of the sum.) The sum of the squares, before that final square root, follows a chi-squared distribution, provided the Viggiano criteria have been met. (Yes, I will get to those criteria. Don't rush me!)

The chi-squared distribution is not just a single distribution; it is a family of distributions. The members of the family are distinguished by the number of random variables that were added together to get that distribution. We refer to any of the family as a chi-squared distribution with n degrees of freedom, where n is the number of random variables that were summed in quadrature.

Once the number of degrees of freedom has been decided upon, there is only one parameter left. This parameter accounts for a scaling of the distribution along the ΔE axis, and is dependent on the standard deviation of the distribution from which the random variables have been taken.

Note that the ΔEab formula is equal to the sum in quadrature of the differences in each of the colorimetric components: ΔL*, Δa*, and Δb*. So, its square is at least a potential candidate for the chi-squared distribution with 3 degrees of freedom.

For the distribution of ΔEab squared to be chi-squared, the distributions of the three components must satisfy all four of the following criteria (as recited by Viggiano):

    1. They must all have zero mean. 

    2. They must all be normally distributed.

    3. They must all have the same standard deviation. (ΔL* can't dominate, for example.)

    4. They must be independent. 

Important plot point: If we can ascertain that ΔE squared follows the chi-squared distribution then finding the 99.75 percentile would be a simple matter of arithmetic applied to the mean.
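Here is a rough sketch of what that arithmetic might look like, assuming all four criteria hold. In that case ΔEab squared is a chi-squared variable with three degrees of freedom scaled by the common variance, so its mean is 3 times that variance, and any percentile is just the mean times a fixed ratio. The function names are mine, and the quantile is found by bisection on the closed-form CDF for three degrees of freedom rather than from a table.

```python
import math

def chi2_cdf_3dof(x):
    """CDF of the chi-squared distribution with 3 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def chi2_quantile_3dof(p, lo=0.0, hi=100.0):
    """Invert the CDF by bisection (hypothetical helper, not from the post)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_3dof(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def upper_control_limit(mean_de_squared, p=0.9975):
    """Upper limit on deltaE, given the mean of deltaE squared.

    Assumes deltaE squared is (variance) x chi-squared with 3 degrees of
    freedom, so mean_de_squared = 3 x variance.
    """
    q = chi2_quantile_3dof(p)
    return math.sqrt(mean_de_squared * q / 3.0)

limit = upper_control_limit(3.0)  # mean of 3.0 corresponds to unit variance
```

The rest of this post is about why the real data does not cooperate with this tidy picture.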

Does the TR 002 data fit the chi-squared distribution?

I tested the TR 002 data to see if the chi-squared distribution thing worked. The data is measurements of 928 different CMYK patches as printed on 102 newspaper presses throughout the USA. (The data set is further explained in a previous post in this series.)

I first computed the average L*a*b* values for each of the 928 patches. Each of these averages was thus representative of what would be printed on the 102 different newspaper presses. For each of the 928 patches, I then computed the color difference, in ΔEab, between this average and each of the 102 measurements. This gave me a collection of 102 X 928 color difference values. More than enough to wallpaper my kitchen.
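For the folks who like to see the steps written down, here is a minimal sketch of the per-patch calculation. The four readings are made up for illustration (they are not TR 002 data), and the function names are my own.

```python
import math

def mean_lab(samples):
    """Average L*, a*, and b* over a list of (L*, a*, b*) triples."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(3))

def delta_e_ab(lab1, lab2):
    """Plain old deltaEab: Euclidean distance in L*a*b*."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

def differences_from_mean(samples):
    """Color difference between each sample and the average of all samples."""
    center = mean_lab(samples)
    return [delta_e_ab(s, center) for s in samples]

# Made-up readings of one patch on four presses.
patch = [(50.0, 10.0, -5.0), (52.0, 11.0, -4.0),
         (49.0, 9.0, -6.0), (51.0, 10.0, -5.0)]
de_values = differences_from_mean(patch)
```

Repeating this for every patch gives one list of color differences per patch.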

A comment here: My dog asked me why I didn't use ΔE 2000, rather than ΔEab. He felt that it would be more useful to use the latest and greatest color difference formula. Presumably this is the formula that will get the most air time in the future. I agreed with him (he is pretty smart, as dogs go), but explained that, for purposes of the initial investigation in this blog post, I prefer to use ΔEab. If all of the Viggiano criteria hold true, then the square of ΔEab would precisely fit the chi-squared distribution with three degrees of freedom. If this part holds true, then the next step would be to see if ΔE 2000 was reasonably close. (A bit of foreshadowing: My conclusion, as we shall see, is that even ΔEab did not fit this model.)

I generated cumulative probability density functions (CPDFs) for each of the 928 patches. I scaled each of the CPDFs by their mean. In this way, they all had a mean of 1.0 ΔEab. Bear in mind that if the variations truly follow a chi-squared distribution, then all of these would have the same shape, but just different scaling in the x axis. So assuming that all four criteria are met, these color difference values should all be from the same distribution. Therefore I can combine them into one CPDF. This gives me the advantage of having 928 X 102 = 94,656 data points, so the resulting CPDF will be relatively smooth.

To check the assumption of chi-squaredness, I used a Monte Carlo method to generate the CPDF of the hypothetical distribution if all the Viggiano criteria hold. I generated 94,656 sets of hypothetical variations (in ΔL*, Δa*, and Δb*), each of which was drawn from a normal distribution with mean of 0 and standard deviation of 1.0. I then went through the ΔEab formula to generate hypothetical color differences. Finally, this collection of values was normalized to have a mean of 1.0 so as to best match the distribution computed from the real TR 002 data.
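A sketch of that Monte Carlo step might look like the following. The random number generator, the seed, and the function name are my choices here and not necessarily what was used to make the plots.

```python
import math
import random

def simulate_delta_e(n, seed=0):
    """Hypothetical deltaEab values when all four criteria hold.

    Each of dL*, da*, db* is drawn from a normal distribution with mean 0
    and standard deviation 1, combined in quadrature, and the whole
    collection is then scaled to have a mean of 1.0.
    """
    rng = random.Random(seed)
    values = [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)))
        for _ in range(n)
    ]
    mean = sum(values) / n
    return [v / mean for v in values]

simulated = simulate_delta_e(94656)
```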

Generating a plot of a CPDF from a collection of values is a rather easy task, by the way. (Thank you for asking how it is done.) First, the data is sorted from smallest to largest. Next, a second array of the same length is generated with values incrementing in fractional steps from 0 to 1. This incremental array is then plotted as a function of the color difference array. Note that this is kinda the reverse of the way we would normally plot.
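In code, the recipe above is only a few lines. This is a bare-bones sketch with a function name of my own invention; the actual plotting (the fractions on the y axis against the sorted values on the x axis) is left to whatever plotting package you like.

```python
def empirical_cpdf(values):
    """Return (sorted values, fractions stepping evenly from 0 to 1)."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    ps = [i / (n - 1) for i in range(n)]
    return xs, ps

xs, ps = empirical_cpdf([3.0, 1.0, 2.0])
```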

The plot below shows a comparison of the two CPDFs. The blue line is the actual data, and the red line is the chi-squared distribution with three degrees of freedom. Both have a mean of 1.0. Gosh. They look different.

Do real color difference values follow a chi-squared distribution?

The above plots can be differentiated to create a probability density function, as shown below. As expected, they are not as smooth as the CPDF curves, but it is still clear that the two distributions are dissimilar. Real color difference data is more skewed to the left and has a longer tail to the right than the distribution based on the chi-squared function with three degrees of freedom. I put that in italics, since it kinda sounded like an important conclusion.

Clearly the distribution of real color difference values does not follow a chi-squared function with three degrees of freedom. What went wrong? One or more of the Viggiano criteria must have been missed. Let's look at the criteria one at a time.

Zero mean

Do the differences ΔL*, Δa*, and Δb* all have zero mean?  In general, this depends on how a given data set is compiled. I see three general cases:

In the first case, the data itself is used to define the target color. In a Mean Color Difference from Mean (MCDM) scenario, the target L* value is the average of the L* values of the data set, and similarly for a* and b*. In this case, the ΔL*, Δa*, and Δb* have zero mean by design. This is the official formula for determining the repeatability of a spectrophotometer. This was also the case in the analysis of the TR 002 data set that was presented in the second post of this series, and in the previous section. 

A second case is described in Nadal, et al. They talk about looking at all pairwise color differences in the data set. For example, in a set with ten color values, the color difference would be determined between the first sample and each of the nine other samples, between the second sample and each of the eight others, and so on. For ten color values, there are thus 45 different unique combinations.

It is not immediately obvious, but this case also has zero mean for ΔL*, Δa*, and Δb*. Consider the case where all pairings are considered, not just unique pairings. That is to say, one computes ΔE (sample n, sample m) as well as ΔE (sample m, sample n). Noting that ΔL* (sample n, sample m) = - ΔL* (sample m, sample n), it is easily seen that for every pairing, there is the reverse pairing which balances it out. Thus, if one considers all non-unique pairings, the three colorimetric components all have zero mean.

Since ΔEab is one of the commutative color difference formulas, the color difference for a pairing is the same as that of the reverse pairing. Therefore, the distribution of ΔEab is the same whether one considers only the unique pairings or all non-unique pairings. I would argue then that the first criterion is met with the case presented by Nadal, et al.
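The pairing argument is easy to check numerically. The sketch below uses ten made-up color values (not from any real data set) and confirms that there are 45 unique pairings, that ΔL* averages to zero over all 90 ordered pairings, and that the average ΔEab is the same either way.

```python
import math
from itertools import combinations, permutations

def delta_e_ab(lab1, lab2):
    """Euclidean distance in L*a*b*."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

# Ten made-up (L*, a*, b*) values.
samples = [(50.0 + 0.3 * i, 10.0 - 0.2 * i, -5.0 + 0.1 * i * i)
           for i in range(10)]

unique_pairs = list(combinations(samples, 2))  # each pair once
all_pairs = list(permutations(samples, 2))     # each pair in both orders

# Signed lightness difference, averaged over every ordered pairing.
mean_dl_all = sum(a[0] - b[0] for a, b in all_pairs) / len(all_pairs)

mean_de_unique = sum(delta_e_ab(a, b) for a, b in unique_pairs) / len(unique_pairs)
mean_de_all = sum(delta_e_ab(a, b) for a, b in all_pairs) / len(all_pairs)
```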

Both of these are (somewhat) unnatural cases, which is to say, not generally encountered in the SPC of color. Generally speaking, the target color has been externally specified, typically by the customer of the product. Although the process has been adjusted to come close to this color on average, this will never be exactly the case. It is not atypical for the average color of a print run to be a few ΔEab from the target color. Therefore, at least in the SPC of color in printing, Viggiano's first criterion is rarely met. Remember that the whole gist of this series is SPC?

With SPC, we would therefore expect that the violation of the "zero mean" criterion will have a major effect on the distribution of color difference values. Viggiano anticipated that I would write that in my blog one day, so he mentioned that if the zero mean criterion is violated, then color difference will follow a non-central chi-squared distribution.

Thus, when SPC is performed on color difference data, the non-central chi-squared distribution is the appropriate choice (assuming any of the chi-squared distributions are appropriate). Once again, I used italics cuz this sounded important.
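To get a feel for what the offset does, here is a small simulation sketch. It assumes the other three criteria still hold (normal, equal standard deviation, independent) and that the process average sits 2 units away from the target in L*. Those numbers are invented for illustration. For a non-central chi-squared distribution with k degrees of freedom, the mean is k plus the squared offset (in units of the standard deviation), so we should see means near 3 and near 3 + 4.

```python
import random

def simulate_de_squared(offset, sigma=1.0, n=50000, seed=1):
    """Simulated deltaE squared when the component means are `offset`.

    `offset` is a hypothetical (dL*, da*, db*) distance between the
    process average and the target color.
    """
    rng = random.Random(seed)
    return [
        sum((m + rng.gauss(0.0, sigma)) ** 2 for m in offset)
        for _ in range(n)
    ]

centered = simulate_de_squared((0.0, 0.0, 0.0))
off_target = simulate_de_squared((2.0, 0.0, 0.0))

mean_centered = sum(centered) / len(centered)        # expect about 3
mean_off_target = sum(off_target) / len(off_target)  # expect about 3 + 4
```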

Normal distribution of components

Are the distributions of ΔL*, Δa*, and Δb* normal? Gosh. This is not a simple question. On the one hand, it seems like a reasonable assumption, at least as a starting point.

On the other hand, the values L*, a*, and b* are computed through a nonlinear transform of X, Y, and Z, so strictly speaking, either XYZ or L*a*b* could be normal, but never both. But then again, the nonlinearity is small when compared to the typical variation, so the effect is probably not of practical importance. (This statement relies on a simple rule - every smooth function looks like a line if you look at a small enough piece of it.)

But, on the third hand, there are certainly conditions where the variation of L*a*b* is distinctly not normal. For example, if the process has a pronounced drift, the variation could resemble a uniform distribution. If the process has a sinusoidal fluctuation then the distribution will be a U shaped distribution with asymptotes corresponding to the two extremes of travel. I once figgered out what the formula for that distribution is. I fergit just now what my answer was.

I will argue that, with good SPC, care is taken so that the process is in known, good working condition before initial characterization. This, of course, is not always the case. The idealist side of my brain would put my foot down and say "WHAT!!?!?!? Gosh darn it to heck! If the process has an oscillation or drift, then in the name of peanut butter and jelly sandwiches you better fix it before you do any characterizing!!" But the realist side of my brain is willing to admit the possibility that some (if not all) processes have drift and/or oscillation that cannot be eliminated.

I generally avoid having the idealist and the realist sides of my brain in the same room together. It avoids a lot of arguments. But in this particular case, the tiny tiny portion of my brain that has some modicum of mediation skills was able to come to a compromise that was suitable for both of the other sides of my brain. If the process is capable (in the SPC sense of the word) of providing product that is within the customer tolerance, then does it matter if there is oscillation or drift? If it costs money in the long run to get rid of that anomaly, then one has to weigh this against another customer requirement - price.

Getting back to the matter at hand, I am going to start with the bold assertion that when the process is in good working condition, the variation will likely be close to normal. But I will follow up with a little investigation to back this up.

I tested the TR 002 data to see if the variations in the colorimetric components had a normal distribution. My testing amounted to analysis of skewness and kurtosis.

-- Skewness

The skewness was computed for each of the 928 patches and each of the 3 colorimetric components. This gave 2,784 values for skew. Each value of skew represented the distribution of a different set of 102 samples.

The graph below shows all the values of skew.

Skew of ΔEab values

What to make of this? There are various rules of thumb about how much skew is of practical importance. Perhaps a skewness with absolute value greater than 1 or 2 is considered to be significant? By that rule of thumb, skew is not a problem.

A more precise test comes from a table in my favorite statistics book [7]. According to the table, if I were pulling 100 samples from a perfectly normal distribution, I would expect that 2% of the time, I would have a skewness above 0.567 or below -0.567. In this data set, I saw that 16.6% of the skewness values were above this number.
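For reference, here is a sketch of how the skewness and the tally might be computed. I am assuming the plain moment-based definition of skewness (third central moment over the 1.5 power of the second); the post does not say which estimator was used, and the function names are mine.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def fraction_beyond(skews, limit=0.567):
    """Fraction of skewness values with absolute value above `limit`."""
    return sum(1 for s in skews if abs(s) > limit) / len(skews)

symmetric = skewness([1.0, 2.0, 3.0, 4.0, 5.0])   # no skew at all
lopsided = skewness([0.0, 0.0, 0.0, 10.0])        # one big value on the right
```

With the real data, `skewness` would be called once per patch and per component, and `fraction_beyond` applied to the resulting 2,784 values.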

So, I can say with statistical certainty that an inordinately large number of the data sets are skewed. Oddly, about two thirds of these are skewed positive and one third is skewed negative. I don't know why that is! Why would the variation of some colorimetric components of some of the patches go positive and others negative? Without careful examination of the data, I would guess that there are some outliers? I might even go so far as to guess (rather oxymoronically) that there are an appreciable number of outliers. Someday when I run out of beer in the fridge, and run out of reruns of Get Smart to watch, I will look at that.

Then the realist side of my brain kicks in. "Really? Did you happen to notice that there are only seven cases where the skew is greater than 1.5??!? What are you smokin'? That's not a lot of skew!!!"


-- Kurtosis

Kurtosis is the other popular measure of a data set that can be compared to that of the normal distribution to test for normality.

I compared the kurtosis values against the 1% ranges for kurtosis (also from [7]), and found a similar situation. 6% of the data sets showed a statistically significant deviation from the normal distribution, with some being leptokurtic and some being platykurtic. (Man! I love those words!!)
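A sketch of the kurtosis calculation, in the same spirit. I am assuming the moment-based definition with 3 subtracted, so that a normal distribution scores zero; the labels simply follow the sign.

```python
def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (zero for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

def kurtosis_label(values):
    """Heavy tails are leptokurtic, light tails are platykurtic."""
    return "leptokurtic" if excess_kurtosis(values) > 0.0 else "platykurtic"

flat = excess_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])  # evenly spread, light tails
```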

Without a whole lot of guidance, I am going to jump to the wild conclusion that the kurtosis is not of practical significance.

-- Conclusion of normality analysis

So, my conclusion is that there is a statistically significant number of color difference measurements that have skew and/or kurtosis. Some go one way, and some the other. But from a practical standpoint, the skew and kurtosis are not appreciable. It is unlikely that this is the Viggiano criterion that has been violated.

Equal standard deviation and independent

In the beginning of this blog post, I stated that I would take the Viggiano criteria one at a time. A lot has happened since then, and I have changed my mind. I have lumped these two criteria together, since they are symptoms of the same illness. If the illness is present, it will manifest in one or the other of the criteria, or perhaps both. I believe this analysis is unique to the color science world, but perhaps not among chi-squared-ologists.

To help understand this, consider the two plots below. For the plot on the left, I generated pairs of x and y values from a random number generator that gave mean of 3.0. For the x values, the standard deviation was 0.1, and for the y values, the standard deviation was 0.7. The plot on the right is that same data, but rotated about {3, 3} by 45 degrees.  

The data set on the left violates Viggiano criterion #3, since the standard deviation in y is 7 times that in x. On the other hand, there is no correlation between the two axes. In other words, it passes criterion #4, but fails criterion #3.

In the data set on the right, the standard deviations of the x and of the y values are practically the same, so criterion #3 has passed. On the other hand, the correlation coefficient between the x and y values is 0.958. With such a strong correlation, the data on the right clearly fails at criterion #4.

From the standpoint of the problem at hand, the two data sets are identical, so the conclusions should be identical. In the one case a reasonable test is to look at the ratios of the standard deviations. In the other case, the correlation coefficient seems like a likely test. This begs the question about how to equivalently evaluate the two criteria, and how to catch the situation where the "illness" has been apportioned to both of the criteria. 
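The two-plot example is easy to reproduce. The sketch below is my own reconstruction: the seed and sample count are arbitrary, and I rotate clockwise so that the correlation comes out positive. The exact numbers will differ a bit from the 0.958 quoted above.

```python
import math
import random

def stdev(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def correlation(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
    return cov / (stdev(xs) * stdev(ys))

rng = random.Random(42)
n = 5000
x = [rng.gauss(3.0, 0.1) for _ in range(n)]  # tight in x
y = [rng.gauss(3.0, 0.7) for _ in range(n)]  # spread out in y

# Rotate every point by 45 degrees (clockwise) about {3, 3}.
c = s = math.sqrt(0.5)
xr = [3.0 + c * (a - 3.0) + s * (b - 3.0) for a, b in zip(x, y)]
yr = [3.0 - s * (a - 3.0) + c * (b - 3.0) for a, b in zip(x, y)]

left = (stdev(x), stdev(y), correlation(x, y))      # unequal spread, no correlation
right = (stdev(xr), stdev(yr), correlation(xr, yr))  # equal spread, strong correlation
```

Same cloud of points, two very different report cards.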

-- Interlude into ellipsoidification

I have developed a technique that is appropriate here. I mentioned the technique in a blog post about the color red back in January of 2013. I dubbed it ellipsoidification.  I have not seen this described elsewhere. Certainly the word is novel, if not the technique.

Ellipsoidification is an extension of the standard deviation to multiple dimensions. 

[Note to those who have committed ASTM E 2214-02 [6] to memory. Section 6.1.1 describes an extension of standard deviation to multiple dimensions. While the method in E 2214 is similar - perhaps even very similar - it just misses a truly beautiful result.]

Ellipsoidification produces a vector with one standard deviation value for each dimension of the data. In the case of uncorrelated coordinates (as in the example above on the left), the multi-dimensional standard deviations are equivalent to the standard deviations of each of the individual dimensions (i.e. stdev (x), stdev (y), etc.). 

For data that is correlated in one or more dimensions, the results are the same as if you first rotated the coordinate system so as to make all the axes independent, and then took the one dimensional standard deviations of each component individually. In the example above, the plot on the right would first be rotated by 45 degrees, and then the standard deviation would be computed on each of the axes. Thus, the two plots would have identical multi-dimensional standard deviations, as we would hope.

That was one explanation of the technique.

Here is another explanation of the multi-dimensional standard deviation. Say we have a set of one-dimensional data. It is a set of points scattered along a line. The standard deviation is a line whose length is somehow representative of the amount of spread of the data points. In two dimensions, we do that same trick, but in this case, we are finding the ellipse which best describes the spread of data in two dimensions. In a sense, we "fit" an ellipse to the data. 

In the case of the example above at the left, the ellipse has the major axis straight up and down and the minor axis is right to left. The minor axis has length of 0.1 and the major axis has length of 0.7. In the example at the right, the ellipse has the same major and minor axes, but it has been tilted by 45 degrees.

Below I have an actual example of an ellipsoidification caught smiling for the camera. This is a two dimensional one, since my three dimensional display is at the cleaners getting its nails done. All those little black points that look like data points are data points, generated by a random number generator with normal distribution.

The red ellipses are the one, two, and three sigma ellipses generated from that random data set. I should point out that, although we are dealing with a normal distribution, the one, two, and three sigma probabilities that we memorized for the third grade stats class (68%, 95.4%, 99.75%) no longer apply. In the case below, we are looking at a two-dimensional normal distribution. 

I think this is a pretty cool plot

If we take that to three dimensions, as for L*a*b* data, we are fitting an ellipsoid to the scattering of the data points. Hence the term "ellipsoidification". The multi-dimensional standard deviations are the lengths of the three axes of the ellipsoid. Note that a by-product of ellipsoidification is the orientation of the ellipses, which could be useful. Note that I used this to properly tilt the ellipses above. On production data, the orientation of the ellipsoid can help point to the major cause of the variation.

That was the second explanation.

The first explanation was for the logophiles, that is, mainly in terms of words. The second explanation was for the pictophiles, that is, mainly in terms of a visualization in one awesome graph. The third explanation is for the folks who might actually have to do something with this mess. It is the most useful, because it is algorithmic. It is also, perhaps, the most difficult to understand. At least for me. I dunno, Viggiano probably stared at that plot above for a few seconds and figgered out the algorithm. 

(Color scientist of the world today trivia fact number twelve: Steve Viggiano and John the Math Guy went to different high schools together.)

Here's the third explanation, very terse and incredibly dense: Ellipsoidification is by a complex method involving rotation of coordinate systems to achieve uncorrelated data. 

I think this is a novel finding. 
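For anyone who wants to try this at home, here is one way the third explanation could be turned into code. To be clear, this is my guess at a workable recipe, not necessarily the exact computation behind the plots: it forms the covariance matrix of the L*a*b* points, rotates it until the off-diagonal terms vanish (a classical Jacobi rotation scheme, so that the sketch needs nothing beyond the standard library), and reports the square roots of what is left on the diagonal. Dividing by n - 1 in the covariance is also an assumption. A library eigen solver such as numpy.linalg.eigh should give the same answer in one call.

```python
import math

def covariance_matrix(points):
    """3x3 covariance of a list of (L*, a*, b*) triples."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
             for j in range(3)] for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def ellipsoidify(points, max_rotations=100):
    """Return (axis_lengths, axes), longest axis first.

    axis_lengths are standard deviations along the rotated axes;
    axes are the matching unit direction vectors.
    """
    a = covariance_matrix(points)
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(max_rotations):
        # Find the largest remaining correlation term and rotate it away.
        p, q = max([(0, 1), (0, 2), (1, 2)], key=lambda pq: abs(a[pq[0]][pq[1]]))
        if abs(a[p][q]) < 1e-15:
            break
        theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        rot = [[1.0 if r == t else 0.0 for t in range(3)] for r in range(3)]
        rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
        a = matmul(transpose(rot), matmul(a, rot))  # covariance in rotated axes
        v = matmul(v, rot)                          # accumulate the rotation
    order = sorted(range(3), key=lambda i: a[i][i], reverse=True)
    lengths = [math.sqrt(max(a[i][i], 0.0)) for i in order]
    axes = [[v[r][i] for r in range(3)] for i in order]
    return lengths, axes

# Tiny made-up data set whose long axis lies along the a* = L* diagonal.
demo = [(2, 2, 0), (-2, -2, 0), (1, -1, 0), (-1, 1, 0), (0, 0, 1), (0, 0, -1)]
lengths, axes = ellipsoidify(demo)
```

For the demo points, the longest axis comes out along the diagonal in the first two coordinates, which is the orientation by-product mentioned earlier.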

-- Now back to our data analysis, which is already in progress...

I applied ellipsoidification to the data in the TR 002 data set. For each of the 928 patches I got three values -- the lengths of the three axes, or in different words, the standard deviations in each of three directions.

The real question for us (remember back to criteria 3 and 4?) is not the magnitude of the axes of the ellipsoid, but rather the relative magnitudes of the three axes. If they are all close to the same size, then the appropriate number of degrees of freedom for the chi-squared function is three. If two are the same size and one of them is zero, then there are two degrees of freedom. But how big a difference in magnitude could be considered large enough to lose a degree of freedom?

As an example, let's say that the largest axis of the ellipsoid has a length of 1, and another axis has a magnitude of 0.333. By using the Pythagorean theorem on this example, the effect of that second dimension is only a 5% increase in ΔEab. Let's make a rule that this is considered negligible. By this somewhat arbitrary rule, if the shortest axis is less than a third of the length of the longest, then a three dimensional variation is effectively a two dimensional variation. If two of the axes are less than a third of the length of the longest, then the variation is effectively one dimensional, that is, along a line.
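The one-third rule can be written down in a couple of lines. This is just a sketch of the rule as stated above, with a function name of my own choosing; the first line checks the 5% figure.

```python
import math

# Adding a 0.333 axis in quadrature to a unit axis: about a 5% increase.
increase = math.sqrt(1.0 ** 2 + 0.333 ** 2) - 1.0

def effective_dimensions(axis_lengths, threshold=1.0 / 3.0):
    """Count the axes that are at least a third as long as the longest one."""
    longest = max(axis_lengths)
    return sum(1 for a in axis_lengths if a >= threshold * longest)

three_d = effective_dimensions([1.0, 0.9, 0.5])
two_d = effective_dimensions([1.0, 0.5, 0.2])
one_d = effective_dimensions([1.0, 0.3, 0.1])
```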

This gives us a method by which to quantify the dimensionality of the variation in the TR 002 data set. I determined by this rule that 

     46.6% of the patches have three dimensional variation,
     40.2% of the patches have two dimensional variation, and 
       6.4% of the patches have one dimensional variation.

Wow. I think this makes sense. I would guess that the variation in the color of a patch is largely due to variation in the amount of ink transferred to the substrate, or variation in the dot gain of the ink. If a patch has predominantly only one ink, then I would expect that the variation is largely in the direction of the trajectory of that ink, that is to say, the variation is largely one dimensional. Similarly, two inks would have two dimensional variation, and three or more would have three dimensional variation. It would be an interesting research project to look at individual patches and see if the dimensionality of the variation correlates with the number of inks present, but, that's a different question.

Today we are pondering what the statistical distribution of color difference values looks like. From this analysis, it would appear that the variation from the TR 002 data set should be chi-squared with something less than 3 degrees of freedom. The exact number of degrees of freedom depends on the number of inks, and their relative proportion. Note that, while the chi-squared distribution makes intuitive sense only when the degrees of freedom parameter is an integer, it is still defined for non-integer values. Thus, a C40, M40, Y10 patch may exhibit 2.3 degrees of freedom. 

This is a similar conclusion to that of Nadal et al. They conjectured that the deviation from chi-squared distribution with three degrees of freedom was due to a violation of criterion #4. They used this to justify the use of the number of degrees of freedom as a regression parameter. They report determining a value of 2.4 for the degrees of freedom in one of their data sets. They did not consider the possibility of a violation of criterion #3, but I showed that this would have led them to the same course of action.

So, failures of criteria #3 and/or #4 can be seen in the data. This means that a chi-squared distribution of degree less than or equal to 3 is appropriate.
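
For anyone who wants to poke at the non-integer degrees of freedom idea, here is a minimal sketch in Python, using only the standard library. It leans on the fact that a chi-squared distribution with k degrees of freedom is the same thing as a gamma distribution with shape k/2 and scale 2, which is defined for any positive k. The value 2.3 is just the illustrative number from the C40, M40, Y10 example, and the function names are mine, not anything from the papers cited here.

```python
import math
import random

def chi2_pdf(x, k):
    """Chi-squared density at x > 0 for k degrees of freedom (k need not be an integer)."""
    if x <= 0:
        return 0.0
    half_k = k / 2.0
    return x ** (half_k - 1.0) * math.exp(-x / 2.0) / (2.0 ** half_k * math.gamma(half_k))

def chi2_samples(k, n, seed=1):
    """Draw n chi-squared variates through the equivalent gamma(shape=k/2, scale=2)."""
    rng = random.Random(seed)
    return [rng.gammavariate(k / 2.0, 2.0) for _ in range(n)]

if __name__ == "__main__":
    k = 2.3  # illustrative, non-integer degrees of freedom
    draws = chi2_samples(k, 100_000)
    mean = sum(draws) / len(draws)
    # The mean of a chi-squared distribution equals its degrees of freedom.
    print(f"sample mean = {mean:.2f}, theory = {k}")
    print(f"density at x = 1: {chi2_pdf(1.0, k):.4f}")
```

The sample mean landing near 2.3 is a quick sanity check that nothing strange happens between the integers.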


There are a number of significant findings in this. 

1. I have demonstrated that, at least for one data set, the distribution of the square of color difference (ΔEab) values does not follow a chi-square distribution with three degrees of freedom. I have tracked the cause of the failure down to the fact that the variation in L*a*b* is somewhat less than three dimensional. 

2. I have conjectured that the dimensionality of the variation (in printing) is connected to the number of inks that are present in the overprint. The data is there, but I have not done the analysis to verify this. Ideally such an analysis would allow one to use the halftone percentages of a given color to predict the value for the degrees of freedom in the chi-square distribution. Alternatively, it could be determined from the data when characterizing the process.

3. For SPC, it is normally the case that the target color is dictated from above. This means that ΔL*, Δa*, and Δb* will usually not have zero mean.

4. I have proposed that for SPC, color variation can be modeled with ΔL*, Δa*, and Δb* being normally distributed.

5. If you combine points 2, 3, and 4, you have the distribution of the square of ΔEab. It is a non-central chi-squared distribution with three or fewer degrees of freedom.

6. I have described a multi-dimensional extension of the standard deviation, which I call ellipsoidification. This can be used to determine the shape of the cloud of variance in any dimensional space. While the immediate application to color is apparent, the method could apply equally well to 30-some dimensional spectra or to the analysis of the variation in the price of a collection of ten stocks.

7. This analysis was carried out with the ΔEab color difference formula. I am going to conjecture that all this will work with ΔE 2000, or any other color difference formula that you choose. After all, the other color difference formulas are all about ellipses. What could go wrong? Well, except for that pesky ΔE 2000 formula.
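
Points 3 through 5 can be tried out numerically. The sketch below is a rough Monte Carlo illustration in Python (standard library only), not a finished SPC method. It assumes that ΔL*, Δa*, and Δb* are independent and normally distributed, and the means and standard deviations are made-up numbers standing in for what a real process characterization would provide. When the three standard deviations are equal, the square of ΔEab divided by the variance is exactly non-central chi-squared; when they differ, as in the example, it is only approximately so, which is the reason for treating the degrees of freedom as something to be fitted.

```python
import math
import random

def simulate_delta_e(means, sigmas, n, seed=1):
    """Simulate n color differences from independent normal dL*, da*, db*.

    means and sigmas are 3-element sequences (hypothetical process values).
    A small sigma flattens that axis, lowering the effective dimensionality.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        sq = sum(rng.gauss(m, s) ** 2 for m, s in zip(means, sigmas))
        out.append(math.sqrt(sq))
    return out

def percentile(values, p):
    """Empirical percentile (0 < p <= 100) by the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]

if __name__ == "__main__":
    # Made-up numbers: the process sits off target and varies mostly in two directions.
    means = (0.5, 0.2, -0.3)
    sigmas = (0.4, 0.6, 0.1)
    de = simulate_delta_e(means, sigmas, 200_000)
    limit = percentile(de, 99.75)
    print(f"simulated upper control limit (99.75th percentile): {limit:.2f} Delta E")
```

With unit standard deviations, the mean of the squared color difference should come out close to three plus the sum of the squared means, which is a handy check that the simulation matches the non-central chi-squared picture.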

This is a work in progress. <Cue Man of La Mancha.> In my vision of SPC for ΔE, the appropriate distribution will be used - perhaps the non-central chi-squared distribution with something less than 3 degrees of freedom. (To dream, the impossible dream...) The amount of non-centrality and number of degrees of freedom will be discerned from the characterization of a process to arrive at the 99.75th percentile ΔE 2000. (To fight the unbeatable foe...) During production, this will be used as the upper control limit, but also - once enough production data has been received - the same techniques will be used to provide additional diagnostics. (This is my quest, to follow that star...)

But I have a lot of work to do before all these pieces fall into place. (No matter how hopeless, how near or how far...)

To Dream the Impossible Dream!

I am I, John the Math Guy,
the guy with the slide rule...

Ok... maybe I got a little melodramatic there. Sorry.


I continue in my search of SPCDE.


[1] Dolezalek, Friedrich, Appraisal of production run fluctuations from color measurements in the image, TAGA 1994

[2] McDowell, David, Statistical distribution of DeltaE, pdf dated Feb 20, 1997

[3] Anonymous, Kodak Q-60 color input targets, Kodak tech paper, June 2003

[4] Viggiano, J A Stephen, Statistical distribution of CIELAB color difference, June 1999

[5] Nadal, Maria, C. Cameron Miller, and Hugh Fairman, Statistical methods for analyzing color difference distributions

[6] ASTM E2214-02, Standard practice for specifying and verifying the performance of color-measuring instruments

[7] Snedecor, George, and William Cochran, Statistical Methods, 7th Ed., Iowa State Press, 1980, p. 492


  1. Okay, when do we get to Hotelling's T-squared?

    "Are we there yet?" "No."
    "Are we there yet?" "No."
    "Are we there yet?" "No."
    "Are we there yet?" "No."

    1. "Mommy, Steve's looking at me!"
      "I hafta go potty."
      "I'm bored."

      As soon as I get to the point where I think I can convince people that I know what I am talking about!!

  2. I think it's more like, "Steve's making faces at me!"

  3. --- I am posting this for Danny Rich ---

    A long and very informative discussion to show two results, Criteria 3 and Criteria 4 are not (valid or TRUE or present). But we could have saved some time by stopping and thinking about these two criteria first.

    Criteria 3, which you have shown to not be true, says that the variances of the three coordinates must be the same. You have chosen CIELAB coordinates as the axes, but we do not measure CIELAB L*, a*, b*; we measure X, Y, Z tristimulus values (weighted linear combinations of reflectance factors) and convert them into CIELAB coordinates. As demonstrated by MacAdam, Brown and Wyszecki, CIE XYZxy are not distributed independently because they use some of the same spectral values. Similarly, L* uses Y, a* uses Y and b* uses Y. Then for a* and b* we add X and Z respectively. Thus the variance of L* is based only on the variance of Y, but the variance of a* is based on the variance of X and Y and the variance of b* is based on the variance of Y and Z. So now Criteria 4 is shown to not be true, and so the variance of Y appears in all three coordinates. Unless there is a very exceptional set of conditions involving covariances, the variance of a* can never be equal to that of L*, the variance of b* can never be equal to that of L* and, in general, the variance of a* will never be equal to the variance of b*. Finally, if there is a relationship between the variances of the three coordinates, and they are thus not independent but linearly (at least to a first approximation) dependent, then estimating the variance of the distribution will not require the full number of degrees of freedom, and the conclusions of this essay and that of Nadal and Fairman follow logically.

    Taking a further step back, one finds that the distribution of two random variables combined in quadrature (square root of the sum of the squares) is known as the Rayleigh distribution and the same expression in three coordinates is known as the Maxwell distribution and both come from statistical thermodynamics (speed of particles due to an infusion of energy in the form of heat). But outside of the area of dynamics, these distributions are not widely known and used. So there are not nice tables of probability for some standard values of alpha.

    So one really needs to continue to probe the process of characterizing a set of color differences with 3 dependent coordinates. Based on John's discovery that it is DeltaE-squared that is approximately distributed as a chi-squared, what can we do? Steve Viggiano wants to pull out the Hotelling's T² statistic. But that is the distribution of the quadratic form for a line element in Principal Components, and if one uses DL*, Da*, Db* as the 3 coordinates then Hotelling's statistic is proportional to g11 x (DL* x DL*) + g22 x (Da* x Da*) + g33 x (Db* x Db*) + 2 x g12 x (DL* x Da*) + 2 x g13 x (DL* x Db*) + 2 x g23 x (Da* x Db*), where the "gij" coefficients are the parameters from the inverse of John's Variance matrix.

    It is said that this line element is distributed as a Wishart distribution which can be approximated with a chi-squared distribution. So we are now full circle - back to where we started.

    But the chi-square distribution, which is only tabulated for integer values of the degrees of freedom, is one member of a class of distributions based on the gamma function. This function is of the form X exp(-X), as are the distributions for Rayleigh (R exp(-R)) and Maxwell (Z exp(-Z)). So perhaps a more general or fundamental distribution (gamma) should be investigated.