## Wednesday, November 14, 2012

### Assessing color difference data

#### The punch line

For those who are too impatient to wade through the convoluted perambulations of a slightly senile math guy, and for those who already understand the setting of this problem, I will cut to the punch line. When looking at a collection of color difference data (ΔE values), it makes little difference whether you look at the median or the 68th, 90th, 95th, or 99th percentile: any one of them can be estimated from any other, so you can do a pretty darn good job of describing the statistical distribution with just one number. The maximum color difference, on the other hand, is in a class by itself.

Cool looking function that has something to do with this blog post, copied, without so much as even dropping an email, from my friend Steve Viggiano

In the next section of this techno-blog, I explain what that all means to those not familiar with statistical analysis of color data. I give permission to those who are print and color savvy to skip to the final section. In this aforementioned final section, I describe an experiment that provides rather compelling evidence for the punch line that I started out with.

#### Hypothetical situation

Suppose for the moment that you are in charge of QC for a printing plant, or that you are a print buyer who is interested in making sure the proper color is delivered. Given my readership, I would expect that this might not be all that hard for some of you to imagine.

If you are in either of those positions, you are probably familiar with the phrase "ΔE", pronounced "delta E". You probably understand that this is a measurement of the difference between two colors, and that 1 ΔE is pretty small, and that 10 ΔE is kinda big. If you happen to be a color scientist, you probably understand that ΔE is a measurement of the difference between two colors, and that 1 ΔE is (usually) pretty small, and that 10 ΔE is (usually) kinda big [1].

Color difference example copied, without so much as even dropping him an email, from my friend Dimitri Pluomidis

When a printer tries valiantly to prove his or her printing prowess to the print buyer, they will often print a special test form called a "test target". This test target will have some big number of color patches that span the gamut of colors that can be printed. There might be 1,617 patches, or maybe 928... it depends on the test target. Each of these patches in the test target has a target color value [2], so each of these printed patches has a color error that can be ascribed to it, each color error (ΔE) describing just how close the printed color is to reaching the target color.

An IT8 target

This test target serves to demonstrate that the printer is capable of producing the required colors, at least once. For day-to-day work, the printer may use a much smaller collection of patches (somewhere between 8 and 30) to demonstrate continued compliance to the target colors. These can be measured through the run. For an 8 hour shift, there might be on the order of 100,000 measurements. Each of these measurements could have a ΔE associated with it.

If the printer and the print buyer have a huge amount of time on their hands because they don't have Twitter accounts [3], they might well fancy having a look at all the thousands of numbers, just to make sure that everything is copacetic. But I would guess that if the printers and print buyers have that kind of time on their hands, they might prefer watching reruns of Andy Griffith on YouTube, doing shots of tequila whenever Opie calls his father "paw".

But I think that both the printer and the print buyer would prefer to agree on a way to distill that big set of color error data down to a very small set of numbers (ideally a single number) that could be used as a tolerance. Below that number is acceptable, above that number is unacceptable.

It's all about distillation of data

But what number to settle on? When there is a lot at stake (as in bank notes, lottery tickets and pharmaceutical labels) the statistic of choice might be the maximum. For these, getting the correct print is vitally important. For cereal boxes and high class lingerie catalogs (you know which ones I am talking about), the print buyer might ask for the 95th percentile - 95% of the colors must be within a specified color difference ΔE. The printer might push for the average ΔE, since this number sounds less demanding. A stats person might go for the 68th percentile, purely for sentimental reasons.

How to decide? I had a hunch that it really didn't matter which statistic was chosen, so I devised a little experiment with big data to prove it.

#### The distribution of color difference data

Some people collect dishwasher parts, and others collect ex-wives. Me? I collect data sets [4]. For this blog post I drew together measurements from 176 test targets. Some of these were printed on a lot of different newspaper presses, some were from a lot of ink jet proofers, and some were printed flexographically. For each, I found a reasonable set of aim color values [5], and I computed a few metric tons of color differences in ΔE00 [6].

Let's look at one set of color difference data. The graph represents the color errors from one test target with 1,617 different colors. The 1,617 color differences were collected in a spreadsheet to make this CPDF (cumulative probability density function). CPDFs are not that hard to compute in a spreadsheet. Plunk the data into the first column, and then sort it from small to large. If you like, you can get the percentages on the graph by adding a second column that goes from 0 to 1. If you have this second column to the right, then the plot will come out correctly oriented.

Example of the CPDF of color difference data

This plot makes it rather easy to read off any percentile. In red, I have shown the 50th percentile - something over 1.4 ΔE00. If you are snooty, you might want to call this the median. In green, I have shown the 95th percentile - 3.0 ΔE00. If you are snooty, you might want to call this the 95th percentile.
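If you would rather script this than build a spreadsheet, the CPDF and the percentile lookups are only a few lines in Python. Here is a minimal sketch; the de00 array is simulated stand-in data, not measurements from a real target:

```python
import numpy as np

# Simulated stand-in for 1,617 measured color differences (DE00).
# Real data would come from measuring the patches of a test target.
rng = np.random.default_rng(0)
de00 = 0.6 * rng.chisquare(df=3, size=1617)

# The CPDF: sort the differences (the spreadsheet's first column) and
# pair each with its cumulative fraction (the second column, 0 to 1).
sorted_de = np.sort(de00)
cum_frac = np.arange(1, sorted_de.size + 1) / sorted_de.size

# Reading percentiles off the curve is then a one-liner.
median = np.percentile(de00, 50)
p95 = np.percentile(de00, 95)
```

Plotting cum_frac against sorted_de gives the same curve as the spreadsheet recipe above.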

Now that we understand how a CPDF plot works, let's have a look at some of the 176 CPDF plots that I have at my beck and call. I have 9 of them below.

Sampling of real CPDFs

One thing that I hope is apparent is that, aside from the rightmost two of them, they all have more or less the same shape. This is a good thing. It suggests that maybe our quest might not be for naught. If they are all alike, then I could just compute (for example) the median of my particular data set, and then just select the CPDF from the curves above which has the same median. This would then give me a decent estimate of any percentile that I wanted.

How good would that estimate be? Here is another look at some CPDFs from that same data set. I chose all the ones that had a median somewhere close to 2.4 ΔE00.

Sampling of real CPDFs with median near 2.4

How good is this for an estimate? It says that if my median were 2.4 ΔE00, then the 90th percentile might be anywhere from 3.4 to 4.6 ΔE00 at the extremes, but would likely be about 4.0 ΔE00.

I have another way of showing that data. The graph below shows the relationship between the median and 90th percentile values for all 176 data sets. The straight line on the graph is a regression line that goes through zero. It says that 90th percentile = 1.64 * median. I may be an overly optimistic geek, but I think this is pretty darn cool. Whenever I see an r-squared value of 0.9468, I get pretty excited.

Ignore this caption and look at the title on the graph
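The through-origin regression behind that line is easy to reproduce. Here is a sketch in Python, using simulated stand-ins for my 176 (median, 90th percentile) pairs; the noise level is invented, and the slope of 1.64 is built in by construction:

```python
import numpy as np

# Simulated stand-ins for the 176 (median, 90th-percentile) pairs.
rng = np.random.default_rng(1)
medians = rng.uniform(1.0, 4.0, size=176)
p90s = 1.64 * medians + rng.normal(0.0, 0.3, size=176)

# Least-squares line through the origin: slope = sum(xy) / sum(x^2).
slope = np.sum(medians * p90s) / np.sum(medians ** 2)

# r-squared of the through-origin fit.
residuals = p90s - slope * medians
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((p90s - p90s.mean()) ** 2)
```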

Ok... I anticipate a question here. "What about the 95th percentile? Surely that can't be all that good!" Just in case someone asks, I have provided the graph below. The scatter of points is broader, but the r-squared value (0.9029) is still not so bad. Note that the formula for this is 95th percentile = 1.84 * median.

Ignore this one, too

Naturally, someone will ask if we can take this to the extreme. If I know the median, how well can I predict the maximum color difference? The graph below should answer that question. One would estimate the maximum as being 2.8 times the median, but look at the r-squared value: 0.378. This is not the sort of r-squared value that gets me all hot and bothered.

Max does not play well with others

I am not surprised by this. The maximum of a data set is a very unstable metric. Unless there is a strong reason for using it as a descriptive statistic, it is not a good way to assess the "quality" of a production run. This sounds like the sort of thing I may elaborate on in a future blog.
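If you want to see the instability for yourself, a quick simulation does the trick. The chi-squared draws below are just a convenient stand-in for ΔE-like data; the point is how much more the maximum wanders from run to run than the median does:

```python
import numpy as np

# 500 simulated "runs" of 1,617 patches each, all drawn from the very
# same distribution (a chi-squared stand-in for DE-like data).
rng = np.random.default_rng(2)
runs = rng.chisquare(df=3, size=(500, 1617))

medians = np.median(runs, axis=1)  # one median per run
maxima = runs.max(axis=1)          # one maximum per run

# Coefficient of variation: how much each statistic wanders, relative
# to its own typical size. The maximum is far jumpier than the median.
cv_median = medians.std() / medians.mean()
cv_max = maxima.std() / maxima.mean()
```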

The table below tells how to estimate each of the deciles (and a few other delectable values) from the median of a set of color difference data. This table was generated strictly empirically, based on 176 data sets at my disposal. For example, the 10th percentile can be estimated by multiplying the median by 0.467.  This table, as I have said, is based on color differences between measured and aim values on a test target [7].

| Percentile | Multiplier | r-squared |
|-----------:|-----------:|----------:|
| 10         | 0.467      | 0.939     |
| 20         | 0.631      | 0.974     |
| 30         | 0.762      | 0.988     |
| 40         | 0.883      | 0.997     |
| 50         | 1.000      | 1.000     |
| 60         | 1.121      | 0.997     |
| 68         | 1.224      | 0.993     |
| 70         | 1.251      | 0.991     |
| 80         | 1.410      | 0.979     |
| 90         | 1.643      | 0.947     |
| 95         | 1.840      | 0.903     |
| 99         | 2.226      | 0.752     |
| Max        | 2.816      | 0.378     |
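If you would rather apply the table programmatically, a sketch might look like this (the function is my own invention; the multipliers are copied straight from the table):

```python
# Multipliers copied from the table above; the function name is my own.
MULTIPLIERS = {
    10: 0.467, 20: 0.631, 30: 0.762, 40: 0.883, 50: 1.000,
    60: 1.121, 68: 1.224, 70: 1.251, 80: 1.410, 90: 1.643,
    95: 1.840, 99: 2.226,
}

def estimate_percentile(median_de00, percentile):
    """Estimate a percentile of a DE00 distribution from its median."""
    return MULTIPLIERS[percentile] * median_de00

# A data set with a median of 2.4 DE00:
p90_estimate = estimate_percentile(2.4, 90)  # about 3.94
```

Note that this estimate of about 3.94 agrees with the "likely about 4.0" eyeballed from the CPDF plots earlier.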

#### Caveats and acknowledgements

There has not been a great deal of work on this, but I have run into three papers.

Fred Dolezalek [8] posited in a 1994 TAGA paper that the CRF (what I have been calling a CPDF) of ΔE variations of printed samples can be characterized by a single number. His reasoning was based on the statement that the distribution "should" be chi-squared with three degrees of freedom. He had test data from 19 press runs with an average of 20 to 30 sheet pulls. It's not clear how many CMYK combinations he looked at, but it sounds like a few thousand data points, which is pretty impressive for the time, given that he was working with an SPM 100 handheld spectrophotometer!

Steve Viggiano [9] considered the issue in an unpublished 1999 paper. He pointed out that the chi-squared distribution with three degrees of freedom can be derived from the assumptions that ΔL*, Δa*, and Δb* values are normally distributed, have zero mean, have the same standard deviation, and are uncorrelated. He also pointed out that these assumptions are not likely to be met with real data. I'm inclined to agree with Steve, since I hardly understand anything of what he tells me.
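Steve's assumptions are easy to state in code: if ΔL*, Δa*, and Δb* are zero-mean normal with a common standard deviation and uncorrelated, then ΔE² divided by the variance is a sum of three squared standard normals, which is chi-squared with three degrees of freedom. A quick simulation sketch (the sigma of 0.8 is an arbitrary choice for illustration):

```python
import numpy as np

# Zero-mean, equal-sigma, uncorrelated normal deviations in L*, a*, b*.
rng = np.random.default_rng(3)
sigma = 0.8
dL = rng.normal(0.0, sigma, size=100_000)
da = rng.normal(0.0, sigma, size=100_000)
db = rng.normal(0.0, sigma, size=100_000)

de = np.sqrt(dL**2 + da**2 + db**2)

# DE^2 / sigma^2 is chi-squared with 3 degrees of freedom, whose mean
# is 3; so the mean of DE^2 should land near 3 * sigma^2 = 1.92.
mean_de_sq = np.mean(de ** 2)
```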

David McDowell [10] looked at statistical distributions of color errors of a large number of Kodak Q-60 Color Input Targets and came to the conclusion that this set of color errors could be modeled as a chi-squared function.

Clearly, the distribution of color errors could be anything it wants to be. It all depends on where the data came from. This point was not lost on Dolezalek. In his analysis, he found that the distribution only looked like a chi-squared distribution when the press was running stably.

#### Future research

What research paper is complete without a section devoted to "clearly further research is warranted"? This is research lingo for "this is why my project deserves to continue being funded".

I have not investigated whether the chi-squared function is the ideal function to fit all these distributions. Certainly it would be a good guess. I am glad to have a database that I can use to test this. While the chi-squared function makes sense, it is certainly not the only game in town. There are the logistic function, the Weibull function, all those silly beta functions... need I go on? I'm sure the names are as familiar to you as they are to me. Clearly further research is warranted.

Although I have access to lots of run time data, I have not investigated the statistical distributions of this data. Clearly further research is warranted.

Perhaps the chi-squared-ness of the statistical distribution of color errors is a measure of color stability? If there was a quick way to rate the degree that any particular data set fit the chi-squared function, maybe this could be used as an early warning sign that something is amiss. Clearly further research is warranted.

I have not attempted to perform Monte Carlo analysis on this, even though I know how to use random numbers to simulate physical phenomena, and even though I plan on writing a blog on Monte Carlo methods some time soon. Clearly further research is warranted.

I welcome additional data sets that anyone would care to send. Send me an email without attachment first, and wait for my response so that your precious data does not go into my spam folder: john@JohnTheMathGuy.com. With your help, further research will indeed be warranted.

#### Conclusion

My conclusion from this experiment is that the statistical distribution of color difference data, at least that from printing of test targets, can be summarized fairly well with a single number. I have provided a table to facilitate conversion from the median to any of the more popular quantiles.

----------------------
[1] And if you are a color scientist, you are probably wondering when I am going to break into explaining the differences between the ΔEab, CMC, ΔE94, and ΔE00 color difference formulas, along with the DIN 99 and Labmg color spaces. Well, I'm not. At least not in this blog.

[2] For the uninitiated, color values are a set of three numbers (called CIELAB values) that uniquely define a color by identifying its lightness, its hue angle, and its degree of saturation.

[3] I have a Twitter account, so I have very little free time. Just in case you are taking a break from Twitter to read my blog, you can find me at @John_TheMathGuy when you get back to your real life of tweeting.

[4] Sometimes I just bring data up on the computer to look at. It's more entertaining than Drop Dead Diva, although my wife might disagree.

[5] How to decide the "correct" CIELAB value for a given CMYK value? If you happen to have a big collection of data that should all be similar (such as test targets that were all printed on a newspaper press) you can just average to get the target. I appreciate the comment from Dave McDowell that the statistical distribution of CIELAB values around the average CIELAB value will be different from the distribution around any other target value. I have not incorporated his comment into my analysis yet.

[6] Here is a pet peeve of mine. Someone might be tempted to say that they computed a bunch of ΔE00 values. This is not correct grammar, since "ΔE" is a unit of measurement, just like metric ton and inch. You wouldn't say "measured the pieces of wood and computed the inch values," would you?

[7] No warranties are implied. Use this chart at your own risk. Data from this chart has not been evaluated for color difference data from sources other than those described. The chart represents color difference data in  ΔE00, which may have a different CPDF than other color difference formulas.

[8] Dolezalek, Friedrich, "Appraisal of Production Run Fluctuations from Color Measurements in the Image," TAGA Proceedings, 1994

[9] Viggiano, J A Stephen, "Statistical Distribution of CIELAB Color Difference," unpublished, 1999

[10] McDowell, David (presumed author), "KODAK Q-60 Color Input Targets," Kodak technical paper, June 2003

----------------------

1. Very interesting post this week. It is making me think whether we can boil the traditionally difficult capability statistics of Cp, Cpk, Pp, Ppk, etc. down to a single number that everyone can understand.

3. Nice work, John. One suggestion: The methodology may be applied to the CGATS data set whereby the IT8.7/4 target was imaged by the Kodak Approval (no variation) on papers containing varying OBA (optical brightening agent). The major source of ∆E mainly comes from paper being the fifth color. The goal would be to define a generalized substrate correction model for adjusting the characterization data set aim values based on the color of the paper. This is an empirical approach and is different from DQM's linear tristimulus approach. Bob Chung

4. John, I hit the send key too soon. There is a significant advantage to applying your methodology to solve the substrate corrected colorimetric aims (SCCA) problem over describing printing variation using a single number. This is because, as you pointed out, the method cannot predict the maximum ∆E (this is why x-bar and R charts are both needed in statistical process control). In the case of solving the influence of OBA on printed colors, the maximum ∆E (which is the difference between the white point of the characterization data set and that of the printing paper) is an input condition. Make sense?

5. Bob, Thanks for providing some direction. I had not been thinking about this in terms of OBAs, but clearly that's an issue for certification.

"The major source of ∆E mainly comes from paper being the fifth color." I have a growing collection of data sets. I can verify this. It seems like a reasonable assertion, since the printer has little control over it.

In your opinion, should the printer be penalized for OBA-induced color error?

You mentioned the tristimulus correction method. Have you seen this to be adequate for substrate correction? My intuition says that the amount of correction needed will depend on how much UV is transmitted through to the paper. This depends on the amount of coverage and on the UV transmittance of the ink that is there. Neither of these effects are explicitly in the formula, so I just wonder how much of an error this induces. Maybe it's not significant.

Let me ask for your opinion on max ∆E. Should this be one of the criteria for evaluation? From a statistical standpoint, this parameter has a lot of variation so I would be reluctant to put much weight in it. A printer could do very well with the max ∆E on one day, and fail miserably the next day.