Thursday, February 7, 2013

A spectrophotometric romance

This is a romantic love story. The usual... boy spectrophotometer meets girl spectrophotometer. Sparks fly, and naturally, they fall in love. I haven't cast the parts yet, but I am thinking Jennifer Aniston could play the female lead. My wife might like to see Antonio Banderas as the male spectro.


But like all romantic comedies, there has to be a conflict. In this movie, they start to disagree.
This part of the movie is quite familiar to me. As a guy, I see this disagreement with my little spectrophilia on a daily basis. Is the blouse on that cute young lady teal, aquamarine, cyan, or turquoise? No matter what I say, I know there will be an argument. As a mathematician, I can readily calculate that my odds of winning the argument are no better than 0 in n, where n is a really big number. I mean, really big. Like so close to infinite that you can taste it. 

But as a color scientist with an ego the size of the planet Jupiter, it's hard for me to just let this go. I should know my colors, right?!?!?!??

So I can relate, as can any male, married, applied-mathematician color scientist. I think this covers just about everyone who reads this blog.

One would think that they would agree. They are in love, of course. There are these expectations. IFRA published a report [1] on this expectation:

Inter-instrument agreement is usually indicated by a colour difference value between two instruments or between a master instrument and the average of a group of production instruments. Although various ways are used to describe this colour difference, a common value is the average or mean value for a series of twelve British Ceramic Research Association (BCRA) Ceramic Colour Standards Series II (CCS II) ceramic tiles. A value of 0.3 ΔEab is acceptable. 

How much do our hapless lovers disagree? I did a little research. I went digging for technical papers and reports where others had brought spectros together to see how much they agreed - to assess inter-instrument agreement.

| Study | Instruments | Samples measured | Reference | Errors |
|---|---|---|---|---|
| Nussbaum [2] | 9 | BCRA tiles | NIST standard | 8 of 9 >2.0 ΔEab |
| Radencic [3] | 8 (two each from four manufacturers) | Lab-Ref card | Median of all instruments | All >1.0 ΔEab, max. 10 ΔEab |
| Wyble and Rich [4] | 3 | BCRA tiles and ink | Paired comparison | Avg. 0.73 ΔEab to 1.68 ΔEab |
| ICC [5] | 3 (units of the same model) | Gravure printing | Identical model | Avg. 0.47 ΔEab, max. 1.01 ΔEab |
| Dolezalek [6] | 3 | 46 patches on 5 stocks | Paired comparison | 50% >1 ΔEab, 20% >2 ΔEab |
| Hagen [7] | 20 (field study of in-use instruments) | 13 patches | GretagMacbeth NetProfiler card | Avg. 1.56 ΔEab, max. 3.77 ΔEab |
| X-Rite [8] | 6 (one of each of their models) | 46 patches on 9 substrates | Paired comparisons | 0.27 ΔEab to 1.08 ΔEab |

Looking at the far right column of this chart, it is clear that there are virtually no spectrophotometers that are acceptable by the criteria set forth by IFRA. (Understatement alert) There appears to be something of a disconnect between the expectation of inter-instrument agreement and the actual disagreement that will be seen.

What to do?

I turn to a couple of my friends, Danny Rich [9] and Harold Van Aken [10]. (I was honored to be present last night when Danny received some prestigious award or other for lifetime commitment to blah blah influence on the industry blah blah blah best screen adaptation of a spectrophotometric calibration method.... whatever. The award was prestigious anyway. Tears, laughter, speeches. I am not jealous, by the way. Not trying to put him down. Honest. No, I mean really. [11])

The idea put forth by these two really smart guys is that at least some of the discrepancies between spectrophotometers are due to understandable and predictable phenomena. If the understandable phenomena can be quantified, then they can be corrected.
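To make the idea concrete, here is a minimal sketch of that kind of correction (hypothetical code, not the method from either reference): measure the same reference samples on a master instrument and a second instrument, fit a per-wavelength gain and offset by least squares, and apply that correction to subsequent readings from the second instrument.

```python
import numpy as np

def fit_standardization(master, slave):
    """Fit a per-wavelength linear map (gain and offset) that takes the
    second instrument's reflectance readings to the master's.

    master, slave: arrays of shape (n_samples, n_wavelengths) holding
    reflectance spectra of the same physical samples.
    Returns (gain, offset), each of shape (n_wavelengths,).
    """
    # Least-squares fit of master = gain * slave + offset, one band at a time
    slave_mean = slave.mean(axis=0)
    master_mean = master.mean(axis=0)
    cov = ((slave - slave_mean) * (master - master_mean)).mean(axis=0)
    var = ((slave - slave_mean) ** 2).mean(axis=0)
    gain = cov / var
    offset = master_mean - gain * slave_mean
    return gain, offset

def standardize(spectrum, gain, offset):
    """Apply the fitted correction to a new reading from the second instrument."""
    return gain * spectrum + offset
```

Whether this simple gain-and-offset model is enough depends on which "understandable phenomena" dominate; the point is only that once the discrepancy is modeled, it can be regressed out.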

Here is where the BCRA tiles show up in the movie. I am sure everyone has been expecting this. Who says romantic comedies are predictable? If I have any say in the casting for this movie, I would have George Clooney play the part of the BCRA tiles. He would play a therapist, and would try to help our two hapless spectros reconcile.
George "BCRA" Clooney does all the usual psychotherapeutic stuff, and there appears to be some agreement on the difference between beige and taupe. But, alas, the improved relations falter and once again the couple are disagreeing, in some cases louder than before. This is a totally unexpected turn of events in a romantic comedy, right?

The table below shows what happens when the BCRA tiles are brought in. Before standardization on the BCRA tiles (this is a fancy word for what us plebeians call calibration) we see median agreement of 0.35 ΔE, 0.66 ΔE, etc. on the four different test sets. 90th percentiles are in parentheses below. After standardization, we see that the 90th percentile agreement of the two instruments is much better than before - on the BCRA tiles, going from 1.84 down to 0.95 ΔE. 
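For the record, these statistics are simple to compute. Here is a minimal sketch with hypothetical Lab values, taking ΔEab as the CIE76 color difference (Euclidean distance in CIELAB):

```python
import numpy as np

def delta_e_ab(lab_a, lab_b):
    """CIE76 color difference: Euclidean distance between CIELAB values."""
    return np.linalg.norm(np.asarray(lab_a, float) - np.asarray(lab_b, float), axis=-1)

def agreement_stats(labs_a, labs_b):
    """Median and 90th-percentile ΔEab between paired measurements
    of the same patches on two instruments."""
    de = delta_e_ab(labs_a, labs_b)
    return float(np.median(de)), float(np.percentile(de, 90))
```

The median tells you how the instruments agree on a typical patch; the 90th percentile shows the tail, which is where the arguments start.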

But the other sets of samples? Not much improvement at all. The paint samples that were measured (the Behr samples) actually got much worse at the 90th percentile. Much worse.


Median ΔEab (90th percentile in parentheses); rows are regression sets, columns are test sets.

| Regression set | BCRA | Pantone primaries | Pantone ramps | Behr ramps |
|---|---|---|---|---|
| Before standardization | 0.35 (1.84) | 0.66 (1.69) | 0.49 (1.60) | 0.63 (1.72) |
| BCRA | 0.41 (0.95) | 0.53 (1.80) | 0.50 (1.31) | 0.60 (2.76) |


So. George Clooney, favored because he is the analytical psychotherapist, and because he is just a darn sexy guy, has failed. Now we get the unexpected twist that is to be expected in all romantic comedies. Enter Owen Wilson, dufus extraordinaire.
Owen plays "Behr", a ne'er-do-well dufus. In his normal inept way, he proves himself to be fully ept at getting the pitiable instruments together. The tie-in to the serious side of this blog is the set of paint samples. I walked into a Home Depot. Please don't let them know, but I was just pretending to buy paint. I looked at the Behr paint samples and selected a set based on being cute. (Get this... I selected Owen "Behr" Wilson because he is cute.)
The 24 colors in the Behr paint samples

The table below (bottom row) shows the results from using the Behr paint samples to standardize the instruments. Note that the worst case examples are all more better, and most are way more better.


Median ΔEab (90th percentile in parentheses); rows are regression sets, columns are test sets.

| Regression set | BCRA | Pantone primaries | Pantone ramps | Behr ramps |
|---|---|---|---|---|
| Before standardization | 0.35 (1.84) | 0.66 (1.69) | 0.49 (1.60) | 0.63 (1.72) |
| BCRA | 0.41 (0.95) | 0.53 (1.80) | 0.50 (1.31) | 0.60 (2.76) |
| Behr ramps | 0.50 (0.81) | 0.66 (1.26) | 0.44 (1.07) | 0.11 (0.26) |

So, thanks to the help of Owen "Behr" Wilson, they lived happily ever after.

Scientific conclusions

Ok, now for something completely different. This is the serious part.

First, before Home Depot has a run on samples of the pretty color set, let me say that the set I chose was not scientifically chosen. In a totally uncharacteristic way, I actually told the truth about just picking out the samples based on being pretty. The set was nowhere near perfect. I am sure it could be optimized to make it smaller and more better. I am guessing this might happen.

Second, note that the improvement in inter-instrument agreement is not fabulous. I am guessing that better agreement might not be possible. Sorry.

Third, this experiment is a practical example of a point I made in a previous blog. Regression can go bad if you try to push it too far.
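As a toy illustration (hypothetical numbers, not data from the paper): a regression model with one coefficient per calibration point reproduces the calibration set exactly, then extrapolates wildly outside it. That is exactly the risk when the standardization set does not span the colors you actually measure.

```python
import numpy as np

# Toy calibration: six points in a narrow range, with a small nonlinearity
# standing in for instrument-to-instrument differences.
x_train = np.linspace(0.1, 0.5, 6)
y_train = x_train + 0.02 * np.sin(20 * x_train)

linear = np.polyfit(x_train, y_train, 1)   # modest, well-behaved model
wiggly = np.polyfit(x_train, y_train, 5)   # one coefficient per point: interpolates exactly

x_test = 0.9                               # well outside the calibration range
err_linear = abs(np.polyval(linear, x_test) - x_test)
err_wiggly = abs(np.polyval(wiggly, x_test) - x_test)
# The flexible fit is dramatically worse where it has no data to constrain it.
```

The moral: a regression set that covers only part of color space (like glossy ceramic tiles standing in for paint) can make agreement worse on everything outside it.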

This blog post is derived from my paper, "Evaluation of Reference Materials for Standardization of Spectrophotometers", presented earlier this week at the Portland TAGA conference.

----------------------------------
[1] Williams, Andy, “Inter-instrument agreement in colour and density measurement”, IFRA special report, May 2007

[2] Nussbaum, Peter, Jon Y. Hardeberg, and Fritz Albregtsen, “Regression based characterization of color measurement instruments in printing application”, SPIE Color Imaging XVI, 2011

[3] Radencic, Greg, Eric Neumann, and Dr. Mark Bohan, “Spectrophotometer inter-instrument agreement on the color measured from reference and printed samples”, TAGA 2008

[4] Wyble, D. and D. C. Rich, “Evaluation of methods for verifying the performance of color-measuring instruments.  Part 2: Inter-instrument reproducibility”, Color Research and Application, 32, (3), 176-194

[5] ICC “Precision and Bias of Spectrocolorimeters”, ICC white paper 22

[6] Dolezalek, Fred, “Interinstrument agreement improvement”, Spectrocolorimeters, TC130, 2005

[7] Hagen, Eddy, “VIGC study on spectrophotometers reveals: instrument accuracy can be a nightmare”, Oct 10, 2008, http://www.ifra.com/website/news.nsf/wuis/7D7D549E8B21055CC12574C0004865FC?OpenDocument&0&

[8] X-Rite, “The new X-Rite standard for graphic arts (XRGA)”, CGATS N 1163

[9] Rich, Danny, “Graphic technology — Improving the inter-instrument agreement of spectrocolorimeters”, CGATS white paper, January 2004

[10] Van Aken, Harold, and Ronald Anderson, “Method for maintaining uniformity among color measuring instruments”, US patent 6,043,894

[11] All kidding aside, Danny is a great guy and has been a mentor to me. I am proud to be able to call him a friend. Here is an announcement of Danny being awarded the Robert F. Reed Technology Medal.



5 comments:

  1. Great post, John. Many users see a published inter-instrument agreement as the absolute limit of performance. Unfortunately, "typical user experience may vary."
    Nearly all instrument manufacturers take pains to describe their methodology for determining inter-instrument agreement, saying that the IIA figure is a STD DEV of average measurements in a fleet of instruments compared to a master (usually virtual) at precise temperature and humidity. Very few user environments match these conditions.
    Just a note on your experience with standardizing the instruments with BCRA tiles versus paint samples: while there are some advantages to using paint samples over ceramic tiles, notably less thermochromic shift and better flatness (BCRA tiles are not as flat, so some variation there affects 45/0 instruments), ceramic tiles are much more durable as a long-term standard. Paint on paper would ideally be used only once, while a properly cared-for set of ceramic tiles will be usable for a year. Repeated use of the paint-on-paper sets degrades the samples over a relatively short period of time. Although I never looked into the cause, I suspect that humidity changes and scuffing from repeated physical contact with the instruments are responsible for the color shift.

    In brief, paint on paper samples are a good way to check IIA on a small set of instruments if you measure them all at one sitting, but continued use of a standards set really requires ceramic.

    ReplyDelete
  2. Hey Math Guy!! Great Blog! I always was a sucker for a good romance tale! I think the female spectro looks more like Salma Hayek than Jennifer Aniston, though!
    Hey you users out there, did you take note? YOU NEED TO KNOW about differences between instruments, not just how each one tracks between factory calibrations! Get one of the references John mentions above and be aware of differences. Noting spectral differences between instruments can help you win those arguments that John describes as unwinnable above. Nah, that's impossible! Data never won an argument with a woman!

    ReplyDelete
  3. Good work John.
    Being a user and not a scientist, I'm discussing the issues of inter-instrument agreement with several people. I agree with Bob K. that ceramic tiles are far more stable than paint samples.
    What I learned during my discussions is that some of the ceramic tiles, especially the orange tile, reflect light quite differently than paint or print samples. If you compare instruments with a different construction of the light-to-detection path - e.g. Techkon and X-Rite - ceramic tiles can be misleading for judging inter-instrument agreement for, e.g., print evaluation.
    I have been asking X-Rite for years to offer NetProfiler for the i1, but they refuse the idea.

    ReplyDelete
  4. Knowledgeable with a sense of humor. Now that is my perfect pair and you've got it in spades...or should that be hearts.

    ReplyDelete
  5. Thank you Kate... hearts did you say? Have a look at my blog post tomorrow (Feb 13).

    ReplyDelete