Tuesday, December 27, 2016

Unambiguous regions in color space for the basic chromatic colors

I am going to start this blog post with the punchline. The image below shows the range in color space of the eight basic chromatic colors. I assert that any color that is within a given set of limits will be unambiguously identified by the corresponding color name by everyone except for people who are either Color Vision Deficient (CVD) or Color Naming Pretentious (CNP). 

If a color is in one of these regions, then it has an unambiguous name!

Note that this is an a*b* plot. Each color also has a viable range for L*. Stick around for the end of this post, and I will provide a simple mathematical description of these regions -- but that's a treat reserved for those people who read through this entire blog post!

Why is this important?

Before I go any further, I have a confession to make. I am writing this blog post (and have put all that time into the data analysis) in the hope that I will someday win a running argument that I have with my wife. I know. Good luck, John. I can dream, can't I?

Here is how the argument typically plays out...

Math Guy: Did you see that woman with the gorgeous brown hair? She just winked at me and smiled. I'm gonna go ask her for her number.

Honey, she smiled at me!

Bride of Math Guy: Don't be such a doofus! Her hair is auburn, not brown! 

Math Guy: But, Honey! I know color. I am a color scientist!

Bride of Math Guy: That may be, but you're still wrong.

If only I could walk over to the brunette (who is obviously attracted to immensely intelligent guys like me), ask to measure her hair with the spectrophotometer that I always carry in my pocket, and then bring up a ColorNamer app to give me the unambiguous name for the color of her hair. If nothing else, you gotta admit that this is a novel pick-up line. And just maybe, it could be used to avoid marital strife.

If you have doubts about the importance of the question of color naming, witness the following. There is a prominent, well-respected, and humble color scientist/blogger who has devoted no less than six blog posts to the topic of assigning names to colors.

I dunno, maybe the name of a color is important for other reasons. I mean, there are a few odd cases where words are used to convey information, and the color associated with those words might be important to somebody.

My data

I recently ran into a pile of papers written by Dimitris Mylonas. Unlike me, he has been doing a lot of real research. His research topic for his doctorate at University College London has been how people assign names to colors. He has been running an online experiment where he displays colors right on your own computer and then asks you to name the color. He has made the results available through an online color naming app where you can select from 30 color names and it will display the most common color associated with that name. Or, the other way around, you can click on a color from a palette, and you will get a word cloud with the most common names that your RGB combination has been given. Great entertainment for a rainy day. I gave up my subscription to Netflix when I found this.

Screen capture from Mylonas' site

There was a similar color naming experiment that was conducted by Nathan Moroney of HP. You can get a free copy of his color thesaurus online.

Snippet from Moroney's book

For both of these sources (the online app from Mylonas and the book from Moroney), I harvested RGB values for each of the basic chromatic color names: red, orange, yellow, green, blue, purple, pink, and brown.

Why these colors? 

Why not beige, turquoise, plum, coral, lilac, etc?

I do have some logical foundation for the colors I chose. It is based on a seminal paper in the study of chromolinguistics by Berlin and Kay in 1969. They did linguistic studies of color names in eleventy-seven zillion different languages and came to the following conclusion:

"... a total basic inventory of eleven basic color categories exists from which the eleven or fewer basic color terms of any given language are always drawn. The eleven basic color categories are white, black, red, green, yellow, blue, brown, purple, pink, orange, and grey."

The eleven basic color categories

I think that's pretty amazing. There are many independent roots of languages, and for some reason, they all draw their basic color names from the same eleven categories. The words are different, of course, but they all kinda translate. You don't run into a basic color word in Swahili that translates into "a sorta brownish shade of red, but not so dark". There must be something fundamental to the human eye or the neural pathway to the human brain that segregates color into these eleven groups.

I should mention here that the bulk of the Berlin and Kay paper dealt with a recurring pattern in the development of languages. They posited that nascent languages include white and black in their vocabulary, later adopt a name for red, then either yellow or green, followed by green or yellow, etc. The sequence up to these eleven colors is largely predictable.

There has been much research based on the work of Berlin and Kay, and it mostly supports the eleven-ness of color categories. Perhaps there are a few colors (such as beige, turquoise, plum, and coral) that belong on the next tier, but these are clearly the Magnificent Eleven.

In the old west, life was lived in black and white,
and there were only seven magnificent basic colors

I did a little tiny bit of research on this topic. Years ago, I taught algebra to people who hated math for the University of Wisconsin-Milwaukee. One semester I had about 50 students, half male and half female. I asked them to write down all of the single-word color names they could think of, and gave them two minutes. 

Perhaps three or four lists had only ten of the Magnificent Eleven, but almost every other student had included all eleven basic color terms. The next two color names in terms of frequency were silver and gold, which each appeared on about half of the lists. That in itself I found interesting, since as a color scientist, I know that silver and gold are gonio-apparent effects, and not actually colors.

My paltry little experiment demonstrated once again that there is something magical about these eleven colors.

So, for this experiment today, I decided to go with that set of eleven. But I left off the achromatic colors (white, black, and gray). I could blame technical problems beyond my control, but the truth is that I didn't think neutral colors were pretty enough.

A caveat

There is always a caveat, isn't there? These two online experiments are absolutely fabulous work. Incredible amounts of data. In one of his papers, Mylonas states that there had been over 1,400 participants in his experiment; Moroney claims over 5,000. 

Here's the caveat, though. You can bet that most of the computer displays were uncalibrated. There could well have been 6,400 different viewing conditions, one per participant, if you consider the intensity and white point of the display, the ambient light, and the background. So when Sidney from Sydney looked at a shade of lime green that was created by the RGB values 30, 230, 40, and Charlotte from Charlotte viewed that same combination on her monitor, there is apt to be a difference in what they were actually seeing. Anyone who has used a laptop with a second monitor can appreciate this issue. 

So, we have identified a source of experimental variation. We have literally scads of data, but we are unsure of its quality. But I used it for my experiment anyway. I'm not proud.

Both researchers provided us only with RGB values associated with the color names. Before I go on, I should explain that "RGB" is not a standard. It could refer to any of the particular flavors of RGB values associated with whatever monitor or cellphone or camera you are using. But adding an s to the front of RGB wildly changes the connotation of the whole thing in much the same way that adding a little s does to your ex. sRGB is a unique standard that can be converted into the standard L*a*b* values. That would be a handy trick right now.

I asked my good buddy Dimitris Mylonas if it would be reasonable to assume that his RGB values were sRGB. I could almost hear him shrugging his shoulders through the email: "sRGB is safer than any other assumption." So, I used the standard computation to convert from sRGB to L*a*b*. Here is a website to do the conversion from sRGB to L*a*b*.
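For the curious, here is a minimal Python sketch of the usual sRGB to L*a*b* computation. It assumes 8-bit RGB values, the IEC 61966-2-1 sRGB transfer curve and (rounded) matrix, and a D65 reference white. Those are assumptions about "the standard computation", not necessarily the exact variant behind the numbers in this post; other implementations round the matrix or the white point slightly differently.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB (0-255) to CIE 1976 L*a*b*, assuming a D65 white."""

    def linearize(c):
        # Undo the sRGB transfer curve (IEC 61966-2-1)
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB -> CIE XYZ (D65), using the rounded sRGB matrix
    X = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    Y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    Z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    # D65 reference white, normalized so that Yn = 1
    Xn, Yn, Zn = 0.95047, 1.0, 1.08883

    def f(t):
        # CIE cube-root function, with the linear segment near black
        d = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > d ** 3 else t / (3.0 * d * d) + 4.0 / 29.0

    fx, fy, fz = f(X / Xn), f(Y / Yn), f(Z / Zn)
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b_star = 200.0 * (fy - fz)
    return L, a, b_star


# The lime green (30, 230, 40) from the caveat above
print(srgb_to_lab(30, 230, 40))
```

Note that this only says what an ideal sRGB display would show. It does nothing about Sidney's and Charlotte's uncalibrated monitors.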

An obvious question here

I can see one of you bouncing up and down with your hand in the air... yes? You want to know whether the two data sets agree with each other? Good question! I wish I'da thought to do that! 

In the graph below, the circles represent the Mylonas data, and the squares with an outline represent the Moroney data.

Comparison of two versions of the basic colors

So, the answer is no. They don't match. The color difference ranges from 12.5 DEab to 46.4 DEab. Not good matches by any stretch of the imagination. There is a consistent pattern, however. The Moroney data is always more saturated. And the two data sets have very similar hue angles. With the exception of yellow and blue, the hue angles are all within  of each other. 

I don't have a ready explanation for why the two experiments differed so much. Given the size of the data sets, the difference is due to some sort of bias between the two experiments, and is not a statistical anomaly. If nothing else, this is a caution for this endeavor: If we try to precisely define colors, there be dragons.
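To pin down the terms above: the CIE 1976 color difference (DEab, also written ΔE*ab) is just the straight-line distance between two points in L*a*b*. "More saturated" corresponds most naturally to higher chroma C*ab, the distance from the neutral axis in the a*b* plane, and the hue angle is the direction of that point. Here is a minimal Python sketch of all three. The hue_difference helper is a hypothetical convenience that handles the wrap-around at ±180° when comparing hue angles.

```python
import math


def delta_e_ab(lab1, lab2):
    """CIE 1976 color difference: Euclidean distance between two L*a*b* triples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


def chroma_hue(a, b):
    """Chroma C*ab and hue angle h in degrees (atan2 convention, -180 to 180)."""
    return math.hypot(a, b), math.degrees(math.atan2(b, a))


def hue_difference(h1, h2):
    """Smallest angular gap between two hue angles, in degrees (0 to 180)."""
    return abs((h1 - h2 + 180.0) % 360.0 - 180.0)


print(delta_e_ab((50, 0, 0), (50, 3, 4)))  # two grays? no: a 5-unit shift in a*b*
print(chroma_hue(0, 10))
print(hue_difference(170, -170))
```

The wrap-around matters: hue angles of 170° and -170° look far apart numerically but are only 20° apart on the color wheel.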

The eight color map problem

So here I am: I have L*a*b* values for the eight basic chromatic colors, but, truth be told, these numbers have somewhat of a checkered past.

Enter Sturges and Whitfield

If only I had some data that was taken under standardized conditions. Even if it were done with fewer than a cast of thousands, if it corroborated the online studies, then the online studies would be corroborated. The good news is that such a study was done in 1995 by Sturges and Whitfield. They enlisted the help of twenty students at their university in England. Half were male and half female. None had any specialized training in color, err, excuse me, in colour.

The experimenters selected 446 color chips from the Munsell Book, and asked each subject to give a monolexic name for the color of the chip. (From mono, meaning "the kissing disease", and lexic, meaning "someone who can't remember the first three letters of their learning disorder", monolexic means "single word". Hence pink, aubergine, and sploofrinde are all monolexic. Burnt sienna and reddish-green are not monolexic. Owyell, by the way, is dyslexic, and there are no English words that rhyme with orange. Except for sporange, which means "word that rhymes with orange".)

Did I mention? Twenty subjects named 446 randomly ordered patches, and I forgot to mention that they were given each patch twice.

S&W thus had a lot of data to distill down -- about 18,000 words. Among other things, they tried using consensus to decide when a given name was proper for a given patch. If all twenty trials on a given patch yielded the same name, then there was consensus. I was a bit surprised that, even with this stringent test, there were 102 patches with a consensus name. For nearly a quarter of the patches (102 of 446), twenty people independently came to the same conclusion about the monolexic name.
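The consensus test is simple enough to state in code. This is a minimal sketch assuming a hypothetical data layout: a dictionary mapping each patch to the list of every name given for it. Names are lowercased and stripped so that "Pink" and "pink " count as the same answer; that normalization is a choice made for this sketch, not necessarily what S&W did. (For scale, 20 subjects × 446 patches × 2 presentations = 17,840 responses, which is the "about 18,000 words".)

```python
def consensus_names(responses):
    """Return {patch: name} for patches where every response gave the same name.

    responses: hypothetical dict mapping a patch id to the list of all names
    recorded for that patch.
    """
    result = {}
    for patch, names in responses.items():
        # Treat case and stray whitespace as irrelevant (an assumption)
        normalized = {n.strip().lower() for n in names}
        if len(normalized) == 1:  # exactly one distinct name: consensus
            result[patch] = normalized.pop()
    return result


example = {
    "patch_a": ["pink"] * 40,
    "patch_b": ["pink"] * 39 + ["purple"],  # one dissenter spoils it
}
print(consensus_names(example))
```

A single dissenting answer knocks a patch out, which is why it is a little surprising that 102 patches survived.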

So, my third data set was this set of 102 colors and their associated names. Since the colors were reported in Munsell notation, I used the Munsell renotation data to convert to L*a*b*.
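The Munsell renotation data tabulates each chip as CIE xyY (chromaticity x, y and luminance factor Y) relative to Illuminant C. Turning that into L*a*b* takes a couple of lines. Here is a sketch assuming Illuminant C (x = 0.31006, y = 0.31616, Y = 100) is used directly as the reference white, with no chromatic adaptation to D65. Whether that matches what was done for the plots in this post is an assumption, so the white point is left as a parameter.

```python
def xyY_to_lab(x, y, Y, white_xyY=(0.31006, 0.31616, 100.0)):
    """Convert CIE xyY (Y on a 0-100 scale, y > 0) to L*a*b* against the given white."""

    def to_XYZ(x, y, Y):
        # Standard xyY -> XYZ relations
        return x * Y / y, Y, (1.0 - x - y) * Y / y

    X, Y_, Z = to_XYZ(x, y, Y)
    Xn, Yn, Zn = to_XYZ(*white_xyY)

    def f(t):
        # CIE cube-root function, with the linear segment near black
        d = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > d ** 3 else t / (3.0 * d * d) + 4.0 / 29.0

    fx, fy, fz = f(X / Xn), f(Y_ / Yn), f(Z / Zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


# A neutral chip with luminance factor Y = 18.42 sits at about L* = 50
print(xyY_to_lab(0.31006, 0.31616, 18.42))
```

If the sRGB-derived data sets are computed against D65, mixing them with Illuminant C values is a small but real source of disagreement; a chromatic adaptation transform (such as Bradford) is the usual fix.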

Am I done yet?

Just to be safe, I wanted to throw in a few more data sets. I happened to have measurements from a Macbeth Color Checker chart. (This shows my age. People who don't immediately recognize the names "John, Paul, George, and Ringo" will know the Color Checker by the name X-Rite Color Checker.) This chart unfortunately does not include pink or brown.

And I did my own color naming experiment. I tossed a Pantone book at my wife and asked her to find the best representation of each of the basic color names in the book. Her brother, who is also color-savvy, was given the same test, and I recorded my answers as well. (In case you were wondering, we disagreed on pretty much all of them.)

Here are the results for the color purple. Each dot represents a "sample". Thus, the 5,000 people who took the Moroney test get one circle. The twenty college students who spent the better part of a weekend staring cross-eyed at Munsell chips instead of going out to a proper pub got a total of fourteen circles -- one for each Munsell patch that they all agreed was purple. And I got one circle all to myself. And, begrudgingly, I gave one circle to my wife and brother-in-law as well. Life isn't always fair. If one of the people who took the Mylonas test wants more than 1/1,400th of a circle, they can write their own darn blog post.

The X in the middle is the average of all the data.

The range of the color purple

Looking at the scatter of points for purple and for the other colors, I saw a shape that was bounded on two sides by two hue angles, on two other sides by two chroma values, and (not shown) on top and bottom by two L* values. Since I had some clear outliers, I let Excel tell me the 10th and 90th percentiles of L*, C*, and h.

Note: I had originally called this last one H*. Thank you, Tammo, for the correction!
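Here is a minimal Python sketch of that recipe, under a few assumptions: each sample is an (L*, a*, b*) tuple, Excel's PERCENTILE.INC interpolation rule is the one wanted, and the X in the plot is a plain average of L*, a*, and b*. Hue is computed with atan2 in degrees (between -180 and 180), which matches the negative hue limits for blue, purple, and pink in the table below. Hue percentiles only behave if a color's samples don't straddle the ±180° cut.

```python
import math


def to_lch(L, a, b):
    """L*, chroma C*ab, and hue angle h in degrees (atan2 convention, -180 to 180)."""
    return L, math.hypot(a, b), math.degrees(math.atan2(b, a))


def percentile_inc(values, p):
    """Linear-interpolation percentile, the same rule as Excel's PERCENTILE.INC."""
    s = sorted(values)
    pos = (len(s) - 1) * p
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def region_limits(samples, low=0.10, high=0.90):
    """Percentile limits of L*, C*, and h for one color name, plus the L*a*b* mean.

    samples: list of (L*, a*, b*) tuples.
    """
    lch = [to_lch(*s) for s in samples]
    limits = {}
    for i, name in enumerate(("L*", "C*", "h")):
        column = [row[i] for row in lch]
        limits[name] = (percentile_inc(column, low), percentile_inc(column, high))
    n = len(samples)
    mean = tuple(sum(s[i] for s in samples) / n for i in range(3))
    return limits, mean


print(region_limits([(40, 0, 10), (50, 0, 20), (60, 0, 30)]))
```

Using the 10th and 90th percentiles rather than the min and max is what keeps a handful of outliers from inflating each box.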

Results, in graphical form and numeric form

The graph below may look familiar to those who bothered to read the first part of this blog post, and who were also paying attention. It is the same graph as above, meticulously duplicated for the benefit of my dear readers.

Partitioning of color space into base color names

Color   Low L*  High L*  Low C*  High C*  Low h  High h
Red         41       49      59       86     27      37
Orange      62       72      67       96     57      67
Yellow      81       90      68      109     86      97
Green       31       72      29       80    122     168
Blue        31       71      24       58   -112     -71
Purple      25       52      26       81    -56     -35
Pink        62       81      25       54    -23      21
Brown       29       41      26       43     55      76
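And here is the treat promised at the top for those who read this far: each region is a box in L*, C*, and h, so a color gets a basic name when its L*, C*, and h all fall within that row's limits. A minimal Python sketch, assuming inclusive limits and hue angles computed with atan2 in degrees (between -180 and 180), which is the convention the negative hue values in the table imply. The names REGIONS and basic_color_names are hypothetical.

```python
import math

# Limits per color: (low L*, high L*, low C*, high C*, low h, high h), from the table above
REGIONS = {
    "red":    (41, 49, 59, 86, 27, 37),
    "orange": (62, 72, 67, 96, 57, 67),
    "yellow": (81, 90, 68, 109, 86, 97),
    "green":  (31, 72, 29, 80, 122, 168),
    "blue":   (31, 71, 24, 58, -112, -71),
    "purple": (25, 52, 26, 81, -56, -35),
    "pink":   (62, 81, 25, 54, -23, 21),
    "brown":  (29, 41, 26, 43, 55, 76),
}


def basic_color_names(L, a, b):
    """Return the basic color names whose L*, C*, h box contains this L*a*b* color."""
    C = math.hypot(a, b)                # chroma C*ab
    h = math.degrees(math.atan2(b, a))  # hue angle in degrees, -180 to 180
    return [
        name
        for name, (l_lo, l_hi, c_lo, c_hi, h_lo, h_hi) in REGIONS.items()
        if l_lo <= L <= l_hi and c_lo <= C <= c_hi and h_lo <= h <= h_hi
    ]


print(basic_color_names(45, 60.62, 35.0))  # L* = 45, C* about 70, h about 30
print(basic_color_names(50, 0, 0))         # a neutral gray
```

A color outside every box, including any neutral gray, comes back as an empty list. That is the sketch's way of saying the name is ambiguous, or at least not one of the eight.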

Let me know if you find some use for this. I found some use... I wrote a blog post, and looking at this graph gives me a bunch of ideas for future blog posts.