John the Math Guy

Wednesday, September 25, 2013

A YouTube tour through Brazil

As they say in Monty Python land, "and now for something completely different." Enough of this silly math and color science stuff, ok? How about something fun for a change?

My wife and I got a Brazilian earworm the other day [1]. An earworm is one of those catchy tunes that gets stuck in your head. Since my wife and I are both susceptible to earworms, and both like to sing, and spend a lot of time together, we generally spend weeks re-infecting each other with earworms. The only way to get rid of an earworm in our household is to introduce another.

The earworm was the song "Brazil".

Here is a version of the song which I would call quintessential. And I say that not as a real expert, but as a guy who likes the song, and who once had a collection of over twenty eight track tapes, so I consider myself an expert musicologist. The music in this video is hot, the tempo is fast, and it's got lots of instrumental things going on. You know, like a piano and stuff. Maybe strings and those things that you shake to make shikka-shikka sounds.

This version was from the movie The Eddy Duchin Story, which was about a band leader from the 30's and 40's. I forget the band leader's name, but the movie is his story. You can buy the soundtrack of the movie from Amazon.

This is not the original version of the song, of course. That would have been back in 1939, when it was written one stormy night by Ary Barroso. The first recording of it was by Francisco Alves, which I have painstakingly cut and pasted from YouTube below. Comparing this to the version from the Eddy Duchin movie, you will note two things. First, it's slower - still a reasonable tempo, but not frenetic. The second thing is that there are words! The cool thing is that the words are in Portuguese, which (in my opinion) is an absolutely gorgeous, sensual language. I am sure if I knew how to speak Portuguese, I would have better luck with the women, if I were single. And, of course, if I chose to use that secret weapon.

As lovely as this version is (in its tinny kind of 1939 recording quality way, and with its 1939 kind of vocal stylings), the version that put Brazil on the map [2] was an animated version by Disney. Donald Duck is in the cartoon, but the singer is a parrot by the name of Joe Carioca [3], as sung by the real life human José Oliveira.

Donald and Joe

Here is the YouTube video. I recommend watching it from one end to the other. The idea of animating the painting of the scenes with a paintbrush was truly inspired. The inspiration actually comes from the full title of the song, "Aquarela do Brasil", which means "Watercolor of Brazil". Barroso came up with the name when he looked at a watercolor painting by that name.

There are 2^n (cos (zillion) + exp (umpti-leven/2)) versions of this song on YouTube. I listened to quite a few, but I must admit, my research was not exhaustive. Still, here are some that stuck out in my mind.

I start off with one of the absolute worst versions, as performed by the Ray Conniff Singers. Sorry, Ray, this version just leaves me flat. They slowed it down and turned it into a 1970's fashion statement. I am the singer on the far left, by the way. I still have the bow tie, but my barber left town.

Contrast this against the Lawrence Welk version from 1959. Now, I grew up being indoctrinated in the belief that Lawrence Welk was totally unhip "Gramma music". If you were indoctrinated the same way, this clip might change your mind. Electric guitar, muted horns playing a crisp staccato, joined by flutes and clarinets... all at an electrifying tempo. And what would a Lawrence Welk song be without the accordion of Myron Floren? Seriously, this rocks.

Just to be clear about my own tastes, the song doesn't have to be performed fast to make me long for the sandy beach and skimpy bikinis of Rio. Here is a version from old Blue Eyes that pops my cork. Note that he never bothered with the Portuguese lyrics [4]. I guess he figgered he was sexy enough already.

And here is another laid back version that I can't help dooby-do-wopping along to.

Finally, I provide a version by the Slovenian vocal group Perpetuum Jazzile. (See if you can find me in the second row.) I can't watch this video without wondering what kind of volume discount [5] they get on the Shure SM58 mics. And the volume discount on Prozac.

----------------------------------

[1] I know what you are thinking. Not that kind of Brazilian! An earworm, silly.

[2] "Put Brazil on the map" Isn't that a cute little pun?

[3] This last name sounded to me a lot like "karaoke", so I had to dig around a bit. Is there any connection? Well, of course not. "Carioca" is a Brazilian word referring to people from Rio de Janeiro. The word "karaoke" is a Japanese word meaning, literally, "empty orchestra". No connection other than the serendipity of two words sounding similar and relating to singing.

[4] Here are the English lyrics that Sinatra sings:

Brazil, where hearts were entertaining June
We stood beneath an amber moon
And softly murmured someday soon
We kissed and clung together

Then, tomorrow was another day
Morning found me miles away
With still a million things to say
Now, when twilight dims the sky above
Recalling thrills of our love
There's one thing I'm certain of
Return I will to old Brazil

(instrumental)

Then, tomorrow was another day
Morning found me miles away
With still a million things to say
Now, when twilight dims the sky above
Recalling thrills of our love
There's one thing that I'm certain of
Return I will to old Brazil
That old Brazil
Man, it's old in Brazil
Brazil, Brazil

[5] "Volume discount" - another incredibly subtle pun.

Thursday, September 12, 2013

How many colors are there - the definitive answer

Here is the quick summary: There are 346,005 discernible colors.

I published a rather popular blog post in October 2012 that posed the slightly whimsical question "How many colors are in your rainbow?" I looked at the question in a number of ways and came up with answers anywhere from 3 to 16,777,216. The intent of the post was to collect some interesting facts into one coherent and entertaining post.

But, I sort of sidestepped the implied question: How many discernible colors are there? I intend in this post to answer the question a bit more scientifically. Well... quite a bit more scientifically. For this blog post, I used a Monte Carlo technique to determine the volume of CIELAB space, and then modified this volume according to DE₀₀ to account for the nonlinearity of CIELAB. And the number 346,005 plopped out.

General idea

I started by generating zillions of random spectra [1]. Now, I'm not gonna say that I generated every spectrum possible. I'm sure I missed a few that were hiding down there in the shadows. But I did look at a whole bunch of them. Half a billion, to be exact.

I converted each spectrum into L*a*b* values using D50 illumination and the 2° observer. The resulting L*a*b* values were tabulated into boxes in a three-dimensional array, with each box indicating whether the corresponding region in CIELAB space contained a viable L*a*b* value.
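For the curious, here is a minimal sketch of the last step of that conversion, going from XYZ tristimulus values to L*a*b*. It uses the standard CIELAB formulas. The white point is the commonly quoted D50 value for the 2° observer (roughly 96.42, 100.00, 82.49), and the function name is my own invention for illustration. The earlier step, going from a spectrum to XYZ, needs tables of the illuminant and the color matching functions, so it is left out here.

```python
def xyz_to_lab(X, Y, Z, white=(96.42, 100.0, 82.49)):
    """Convert XYZ (scaled so that white has Y = 100) to CIELAB.

    The default white point is an approximate D50 / 2 degree value;
    swap in your own if you use a different illuminant or observer.
    """
    delta = 6.0 / 29.0

    def f(t):
        # Cube root above the threshold, a straight line below it.
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = (f(v / w) for v, w in zip((X, Y, Z), white))
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return L, a, b
```

Feeding in the white point itself should land at L* = 100 with a* and b* at zero, which is a handy sanity check.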

Next, I counted up the number of boxes checked to establish the volume of CIELAB space. According to my experiment, the volume is just short of 2.2 million. This number fits in reasonably well with two papers cited by Gary Field in my addendum blog post:

Research on the number of colors issue usually starts with reference to the Dorothy Nickerson and Sidney Newhall paper of 1943 (JOSA, pp. 419-422). They conclude that there are about 7,500,000 surface colors at "supraliminal" viewing conditions, and 1,875,000 colors when viewing conditions approximate those used for color matching work.

Mike Pointer and Geoff Attridge concluded that there were about 2,280,000 discernible colors in their 1998 CR&A article (pp. 52-54).

Thus, my number (2.176 million) corroborates the previous results from Nickerson and Newhall (1.875 million), and from Pointer and Attridge (2.280 million).

But we're not done yet. As we know, CIELAB is just not all that uniform. In particular, two saturated yellow colors might be 5 units apart (according to DE_ab) but might still be perceived by a human as just barely different. Thus, this figure is an overestimate of the number of colors that are actually discernible. Since I have all (or nearly all) the physically realizable colors in boxes, I can compute the volume of each box using DE₀₀. Adding the DE₀₀ volumes of each of the boxes will provide an estimate of the true number of colors, corrected for visual nonlinearity.

Based on this correction, the number of discernible colors is 346,005. I won't attempt to name them in this blog post. That will come in a future post.

Now for some details on how the calculation was done...

Generating spectra

All the spectra were "physically realizable reflectance spectra", which is to say, the reflectance values were all between 0 and 100%. I created spectra from 380 nm to 730 nm, in 10 nm increments. All the spectra I generated were somewhat "smooth", in that they were piece-wise linear functions. I show one example below.

The spectrum above comprises nine segments. I generated 125 million of these nine-segment spectra, along with the same number of spectra with eight segments, the same number with seven segments, and the same number with six. Thus, there were 500 million spectra in toto.

Initially, I used reflectance values that were uniformly distributed between 0 and 100%. This proved a bit slow to converge (slow to fill the area), since a lot of spectra were generated at the light end where our sensitivity to color difference is rather weak. For this final work, I used random numbers distributed according to the cube root distribution.
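Here is a rough sketch of how spectra like these could be generated. Two things in it are guesses rather than anything stated above: that the knots between segments sit at randomly chosen wavelengths, and that "cube root distribution" means the cube root of the reflectance is uniformly distributed (so a uniform random number gets cubed). The function name and the seed are made up for illustration.

```python
import random

WAVELENGTHS = list(range(380, 731, 10))  # 380..730 nm in 10 nm steps (36 samples)


def random_spectrum(n_segments, rng):
    """Return one piecewise-linear reflectance spectrum (values in 0..1)."""
    # Assumption: knots sit at the two end wavelengths plus randomly
    # chosen interior wavelengths.
    interior = sorted(rng.sample(WAVELENGTHS[1:-1], n_segments - 1))
    knots = [WAVELENGTHS[0]] + interior + [WAVELENGTHS[-1]]
    # Assumption: cube a uniform number so that the cube root of the
    # reflectance is uniform, which favors the dark end.
    values = [rng.random() ** 3 for _ in knots]

    spectrum = []
    seg = 0
    for wl in WAVELENGTHS:
        # Advance to the segment that contains this wavelength.
        while wl > knots[seg + 1]:
            seg += 1
        x0, x1 = knots[seg], knots[seg + 1]
        t = (wl - x0) / (x1 - x0)
        spectrum.append(values[seg] + t * (values[seg + 1] - values[seg]))
    return spectrum


rng = random.Random(2013)
# One spectrum each with six, seven, eight, and nine segments.
spectra = [random_spectrum(n, rng) for n in (6, 7, 8, 9)]
```

Cubing the random number piles most of the knot values up near zero reflectance, which is the point: more samples land where small differences are easier to see.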

Caveat - This Monte Carlo analysis necessarily will produce only a subset of all possible spectra. First, discontinuous spectra were left out. Second, the fact that "only" half a billion spectra were analyzed leaves open the possibility that some are missed. This would tend to cause my estimate to be a bit low.

I also tried generating purely random spectra, with no correlation between wavelengths. Initially this was slow to converge - perhaps that might have worked out in the long run if I would have just had the patience.

Tallying the number of unique colors

A three dimensional array was created, representing L* values from 0 to 100, a* values from -150 to +150, and b* values from -150 to +150. All three dimensions were quantized in steps of 5, resulting in 21 X 61 X 61 boxes. Thus, there was a single cube, for example, in CIELAB space representing all colors in the range 20 < L* < 25, -40 < a* < -35, and 80 < b* < 85.

Each of the spectra was converted to CIELAB values using the D50 light source and the 2° observer. The CIELAB values were then converted to a position in the three dimensional array, and the location was marked to indicate that there was at least one viable CIELAB value within the box.
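The bookkeeping can be sketched in a few lines. This is a simplified stand-in, not the code I actually ran: it uses a Python set of box indices in place of a three dimensional array, and the names are invented for the example.

```python
import math

STEP = 5.0        # box size in L*, a*, and b*
A_B_MIN = -150.0  # lower edge of the a* and b* ranges

occupied = set()  # indices of boxes that contain at least one viable color


def box_index(L, a, b):
    """Map a CIELAB value to the (L, a, b) index of the box that holds it."""
    return (
        math.floor(L / STEP),
        math.floor((a - A_B_MIN) / STEP),
        math.floor((b - A_B_MIN) / STEP),
    )


def tally(L, a, b):
    """Mark the box containing this color as occupied."""
    occupied.add(box_index(L, a, b))


def cielab_volume():
    """Volume of the occupied boxes in cubic delta-E-ab units."""
    return len(occupied) * STEP ** 3


# Two colors in the same box count once; a third lands in a new box.
tally(22.0, -37.0, 83.0)
tally(24.9, -35.1, 80.2)
tally(57.0, 10.0, 10.0)
```

After those three tallies only two boxes are marked, for a volume of 250.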

If anyone is interested, I can send you a list of the centers of all the boxes, representing all valid CIELAB colors. Send me an email at john@johnthemathguy.com. If anyone is really interested, I can provide a set of very colorful charts like the one below, that summarize all this data. If enough people are really interested, I will post those to my website for all to marvel at.

Viable a*b* values in the range 55 < L* < 60

(each square represents a 5 X 5 box in a*b*)

Caveat - This discretization causes a bit of an error. It will cause an over-estimation of the number of colors. Why? Let's say that a certain box is at the edge of color space, straddling the line between viable CIELAB values and silly-lab values. If zillions of spectra are tested, then this box will eventually get a tally, despite the fact that only half of its volume should have been counted.

Converting to a count of discernible colors

Now for the novel part, the conversion to DE₀₀. In the previous analysis, the tacit assumption was made that each of those 5 X 5 X 5 boxes had a volume of 125. To be completely correct, the volume of each cube is 125 ΔE_ab³, cubic delta E units. I guess maybe that's not an assumption, that is pretty much just geometry. The assumption comes in when this is interpreted as meaning that each box contains 125 discernible colors. Those who have subscribed to the Color Science Times Newspaper for the last 30 years know that this might not be exactly the case.

So, I computed the volume using a color difference formula that is closer to human visual perception, ΔE₀₀. Theoretically, we could just compute the volume of a box by determining the color difference from top to bottom, from side to side, and from other side to other side. These three numbers would be multiplied together to get the volume of discernible colors in that box. This is reasonable, but it falls just outside of the spec for this color difference. Due to nonlinearity, the warranty on ΔE₀₀ expires at 4. Beyond that point, it may not give reasonable results.

Just to make sure this didn't introduce an error, I divided the cube into eight cubes, each with sides of 2.5 ΔE_ab, and added these up. Now we are within the warranty.
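In code, the subdivide-and-add idea looks something like the sketch below. The color difference formula is passed in as a function, so a ΔE₀₀ implementation can be plugged in. I only include plain ΔE_ab here, since ΔE₀₀ is a long formula and I would rather not mangle it. Measuring each little cube's three widths through its center is an assumption made for this illustration, as are the function names.

```python
import math


def delta_e_ab(lab1, lab2):
    """Plain Euclidean color difference in CIELAB."""
    return math.dist(lab1, lab2)


def box_volume(corner, size, delta_e, splits=2):
    """Volume of one box, measured with the given color difference formula.

    corner is the (L, a, b) of the low corner, size is the edge length.
    The box is cut into splits**3 sub-cubes; each sub-cube contributes
    the product of its three widths, measured through its center.
    """
    sub = size / splits
    half = sub / 2.0
    total = 0.0
    for i in range(splits):
        for j in range(splits):
            for k in range(splits):
                cL = corner[0] + (i + 0.5) * sub
                ca = corner[1] + (j + 0.5) * sub
                cb = corner[2] + (k + 0.5) * sub
                dL = delta_e((cL - half, ca, cb), (cL + half, ca, cb))
                da = delta_e((cL, ca - half, cb), (cL, ca + half, cb))
                db = delta_e((cL, ca, cb - half), (cL, ca, cb + half))
                total += dL * da * db
    return total


# With plain delta-E-ab, a 5 x 5 x 5 box comes out to 125, as it should.
volume_ab = box_volume((20.0, -40.0, 80.0), 5.0, delta_e_ab)
```

Swapping a ΔE₀₀ function in for delta_e_ab would shrink the boxes in the saturated regions, which is the whole correction.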

346,005. I'm going to use this for all my computer passwords. Just to make sure I remember it.

---------------------

[1] I am talking here in the first person, like I actually generated all the spectra myself. I didn't really. I have better things to do. Like drink beer. I had my assistant Dell Studio generate the spectra. He didn't seem to mind, although he did seem to take his time about it.

Wednesday, September 4, 2013

Mixing my ink with my beer

I have had a lot to say in the John the Math Guy blog about beer. There was a recent post about ruminations on beer, but the key post on beer, my seminal post on beer, is the post where I cleverly used beer to illustrate Beer's law. I keep going back to that one because I just can't get over how brilliant the idea was.

I have referred to this Beer's law post in heaps and gobs of other posts:

Where are my CIELAB knobs? [1]

Why does my cyan have the blues [2]

Why does my cyan have the blues? addendum

The color of a bunch of dots, Part 4

Density is ink film thickness

Green ink being shamelessly added to beer

Several people have asked questions on this Beer's law thing, and how it connects with ink. Way back in January (2013), I got an email from a PhD student in the UK:

When I scoured the internet for the derivation of this equation I only found the original equation based on absorption coefficient, path length and concentration.

I would like to understand where the alternative equation is coming from. Could you point me in the direction of a useful paper or similar? Any help would very much be appreciated. Thanks a lot in advance.

Kind regards,

Anja

I just recently got a similar question from Michael, who is not a PhD student in the UK, but is nonetheless a smart guy. He just aced the Science and Technology Quiz that was put together by Smithsonian magazine and the Pew Research Center. This is quite an accomplishment. I'm proud to be a "virtual" friend of yours, Michael.

I thought Beers law was related to transmission of light through something, not reflected - but, well, same difference ?

Michael's question came to me through that website that everyone uses for scientific collaboration: Facebook. If you haven't heard of it yet, I suggest you check it out. That's where I do most of my serious research.

The plethora of questions (there were two...) shows that I have clearly messed up big time in my desire to educate the world about ink and beer. I left one little step out, the mixing of ink and beer. Just how is it that Beer's law applies to ink?

Beer (on the left) and ink (on the right)

Ink is soooo not like paint

One of those wonderful things that we can count on in this world is that paint is not like ink. Oh... they may seem the same to the untrained and unscientific eye. You put them on something and it changes the color. But there is one key difference, as illustrated in the image below.

This time, paint is on the left and ink is on the right

This image was created by smearing ink and paint on a sheet of paper [3]. Before smearing, a large black area was printed on the sheet. Note that on the left, the paint completely obliterates the black underneath. Paint has a great hiding power, at least when you pay more than $8 a gallon for it. The ink, however, does a perfectly lousy job of hiding the black. You really can't tell that the yellow ink is overneath the black.

What gives? Is ink just really, really cheap paint? Au contraire! Let me assure you, ink does a pretty decent job of doing exactly what it was trained to do. And paint also does a pretty decent job at what it was trained to do. That is, if you aren't cheap like me, buying the ultra-cheap paint at $8 a gallon from Fast Eddie's Paint Emporium and Car Wash.

The actual photomicrograph below illustrates what an ideal cyan ink is trained to do. Red, green, and blue light hit the surface of the ink. [4] As can be seen, the blue and green light go right through the cyan ~~filter~~ ink. The ink is transparent to green and blue light. These two flavors of light hit the paper (or other white print substrate) and reflect back. Why? Cuz the substrate is white, and that's what white things are trained to do.

Cyan ink, sitting contentedly on paper while being bombarded with red, green, and blue light

The red light suffers a completely different fate. For anyone who has visited a red light district, this should be no surprise that one's fate may change. The red light is absorbed by the cyan ink. Few of the poor hapless red photons ever even get a chance to reach the paper, and even fewer make it through the ink in the hazardous journey back through.

I may have shattered some illusions about ink here. I apologize, but it's time you learned the facts about the birds and the bees and the inks. It is customary to think of light just reflecting off the ink. Sorry. It's more complicated than that. The only reflecting that's done is done by the paper.

Ink is a filter, a filter laid atop the paper.

Why is ink that way?

This bizarre behavior is not just some side effect of some bizarre organic chemistry that is only understood by some bizarre color scientist locked in the lab at Sun Chemical. This bizarre behavior is a property that is specifically engineered by some bizarre color scientist locked in the lab at Sun Chemical.

To see why this would be a good thing to engineer in, consider what happens when magenta ink is placed overneath cyan ink. Magenta ink works a lot like cyan ink, except that it absorbs green light and passes red and blue. The excitement starts when you put one ink on the other. The cyan ink absorbs red light and the magenta ink absorbs the green. What's left? Just blue light.

Magenta ink, sitting contentedly on cyan ink

The exciting part of this is that new colors are created. We start with cyan, magenta, and yellow inks. By putting one ink overneath another, the additional colors red, green, and blue are created. Try doing that with paint! It ain't gonna work. The paint on top defines the color, hiding whatever is below.

This feature of ink is what allows us to have a much wider gamut. With three inks (cyan, magenta, and yellow) we can theoretically get eight different colors: white (no ink), cyan, magenta, yellow, red, green, blue, and black (all three inks).
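The eight-color arithmetic can be spelled out with a toy model. This assumes perfectly ideal inks, each one absorbing exactly one flavor of light and passing the other two. Real inks are not that well behaved. The names in the sketch are just for illustration.

```python
from itertools import combinations

# Ideal inks: each one absorbs exactly one of red, green, blue.
ABSORBS = {"cyan": "red", "magenta": "green", "yellow": "blue"}

# What we call the light that makes it back out.
COLOR_NAMES = {
    frozenset({"red", "green", "blue"}): "white",
    frozenset({"green", "blue"}): "cyan",
    frozenset({"red", "blue"}): "magenta",
    frozenset({"red", "green"}): "yellow",
    frozenset({"red"}): "red",
    frozenset({"green"}): "green",
    frozenset({"blue"}): "blue",
    frozenset(): "black",
}


def overprint(*inks):
    """Color seen when the given ideal inks are stacked on white paper."""
    passed = {"red", "green", "blue"} - {ABSORBS[ink] for ink in inks}
    return COLOR_NAMES[frozenset(passed)]


# Every combination of zero, one, two, or three inks.
all_colors = {
    overprint(*combo)
    for n in range(4)
    for combo in combinations(ABSORBS, n)
}
```

Since each ink just removes its own flavor of light, the order of the layers does not matter in this idealized model, and the eight combinations give eight distinct colors.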

Getting a bit more quantitative

I need to put some numbers on this if I'm going to get Beer's law involved. I painted a rather black and white picture of cyan. Well, ok, I should say that I inked a black and white picture rather than painted it. And black and white aren't quite the correct colors. But, the point is, inks are not perfect. Cyan does not capture all the red photons. Nor does it pass all the green and blue photons.

A typical cyan ink might allow 20% of the red photons to pass through on their way to the substrate. That is, 80% of the red light gets absorbed and the other 20% makes it down to the paper. Let's just assume that all of those photons reflect from the paper. (I am telling a little white lie here, but it's for a good purpose.)

Ok, so if we start out with 100 red photons heading downward into the ink, 20 of them will reach the paper. Of these 20, 80% of them (I think that would be 16) will get absorbed on the way up. That leaves just 4 red photons, out of the original 100, that make it back out. For those of you who are all into the density thing, this would mean about 1.40D. If you understood that, then you know Beer's law, and can apply it to ink on paper. Who said that ink and beer don't mix?
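The same arithmetic, as a small sketch. It leans on the same white lie as above (the paper reflects everything), and it adds one thing that Beer's law supplies: the transmittance of a thicker layer is the single-layer transmittance raised to the power of the relative thickness. The function and parameter names are invented for the example.

```python
import math


def ink_density(one_pass_transmittance, relative_thickness=1.0):
    """Reflection density of an ink film on perfectly white paper.

    one_pass_transmittance: fraction of light that survives one trip
        through the ink at the nominal thickness (0.20 for our cyan).
    relative_thickness: ink film thickness relative to nominal.
    """
    # Beer's law: transmittance falls off exponentially with thickness.
    one_pass = one_pass_transmittance ** relative_thickness
    # The light goes through the ink twice: down to the paper and back up.
    reflectance = one_pass ** 2
    return -math.log10(reflectance)


nominal = ink_density(0.20)     # 4 photons out of 100 make it back
doubled = ink_density(0.20, 2)  # twice the ink film thickness
```

The nominal case gives a density of about 1.40, matching the 4-out-of-100 photon count. Doubling the thickness doubles the density in this idealized model.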

--------------------------

[1] You may have seen this blog post in Flexo Global Magazine. So, we are talking popular here.

[2] I am pleased to say that this blog post was picked up just recently by the Australia New Zealand Flexographic Technical Association magazine (August 2013). Look for it in your mailbox. This blog post was also picked up by Flexo Global magazine. So... we are talking really popular here.

[3] Truth in advertising here... this is not an actual photo, but a digital simulation. It is very nearly photorealistic due to my vast artistic ability, and it simulates what really happens, but I repeat, this is not an actual photo.

[4] Someday I will get around to writing a blog about how light comes in three flavors: red, green, and blue. That's all. All other colors are a combination of those three. Based on this simplification, you can explain why inks are CMY and computer monitors are RGB. It will be a totally cool blog. I will send you a text when I finally publish it.

Wednesday, August 21, 2013

What color is human skin?

When I grew up, there was only one answer to the question "What color is human skin?" At the time I was born, Binney and Smith [1] had one crayon (called "flesh") for skin tone. Then in 1962, the company renamed the "flesh" crayon "peach" [2]. Then there was the Civil Rights Act of 1964. Finally, in 1992, Binney and Smith introduced their Multicultural Crayon set. This set includes six crayons that represent an actual range of skin tones: apricot, burnt sienna, mahogany, peach, sepia, and tan. White and black are included for mixing. Or maybe they were added because of the "eight crayons in a box" box rule. Either way, no one really has pure white or pure black skin.

Photo egregiously stolen from Dick Blick website

The Humanae Project

I was excited this week when I found out about the Humanae Project. A lady from Brazil by the name of Angelica Dass has taken on the task of collecting images of a zillion people [3] to catch their skin tones. A small area of the image of each face was averaged and Photoshopped in as the background.

Six of the hundreds of people who have volunteered for the project

How could I resist doing a little math on these images and blogging about it?

Visual look

At the time I looked, there were 420 images. I pulled the RGB values from the background of each of these images and went at it. The first look is a montage of all the colors. My impression? Each little box looks like a skin tone, but it does seem a bit light on the dark - that is - the darker skin tones seem under-represented.

A collection of 420 real skin tones

Here is another look at this data. In the graph below, each dot stands for one person's skin tone. The horizontal position of the dot gives the red value of the color, and the vertical position gives the green value. One thing that the graph shows is that there is a very definite set of colors that qualify as skin tones [4].

Red and green values of all skin tones in the collection

When I look at these graphs, I also see that, for the most part, the collection of skin tone colors form a nice line. Well, a nice crooked line. And maybe the line is kinda fat. Still, this says to me that you could make a pretty decent approximation of all the skin tones by assigning each of the skin tones to a single spot on that crooked, kinda fat line. To put this a different way, if you were to collect the 420 people in a room and ask them to line up in order from darkest skin to lightest skin, they would make a more or less smooth transition.

I have done just that in the image below. Each person has a narrow vertical strip, and all the strips are arranged in order according to the average of the red, green, and blue values.

Another collection of 420 skin tones
"Everyone line up in order from darkest to lightest!"

If you stand back and defocus your eyes a bit, this looks smooth. This shows that skin tones form a line. But, if you look closely, many of the individual strips can be seen, contrasting with their neighbors as being perhaps redder or perhaps greener. This shows that the line is kinda fat. People with the same average lightness of skin vary a little bit on the hue of their skin.

This is consistent with a paper that is a lot more rigorous than this blog post [5]. Their conclusion is that one number is enough to characterize around 95% of the information in the spectra of skin. Well, I didn't need none of their sophisticated "principal component analysis" to show the same thing, did I? Show offs.

Motivation for the rest of this blog

I am about to dive into the deep end of the mathogeek pool. Before I do, let me provide some idea of the various applications--to explain why it might be worthwhile to swim in those slide-rule-infested waters.

Application #1 - Suppose someone wanted to generate skin tones of various ethnicity, maybe for a game or to create random avatars. Having a simple parametric equation that describes a wide range of plausible skin tones would be a great way to do that. The word "parametric" in the previous sentence means that you could randomly select a parameter (call it "k") that would be in some range of values (I dunno, maybe from -3 to +3), and the equation would give you an RGB triplet that would be a plausible skin tone.

Application #2 - Suppose someone wanted to find faces in an image. One of several necessary criteria for a pixel to be a face pixel is that the RGB values must belong to the club of plausible skin tones. So, it would be cool to have some sort of equation that describes the set of plausible skin tones, so that any given pixel can be tested for membership in that club.

Application #3 - Suppose someone wanted to characterize someone's skin tone. This characterization might be used to recommend makeup, or to categorize someone as having "winter" coloring. If there were a magic equation with a parameter it would be possible to find the particular value of that parameter which best characterizes the skin for that person. That parameter would be the characterization of that person's skin tone.

These applications all have to do directly with skin tones. I have a few other applications in mind that would use the technique that I describe.

Application #4 - Suppose some math geek (or more likely, a stats geek) needs a way to describe a set of multi-dimensional data--much like one would use mean and standard deviation to describe single-dimensional data. I describe a method below to determine the ellipse (or hyper-ellipse) that describes multi-dimensional data in a statistical way. Since I am not aware of a word for an ellipse or hyper-ellipse that serves as a proxy for a large amount of data, I will invent the word: proxellipse. The proxellipse is an extension of standard deviation to multi-dimensions.

Application #5 - Suppose some other geek (maybe a scientist of some kind) wanted to display a scatter plot of two-dimensional data. If the scientist had twenty points in that plot, it would give some appearance of the amount of spread. But, if the scientist had two hundred points, the scatter plot would give the impression that the spread of data is far broader. The analysis below is an alternative. By displaying a scatter plot along with the proxellipse, one won't be misled by the crowd effect. The proxellipse would show the boundary in which a certain percent of the data will likely fall (with all the normal assumptions about whether the data is normally distributed).

Statistical look

The graph below is the same data as the red/green plot above, only this time, I am looking at red versus blue values. This perspective shows a bit more crookedness. It looks like there are two separate populations: those with darker skin, and those with lighter skin. It looks like two different lines are needed to describe the different sets.

Red and blue values of same data, with a discontinuity highlighted

So, there is a crooked line in RGB space that defines skin tones. For the sake of simplicity, I am going to start by looking at the statistics of just the brighter dots (where R > 180, G > 120, and B > 110). This reduced the set from 420 data points to 404. To be clear, I am quite literally discriminating here on the basis of skin color, excluding the darker skin tones. I apologize, but the darker tones belong to a different statistical distribution.

Now it is my turn to show off some really golly-whiz bang math. This is a green/blue view of the segregated data points, with a red line showing three axes of an ellipse. This ellipse is the proxellipse (with coverage of 3), which is an ellipse that is basically in the same shape, size, and orientation of the data.

Green and blue values of skin tones, with the axes of an ellipse shown in red

Now I'll say a bit more about the proxellipse. It's a statistical thing. Essentially, I have extended the idea of standard deviation to three dimensions. The size of this proxellipse is three standard deviation units in each of the directions. Now, I assume that I am not the first to discover the technique, but it would appear that this technique is not well known in the color science community [6]. Or maybe it's just a dumb idea?

Getting back to the kinda fatness of the line. The degree of kinda fatness can now be quantified. To demonstrate that the data points here are very close to being a line, the length of the major axis is 28.8 gray values. The other two minor axes are 5.3 and 3.3 gray values. For those who don't recall, 28.8 is much bigger than 5.3 and 3.3.
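For anyone who wants to play along at home, here is a two-dimensional sketch of one way to get such an ellipse. To be clear about what is assumed: this takes the axes of the ellipse to be the principal axes of the covariance matrix of the data, scaled by the coverage factor, which is a standard construction and may differ in detail from what I did in three dimensions. The function name is made up.

```python
import math


def proxellipse_2d(points, coverage=3.0):
    """Center, semi-axis lengths, and tilt of an ellipse summarizing 2-D data.

    Assumption: the axes are the eigenvectors of the sample covariance
    matrix, and each semi-axis is `coverage` standard deviations long.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)

    # Closed-form eigenvalues of the 2 x 2 covariance matrix.
    mean = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    major = coverage * math.sqrt(mean + spread)
    minor = coverage * math.sqrt(max(mean - spread, 0.0))

    # Tilt of the major axis, in radians from the horizontal axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), major, minor, angle


# Points that fall exactly on a 45 degree line: a very skinny ellipse.
center, major, minor, angle = proxellipse_2d([(0, 0), (1, 1), (2, 2), (3, 3)])
```

A major axis much longer than the minor axis is the numerical version of saying the data is a line that is only kinda fat.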

An algebraic look

Let's just pretend that someone wanted to come up with an equation that could be used to generate a sequence of reasonable skin tones. I called that Application #1. One approach would be to apply linear regression to any of the graphs above. One might, for example, determine a best fit function for G as a function of R, and another regression would determine the best fit of B as a function of R. In this way, the value of R is a parameter that can be used to determine the other two color coordinates.

This may work, but I have some reservations about this technique. First, such a regression treats R as an independent variable and G as a dependent one. Really, the two should be in the same category. It seems like something must go wrong. (OK, that's just a philosophical argument.)

Second, the choice of expressing G as a function of R, for example, is a bit problematic due to the relatively steep slope. A small change in R will cause a large change in G. And (the important part) a small change in the random data will cause a large change in the slope that is calculated through regression. That's a bummer.

Third, I have a little known fact about statistics. Garden variety linear regression starts with the assumption that there is no noise in the independent variable. That is to say, it is based on the assumption that we know the R values perfectly, and any deviation from a straight line is strictly because of random variation in G. Now here is my little known fact: If you add random noise to both of the variables in a linear regression, the slope of the regression line will move toward zero. The noise in the independent variable adds a bias.
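If you don't believe the little known fact, here is a quick simulation. All the numbers in it are made up for the demonstration: a true slope of 2, and noise added to the independent variable with the same variance as the variable itself. Under those conditions, the usual result is that the fitted slope shrinks by the factor var(x) / (var(x) + var(noise)), so roughly in half.

```python
import random


def regression_slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(42)
true_x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# y depends on the true x with a slope of 2, plus a little noise in y.
y = [2.0 * x + rng.gauss(0.0, 0.5) for x in true_x]
# What we actually get to measure: x with its own noise added.
noisy_x = [x + rng.gauss(0.0, 1.0) for x in true_x]

clean_slope = regression_slope(true_x, y)   # close to 2
noisy_slope = regression_slope(noisy_x, y)  # pulled toward zero, near 1
```

Noise in y alone leaves the slope where it belongs. Noise in x drags it toward zero.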

So... how about another approach? Using my proxellipse analysis, I arrived at the following equations for the RGB values of the skin tones:

R = 224.3 + 9.6 k

G = 193.1 + 17.0 k

B = 177.6 + 21.0 k

When k is allowed to go from -3 to 3, this will provide RGB triplets for reasonable lighter skin tones. The following equations will give reasonable RGB triplets for darker skin tones.

R = 168.8 + 38.5 k

G = 122.5 + 32.1 k

B = 96.7 + 26.3 k

I apologize again... I have created separate but equal equations. <sigh>
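For Application #1, the two sets of equations can be turned into a few lines of code. This is only a sketch: the function name and the dictionaries are my own invention, and the clamp to the 0 to 255 range is an assumption I added, since no range of k is given for the darker equations and a large k would push R past 255.

```python
# Coefficients copied from the equations above: (offset, slope) per channel.
LIGHTER = {"R": (224.3, 9.6), "G": (193.1, 17.0), "B": (177.6, 21.0)}
DARKER = {"R": (168.8, 38.5), "G": (122.5, 32.1), "B": (96.7, 26.3)}

def skin_tone(coefficients, k):
    """Return an (R, G, B) triplet of integers for the parameter k."""
    channels = []
    for name in ("R", "G", "B"):
        offset, slope = coefficients[name]
        value = round(offset + slope * k)
        # Clamp to the valid 8-bit range (my assumption, not in the equations).
        channels.append(max(0, min(255, value)))
    return tuple(channels)

# Sweep the lighter equations over the stated range of k, from -3 to 3.
for k in range(-3, 4):
    print(k, skin_tone(LIGHTER, k))
```

Calling skin_tone(DARKER, k) works the same way for the second set of equations.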

Caveats

Science?
As a scientific experiment, this is perhaps not controlled enough to be accepted into a peer reviewed journal. Now, I have every reason to believe that Angela has controlled conditions as best as she can. But, she is using a camera, and cameras are not color measurement devices. As far as I know, she has not provided an image with a Munsell color checker card that would serve to calibrate the colors to real units. There isn't a statement about the type of camera or the settings. In particular, there is no statement of the gamma setting of the camera. (Gamma is a setting that will increase the brightness of midtones in order to give the picture a better appearance. Most cameras have this enabled by default, and it gets in the way of accurate color measurement.)

But these are not my big bugaboo. RGB cameras (at least almost every one on the market today) do not see color the same way we do. The red, green, and blue filters in a camera do not give the camera the same spectral response as the three cones in the eye. I investigated this in my paper Why do Color Transforms Work?

Also, the RGB response differs from camera to camera, so as they say, results may vary.

Ideally, the purist in me would like to see spectral data on everyone's skin, but the purist in me is too darn mired in the details to ever get a blog out. And besides, the images on her website do all look like reasonable skin tone on both of the computer monitors that I routinely use.

Skin blemishes and goniophotometry and umbrophotometry

This analysis is rather simplistic in that it associates one single RGB value with the color of a person's skin. That's just plain silly, for three reasons. First, skin is not uniform in color, especially as one gets older. Second, the color of the skin (and just about anything for that matter) depends on the angle from which it is illuminated and the angle from which it is viewed. If the lighting angle, surface orientation, and the camera are all at the right angle, you can see a very white specular reflection on a surface. The study of this effect is called goniophotometry, the measurement of light as a function of angle.

The third effect is that there are generally shadows on a person's face. Even when illuminated diffusely, there still may be (for example) an area under the nose [7] that is darker because of the shadow. Clearly this effect is accentuated in some people. I have just this moment coined the word "umbrophotometry" to characterize the measure of this effect. "Umbra" means shadow, and is the root of the word "umbrella".

--------------------------------------------
[1] There used to be a company named Binney and Smith. In 2007, the company name was changed to Crayola. Clearly if I talk about stuff they have done after 2007, I should refer to them as Crayola, but how do I refer to the company back before it changed its name? Is it Binney and Smith, or Crayola? And when I want to talk about the guy who won the 1985 Grammy for the Album of the Year, do I refer to him as Prince, the Artist Formerly Known as Prince, or the Artist Formerly Known as the Artist Formerly Known as Prince? [8]

[2] Personally, I don't think that "peach" is quite the right name for that crayon. Maybe I'm wrong, but I think that peaches are more of an orange color than the crayon. I don't have a better suggestion, mind you. I just like to kvetch.

[3] I could have said that she is taking pictures of a brazillion people. Give me credit for not going for that pun!

[4] I am being a bit loose here, since I have only shown the view of R and G. The other views are pretty similar, though.

[5] Sun and Fairchild, Statistical Characterization of Face Spectral Reflectances and Its Application to Human Portraiture Spectral Estimation, Society for Imaging Science and Technology, Volume 46, 2002
http://www.cis.rit.edu/fairchild/PDFs/PAP14.pdf

[6] ASTM E2214 (2002), Standard Practice for Specifying and Verifying the Performance of Color-Measuring Instruments. Section 6.1.1 anticipates my technique. They are discussing a way to evaluate the variability of a collection of color measurements and state:

"Since color is a multidimensional property of a material, repeatability should be reported in terms of the multidimensional standard deviations, derived from the square root of the absolute value of the variance–covariance matrix."

Egg-headed stuff, indeed. If only they knew my method when they wrote that!

[7] George Carlin informed a generation of people that this part of the body is called the philtrum.

[8] This was a trick question. The correct answer is "Lionel Richie". Prince's album Purple Rain was nominated in 1985, but Lionel Richie's Can't Slow Down won the Grammy.

Wednesday, August 7, 2013

Lousy weather every weekend

It was a Monday after a rainy weekend. Naturally, since I had to go to work, the weather was excellent. I was particularly downcast since the weather had put a damper on the previous weekend as well. I mentioned it at work, and got a surprising response.

Actual photo of me enjoying the lovely weekend weather [1]

I should tell you that this was quite a long time ago, back before Al Gore invented the internet. I was working at the University of Wisconsin Space Science and Engineering Center. You would think that with a name like that, we would all be working on something dull like a space telescope, a Mars rover, or a space shuttle porta-potty. But it was far more exciting. We were crunching data from weather satellites.

Getting back to my story about a forlorn math guy complaining about the weather, when I made my comment about two weekends lost to lousy weather, I just happened to be surrounded by meteorologists. I was told that everybody knew that weather was cyclical, and it tends to follow a weekly pattern. If you know the weather on one particular Saturday, then you have a darn good guess at the weather the following Saturday.

I have used this factoid frequently. It always sounds impressive when I tell people what next weekend's weather is going to be like. I should say, it sounds impressive when they believe me. Whether my prediction comes true is actually irrelevant. No one remembers my predictions.

Is the "weekly weather pattern" factoid for real? I put it to the test.

The data

It isn't hard to find information on the current weather conditions. But yesterday's weather? That's a bit harder to find. And historical data? I found a place to dig up old weather. [2]

I downloaded four years' worth of weather data from Milwaukee, from July 22, 2009 to July 22, 2013. Why four years? Well... there is a story there. Once upon a time, there was a computer magazine called Byte. It was the magazine for computer geeks. They published an article with a BASIC program for doing a fast Fourier transform. I dug into the code and found it wasn't remotely the FFT (fast Fourier transform) algorithm that Cooley and Tukey made famous. Later, they had an article that looked at sales data that spanned many years of time. They said that a "four year analysis" was used to look for trends. I dropped my subscription. [3]

Mean temperature, Fourier analysis

Let's look first at the temperature (see plot below). It is apparent from the sine wave kinda thingie in the plot that I have collected four years of data, and that the start of the data is kinda mid-summer.

Average daily temperature in Milwaukee over the past four years

That graph in and of itself is pretty cool, but playing with the data? I don't know about other math guys, but the first thing I like to do when I get a new fresh pile of data is to start taking Fourier transforms. It took a bit of massaging of the raw FFT output, but here is a graph showing a section of the frequency data. This graph shows the strength of the periodicity going from a four day period (on the left) to a 10 day period (on the right). There is no discernible peak or bump near the once-per-week mark.

FFT of the mean daily temperature
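Here is a rough sketch of what "looking for a weekly bump" means. To be clear about the assumptions: this is not the FFT I ran, it is a plain, slow Fourier sum evaluated directly at the periods of interest, the function name is made up, and the data is a fake series with a deliberate seven day cycle rather than the Milwaukee temperatures.

```python
import cmath
import math

def power_at_period(series, period_days):
    """Squared magnitude of the Fourier component with the given period."""
    n = len(series)
    mean = sum(series) / n
    freq = 1.0 / period_days  # cycles per day
    total = sum((value - mean) * cmath.exp(-2j * math.pi * freq * day)
                for day, value in enumerate(series))
    return abs(total) ** 2 / n

# Fake data with a built-in weekly cycle (an assumption, for illustration).
fake = [10.0 * math.sin(2.0 * math.pi * day / 7.0) for day in range(700)]

# Scan periods from four days to ten days, as in the graph above.
for period in range(4, 11):
    print(period, power_at_period(fake, period))
```

With the fake weekly data, the seven day line towers over its neighbors. A real weekly weather cycle would have to show up as the same sort of bump.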

Correlation

Here is another look at that same data. In this case, instead of doing the whole Fourier bit, I did some correlations. I checked to see if the daily temperature on a given day correlated with the daily temperature on the previous day. So... an array of four years of data, 1462 data points. I computed the correlation coefficient between two sub arrays, one going from day 1 to day 1461, and the other going from day 2 to day 1462. Then I computed the same thing, only with a lag of two days, and three days, and so on.
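The sub array trick described above can be sketched in a few lines. The function names are mine, and the test series is made up (a clean seven day cycle), so treat this as an illustration of the bookkeeping rather than the code I actually ran.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(series, lag):
    """Correlate each day with the day that comes 'lag' days later."""
    if lag == 0:
        return pearson(series, series)
    # Drop the last 'lag' days from one copy and the first 'lag' from the other.
    return pearson(series[:-lag], series[lag:])

# Made-up series with a perfect weekly cycle.
fake = [math.sin(2.0 * math.pi * day / 7.0) for day in range(140)]
for lag in range(0, 15):
    print(lag, lagged_correlation(fake, lag))
```

For the fake weekly series, the correlation climbs back to essentially 1 at lags of 7 and 14 days. With 1462 days of data and a lag of one day, the two slices are each 1461 days long, matching the description above.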

The graph below shows the results. At the very, very left, the correlation between each day's temperature and that same day's temperature is 1.0. Well, duh. The temperature on a given day looks a heckuva lot like the temperature on that same day. The plot below shows that today's temperature also looks a lot like yesterday's, with a correlation coefficient of close to 0.95. In the middle of the chart we see the correlation of today with one week ago. According to the hoity-toity meteorologists, I should see a big spike there, but no. Using the temperature a week ago to predict today's temperature is good (r = 0.85), but it is no better or worse than using the temperature six days or eight days ago.

Correlation coefficient of temperature as a function of spacing (in days)

So, I would say this myth is pretty well busted. That's the way I like my myths: pretty and well-busted. Temperature does not follow a hebdomadal pattern. That's a pity. It would have explained why a week isn't six days, or (shudder the thought!) eight days.

But wait! What about rain?!?!?

I missed something here, didn't I? This blog post started out talking about rain, not temperature. Maybe I should have a look see at the rain data? Luckily, this database contains precipitation data, as well as wind speed and direction, barometric pressure, humidity, ... but all I want is the rain data.

I present below another correlation chart, this one showing the correlation of precipitation. There it is, as plain as the nose on Owen Wilson's face, a spike at lucky seven. There is a correlation (r = 0.1). This is roughly at the 99.95% significance level. So. Maybe my meteorologist friends weren't all that dumb after all?

Correlation coefficient of precipitation as a function of spacing (in days)
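For anyone who wants to poke at the significance claim, here is one textbook way to size it up. I am not saying this is the test behind the number quoted above. It is a sketch that assumes the day pairs are independent (rainfall probably is not that polite) and that uses a normal approximation in place of the t distribution, which is reasonable with well over a thousand pairs.

```python
import math

def correlation_significance(r, n_pairs):
    """Return (t statistic, approximate two-sided p-value) for a correlation."""
    t_stat = r * math.sqrt((n_pairs - 2) / (1.0 - r * r))
    # Normal approximation to the t distribution; fine for large n_pairs.
    p_two_sided = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_two_sided

# 1462 days of data at a lag of 7 days leaves 1455 pairs of days.
t_stat, p_value = correlation_significance(0.1, 1462 - 7)
print("t statistic:", t_stat)
print("approximate p-value:", p_value)
```

Under these assumptions, a correlation of 0.1 is small but very unlikely to be a fluke, simply because there are so many pairs of days behind it.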

I should make a comment here about how big the number 0.1 is when it comes to correlation coefficients. Let's just say that Norm decided to make some money on these results. Every night, for four years, he would sit at a bar in Milwaukee, and take bets on whether it would rain the following week. I know, ideal job, right? If Norm follows the advice in this blog, betting that the rain today can predict that of one week from today, I guarantee that he would come out ahead at the end of the four years.

Norm, pondering this money-making scheme

On the other hand, the number 0.1 is very tiny. The predictive strength when r = 0.1 is on the order of 0.005. That's an indication of how much the odds are in Norm's favor. He is pretty sure of making money, but he needs to make a lot of bets to get there.

This may sound like a bad business model, but this is how casinos work. They want to run games where the table is tilted ever so slightly in their favor. Too much of a tilt (too low a chance of the patron winning), and people will walk away. Too little of a tilt, and the casino won't make a buck.
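A note on where a number like 0.005 can come from. One common reading of "predictive strength" (and this is my assumption here, since the formula is not spelled out above) is the fraction by which the scatter of your guesses shrinks when you use the correlated variable, which works out to 1 - sqrt(1 - r^2).

```python
import math

def predictive_strength(r):
    """Fractional reduction in prediction scatter for correlation r.

    Assumes the definition 1 - sqrt(1 - r^2); other definitions exist.
    """
    return 1.0 - math.sqrt(1.0 - r * r)

print(predictive_strength(0.1))
print(predictive_strength(0.85))
```

For small r this is close to r squared over two, which is why a correlation of 0.1 buys so little on any single bet.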

Another comment on the plot above. There is also a peak out at 29 and 31 days. Hmmmm... Maybe this is the effect of the moon?

---------------------------------

[1] If truth be told, although I have sung before, and have been in the rain, this is actually a picture of Gene Kelly, not me. If more truth be told, Jean Kelly is my sister. Here is my favorite painting of hers.

[2] If truth be told, I didn't find this website on my own. Nate Silver told me about it. If even more truth be told, he didn't actually tell me in person. I read it in one of his books: The Signal and the Noise: Why So Many Predictions Fail — but Some Don't. He and I are good buddies. At least we would be if we ever met. That's my prediction.

[3] I had a house that I was getting ready to sell. The realtor told me that the entry way needed work - first impressions and all that. Updating this would give the whole house a new image. So, I gave it a fresh coat of paint, updated the light fixture, and polished up the handle of the big front door. I guess you could say that I performed image enhancement through a fast foyer transform. Some of you will find this joke incredibly funny.

Wednesday, July 31, 2013

The color of a bunch of dots, Part 4

If you have been wondering for years about TVI bananagrams, you have come to the right blog post. This blog post is the definitive blog post on TVI bananagrams. But if (for some crazy reason) you have not been wondering all your life about TVI bananagrams, then this could still be a watershed moment in your understanding of dot gain and its fuzzy sister, tone value increase.

Cyana bananagram

Dot squish

Pat Noffke and John Seymour presented a paper at TAGA in 2012 entitled A Universal Model for Halftone Reflectance. [1] In the paper, they developed an equation for what they called Dot Area Increase. This is a measure of how much a dot squishes out when it hits the paper. This is similar to what Murray and Davies called dot gain, but there is one novel difference. In both models, dots get bigger when they hit the paper, but in the Noffke-Seymour model, they also get thinner because of the whole law of conservation of ink thing. Because the dots are thinner, the color of the dot is less rich than the solid.

A halftone dot, before and after the steamroller

If there isn't any dot squish for a particular hypothetical printing, then a 30% dot would cover 30% of the area, and the thickness of the dot would be the same as the thickness of the solid ink. This hard dot has no dot area increase. The equation for reflectance in this case is the Murray-Davies equation.

Murray-Davies equation for hard dots

At the other extreme, the dots squish out completely so there is no longer any semblance of the dot structure (think gravure). Perfectly soft dots. Contone - continuous tone. At this extreme, one can use Beer's law to estimate the reflectance of a halftone. You will no doubt recall the equation from my blog on Beer's law,

Beer's law equation for continuous tone

I should add here, that the two equations here should be applied on a wavelength by wavelength basis.

I should also 'splain a bit about the "Ain" that is being used as an exponent. Normally in Beer's law, this is where you would put something to do with ink film thickness. So, why did I stick the dot area in that spot? Imagine that you start out with perfect, hard halftone dots that are, I dunno, 30% dot area. Since they are perfect, let's just assume that the ink thickness of the dots is the same as the thickness of the solid.

Now, let's say that these dots get stepped on by an elephant. They are squished out so as to cover the whole area uniformly. How thick is the ink now? If the original dot area is 30%, then the new ink thickness is 30% of that of the solid. So, the exponent of Ain represents the thickness of the fully squished out dots.

Poor defenseless halftone dot, about to become a continuous tone

The dot squish equation

The pure genius of the Noffke-Seymour paper is that they considered what happens in between. In the figure below, the left side shows the starting condition. The halftone dot covers 25% of the area, and has the full thickness of the solid. The right side shows what happens after squishing [3]. The dot now covers 39% of the area, and as a result, is thinner by whatever ratio is necessary to preserve ink volume. I dunno? Maybe the ratio is 25% / 39%? I guess that's about 0.64. [4]

A tale of two halftone dots

Using the halftone dot at the right to illustrate the Noffke-Seymour formula, Beer's law is used to estimate the reflectance of the light blue area covered by ink. A thickness of 0.64 (as compared with the thickness of the solid) is used. Since a wandering photon has a 39% chance of hitting the area covered by ink, this reflectance is multiplied by 39%. This accounts for the light that reflects from the ink. The remaining photons will hit the paper, so (in true Murray-Davies fashion) the reflectance of the paper is multiplied by 61%, and this is added to the first number.

The equation below tells the whole story. "Ain" is the dot area going in. In the example above, this would be 25%. "Aout" is the final area of coverage after squishing, 39% in the example.

The beautiful Noffke-Seymour equation

Note that at the extreme of no squish, Ain = Aout, and the equation simplifies to Murray-Davies. At the other extreme, Aout = 1, and the equation simplifies to Beer's law.
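The recipe described in words above can be sketched in code. A few loud assumptions: the function and argument names are mine, reflectances are for a single wavelength, and Beer's law is written relative to the paper (the ink scales the paper reflectance by a ratio raised to the relative thickness). If reflectances are already measured relative to paper, set r_paper to 1. The example reflectance values are made up.

```python
def murray_davies(area, r_solid, r_paper):
    """Hard dots: area-weighted mix of solid and paper reflectance."""
    return area * r_solid + (1.0 - area) * r_paper

def beer_contone(area_in, r_solid, r_paper):
    """Fully squished dots: a uniform film with relative thickness area_in."""
    return r_paper * (r_solid / r_paper) ** area_in

def noffke_seymour(area_in, area_out, r_solid, r_paper):
    """Dots that spread from area_in to area_out while conserving ink."""
    thickness = area_in / area_out  # thinner by the ratio of the areas
    r_ink = r_paper * (r_solid / r_paper) ** thickness  # Beer's law on the dot
    return area_out * r_ink + (1.0 - area_out) * r_paper

# The example from the text: a 25% dot that squishes out to 39%.
print("relative thickness:", 0.25 / 0.39)
print("halftone reflectance:", noffke_seymour(0.25, 0.39, 0.10, 0.80))
```

Setting area_out equal to area_in gives the same answer as murray_davies, and setting area_out to 1 gives the same answer as beer_contone, which is the behavior at the two extremes described above. For real work, this would be applied wavelength by wavelength.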

In the first big finding, the authors of this paper looked at spectra from a big pile of tone curves, and came to the conclusion that pretty much every printing modality (web offset, stochastic web offset, gravure, newspaper, flexo, and ink jet [5]) all fit conveniently between these two extremes. This is huge. (But that's just my opinion.) Tone value increase for any type of printing can be described in terms of how broadly the dots squish. That's all you need to know.

Finally, the bananagram

I have saved the best for last. The other huge finding is shown below, the invention of the bananagram [6]. The bananagram below is an a*b* plot of all possible tone curves for a given cyan ink [7]. The left edge of the banana is the tone curves generated by assuming that the halftone dots are perfectly hard. The right hand side is a similar curve made with the assumption that the dots are perfectly flattened out.

Cyana bananagram

Now, lemme tell you about the rainbow colored lines. The yellow line, as an example, is all possible a*b* values that a 40% cyan halftone dot could take. Starting with a perfectly hard 40% cyan halftone, as you gradually squish it out, you will see it trace out the curve from one side of the banana to the other.

Wow. The position along that line tells you how hard the dots are. If you know the dot hardness (along with the spectra of the solid and the paper, and the original tone value), you can figger out the color of the halftone.

Foreshadowing the next blog post

I need to eventually tie up at least one loose end. I have been throwing dot gain kind of equations around willy-nilly, or perhaps yuley-niely. We have (so far) the following three equations to explain the color of a halftone: Murray-Davies, Yule-Nielsen [2], and Noffke-Seymour.

Murray-Davies (we all know) is a lump of over-cooked turnips when it comes to accurately predicting or measuring color. Yule-Nielsen seems to be all the rage. Then these young upstarts come along with yet another formula that is gonna save the world! How can this all be reconciled?

Stay tuned for the thrilling conclusion!

----------------------------------

[1] I know these guys. One of them seems to be around whenever I stop at the bar for a beer. I guess he must hang out there a lot. Anyway, these two guys do bunches of seriously good stuff. And they're modest, too. Well, at least one of them is.

[2] If you have been paying careful attention, you will notice that I have reverted back to the more common spelling of the latter gentleman's name. In a previous blog, I cited the original paper from the TAGA Proceedings, along with a scan of the heading for this paper. In the original published paper, the name is spelled "Neilsen", which is contrary to virtually every citation of the name. I ranted and raved about how 11,400 patent citations use the misspelling "Nielsen". As such, I would imagine that these 11,400 patents are potentially invalid. Gary Field did some excellent detective work, and has convinced me that the TAGA paper is a transcription error. The gentleman's name is Waldo J. Nielsen. Assignees of these patents can breathe easier.

[3] I keep talking about squishing, but this might not always be the case. In a web offset press, where there is a lot of pressure between the plate and the blanket and the paper, then squishing is probably a valid term. But in the case of gravure, where the ink has a very low viscosity, maybe it's not so much squishing as it is just spreading out. In newspaper, where the paper does not have a coating, maybe the significant effect has more to do with the ink being wicked into the paper. All of these I have put under the umbrella term "squishing." Whatever you squish under your umbrella is your own business.

[4] How should I know what the ratio is? Am I called John the Arithmetic Guy??!?!?

[5] No data was harmed in the filming of this experiment.

[6] I expect to see bananagram T-Shirts available on the internet. There will be bananagram support groups for people who have family members sucked into the cult. I expect this will be a topic in the next state of the union address, with plenty of polarized commentary on Fox News and MSNBC.

[7] In three dimensions, this is a surface, sort of like a fly's wing or a sail. In other words, the possible range of colors of a halftone of a given ink can be described by a three dimensional figure that looks like a fly's wing.