Wednesday, August 29, 2012

People do not make good statisticians


The lottery and gambling

The United States government has come to the realization that Japan is leading us in mathematical literacy. The government's approach to this, as with cigarettes and alcohol, is to attempt to change our behavior by putting a tax on what they don't like, in this case mathematical illiteracy. They call this tax the lottery.
Paraphrase of comedian Emo Phillips
Every American should learn enough statistics to realize that "One-in-25,000,000" is so close to "ZERO-in-25,000,000" that not buying a lottery ticket gives you almost virtually the same chance of winning as when you do buy one!
Mike Snider in MAD magazine, Super Special December 1995, p.48

I was in college when McDonald's started their sweepstakes. Finding the correct game piece was going to make someone a millionaire. I had a friend named Peter[1] with a hunch. He was going to win.
I was, on the other hand, a math major. I considered my odds of being that one person in the United States who would be made incomprehensibly rich. There were a hundred million people trying to find that one lucky game piece, so my chances of winning the million dollars were one in one hundred million. In my book, my long-run expectation was about a penny. Despite the fact that I was a poor student, scrounging to find tuition and rent, the prospect of winning (on average) one cent did not excite me. I was not about to go out of my way to earn this penny.
I was familiar with the Reader's Digest Sweepstakes. I had sat down and calculated the expected winnings in the sweepstakes. I expected to win something less than the price of the postage stamp I would need to invest in order to submit my entry, so I chose not to enter.
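For the curious, the back-of-the-envelope arithmetic looks something like this. The $1,000,000 prize and the hundred million entrants are the figures from the story above; the stamp price and the few-cent expected winnings for the mail-in entry are purely illustrative assumptions:

```python
# Expected winnings per entry: probability of winning times the prize.
def expected_value(p_win, prize):
    return p_win * prize

# Sweepstakes figures from the story: a $1,000,000 prize and roughly one
# hundred million people hunting for the winning game piece.
mcdonalds = expected_value(p_win=1 / 100_000_000, prize=1_000_000)
print(f"Expected sweepstakes winnings: ${mcdonalds:.2f}")   # about $0.01

# A Reader's Digest-style mail-in entry: if the expected winnings are a few
# cents (assumed) and a stamp costs, say, $0.32 (also assumed), the expected
# net return is negative.
stamp_cost = 0.32
expected_winnings = 0.05
print(f"Expected net of a mail-in entry: ${expected_winnings - stamp_cost:.2f}")
```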
Peter was not a math major. Peter knew that if he were to win, he needed to put forth effort to appease the goddess Tyche[2]. Whenever we went out, whenever we passed the golden arches, he took us through the drive-through to pick up a game piece. Since future millionaires should not look like tightwads, he would order a little something. He would buy a soda and maybe an order of fries.
I took Peter to task for his silly behavior. I explained to him calmly the fundamentals of probability and expectation. I explained to him excitedly that he was being manipulated, being duped into spending much more money at McDonald's than he would have normally. He told me that he would laugh when he received his one million dollars.
Did he win? No. In college, I took this as vindication that I was right. This event validated for me a pet theory: people are not good statisticians[3].
Our state (Wisconsin) has instituted Emo Phillips' tax on mathematical illiteracy. By not participating, I am a winner in the lottery. Profits from the lottery go to offset my property taxes. It is with mixed emotion that I receive this rebate each year. Like anyone else, I appreciate saving money. I even take a small amount of smug satisfaction that I win several hundred dollars a year from the lottery, and I have never purchased a lottery ticket. And I have made this money from people like Peter, who do not understand statistics.

One newsclip caught me in my smugness. The report characterized the typical buyer of a lotto ticket as surviving somewhere near the poverty level. I believe that we all have a right to decide where to spend our money. I don't think that the government should only sell lotto tickets to people who can prove that their income is above a certain level. I am, however, troubled by the image of my taxes being subsidized by an old woman who is just barely scratching out a living on a pension.
This image was enough for me to reconsider my mandate that people should base all their decisions on rational enumeration of the possible outcomes, assignation of probabilities, and computation of the expectation. What if I were the pensioner who never had enough money to buy a balanced diet after rent was paid? In the words of the song, "If you ain't got nothin', you got nothin' to lose."  Is the pensioner buying a lottery ticket because he or she is not capable of rationally considering the options? Or are all options "bad", so that the remote chance of making things significantly different is worth the risk? Not too long ago, I would have blamed the popularity of the lottery on mathematical illiteracy. Today I am not so sure.

Psychological perspective

Where observation is concerned, chance favors only the prepared mind.
Louis Pasteur
Aristotle maintained that women have fewer teeth than men; although he was twice married, it never occurred to him to verify this statement by examining his wives' mouths.
Bertrand Russell, The Impact of Science on Society
As engineers and scientists, we like to consider ourselves to be unbiased in our observations of the world. Worchel and Cooper (authors of the psychology text I learned from) lend support for our self-evaluation:
[Studies] demonstrate that if people are given the relevant information, they are capable of combining it in a logical way...
If we read on, we are given a different perspective of the ability of the human brain to tabulate statistics:
But will they? ... We know from studies of memory processes and related cognitive phenomena that information is not always processed in a way that gives each bit of information equal access and usefulness.
Worchel and Cooper go on to describe experimental evidence that people do not weigh all data equally. We tend to be biased in our judgment of an event when we are involved in that event; our placement of blame in an accident depends on the extent of the damage; and we generally weight a person's behavior more heavily than the particular situation the person is in.

The primacy effect

Several other rules can be invoked to explain our faulty data collection. The first rule to explain what information is retained is the primacy effect. This states that the initial items in a sequence are more likely to be remembered. This fits well with folklore like, "You never get a second chance to make a first impression," and "It is important to get off on the right foot." Statistically speaking, the primacy effect can be thought of as applying a higher weighting to the first few data points.
In one experiment on the primacy effect, the subject is shown a picture of a person and is given a list of adjectives describing this person. The order of the adjectives is changed for different subjects. After seeing the picture and word list, the subject is asked to describe the person. The subject's description most often agrees with the first few adjectives on the list.

The recency effect

The second rule to explain memory retention is the recency effect. This states that, for example, the last items on a list of words (the most recently seen items) are also more likely than average to be remembered. In other words, the most recent data points are also more heavily weighted than average. As an example of this, I remember what I had for lunch today, but I can barely remember what I had the day before. If my doctor were to ask me what I normally had for lunch, would my statistics be reliable?
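Statistically, the primacy and recency effects can both be pictured as a weighted average in which the first few and the last few observations count for more than the ones in the middle. A minimal sketch, with memory weights that are invented purely for illustration:

```python
import random

random.seed(1)

# Twenty observations (say, daily lunch calorie counts); a careful written
# record would give the plain average.
data = [random.gauss(700, 150) for _ in range(20)]
true_mean = sum(data) / len(data)

# Illustrative memory weights: the first few items (primacy) and the last few
# (recency) count heavily; the middle is half-forgotten.  The weights are
# made up -- the point is only the mechanism.
weights = [3, 2, 2] + [1] * (len(data) - 6) + [2, 2, 3]
remembered_mean = sum(w * x for w, x in zip(weights, data)) / sum(weights)

print(f"true mean:       {true_mean:.0f} calories")
print(f"remembered mean: {remembered_mean:.0f} calories")
```

Whenever the first or last few data points happen to be unrepresentative, the "remembered" average drifts away from the true one.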

The novelty effect


The third rule states that items or events which are very unusual are apt to be remembered. This is the novelty effect. I once had the pleasure to work in a group with a gentleman who stood 6'5". When he was standing with some other team members who were just over six feet, a remark was made that we certainly had a tall team. In going over the members of this team, I recall four men who were 6'2" or taller. But I also remember a dozen others of average height, between 5'8" and 6', and two more who were around 5'4". The novelty of a man who was seven inches above average, and the image of him standing with other tall men, was enough to substitute for good statistics.
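Running the actual numbers makes the point. The head counts below are the ones recalled above; the exact heights within each range are assumptions for the sake of the arithmetic:

```python
# Head counts recalled above; exact heights within each range are assumed.
tall   = [77, 75, 74, 74]      # the 6'5" gentleman plus three at 6'2" or more
middle = [71] * 12             # about a dozen between 5'8" and 6'
short  = [64, 64]              # two around 5'4"

team = tall + middle + short
mean_inches = sum(team) / len(team)
feet, inches = divmod(round(mean_inches), 12)
print(f"team average: about {feet}'{inches}\"")   # about 5'11" -- hardly a tall team
```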
I recall one incident where a group of engineers was just beginning to get an instrument close to specified performance. The first time the instrument performed within spec, we joked that this performance was "typical". The second time the instrument performed within spec (with many trials in between), we upgraded the level of performance to "repeatable". The underlying truth of this joking was the tendency for all of us to remember only those occasions of extremely good performance.

The paradigm effect

A fourth rule which stands as a gatekeeper on our memory is the paradigm effect. This states that we tend to form opinions based on initial data, and that these opinions filter further data which we take in. An example of the paradigm effect will be familiar to anyone who has struggled to debug a computer program, only to realize (after reading through the code countless times) that the mistake is a simple typographical error. The brain has a paradigm of what the code is supposed to do. Each time the code is read, the brain will filter the data which comes in (that is, filter the source code) according to the paradigm. If the paradigm says that the index variable is initialized at the beginning, or that a specific line does not have a semi-colon at the end, then it is very difficult to "see" anything else.
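Here is a contrived sketch, in Python, of the kind of bug a paradigm hides (the function and the data are invented purely for illustration). The reader "knows" the running total is reset for every row, so the missing reset is nearly invisible:

```python
# A paradigm-hiding bug: the reader "knows" that row_total is reset for
# every row, so the missing reset is hard to see.
def row_sums(matrix):
    sums = []
    row_total = 0                 # initialized once, at the beginning...
    for row in matrix:
        # BUG: the paradigm says "row_total starts at zero for each row",
        # but the reset never actually happens here.
        for value in row:
            row_total += value
        sums.append(row_total)
    return sums

print(row_sums([[1, 2], [3, 4]]))   # [3, 10] instead of the expected [3, 7]
```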

The paradigm effect is more pervasive than any objective researcher is willing to admit. I have found myself guilty of paradigms in data collection. I start an experiment with an expectation of what I will see. If the experiment delivers this, I record the results and carry on with the next experiment. If the experiment fails to deliver what I expect, then I recheck the apparatus, repeat the calibration, double-check my steps, etc. I have tacitly assumed that results falling outside my paradigm must be mistakes, and that data which fits my paradigm is correct. As a result, data which challenges my paradigm is less likely to be admitted for serious analysis.
An engineer by the name of Harold[4] had built up some paradigms about the lottery. He had recorded the past few months of lottery numbers on his computer, and he showed me that three successive drawings had a pattern. When he noticed this, Harold bought lots of lottery tickets. The pattern unfortunately did not continue into the fourth drawing. As Harold explained it to me, "The folks at the lottery noticed the pattern and fixed it."
Harold's paradigm was that there were patterns in the random numbers selected by lottery machines. Harold had two choices when confronted with a pattern which did not continue long enough for him to get rich. He could assume that the pattern was just a coincidence, or he could find an explanation why the pattern changed. In keeping true to his paradigm, Harold chose the latter. When he explained this to me, I realized that it was fruitless to try to argue him out of something he knew to be true. I commented that the folks at the lotto had bigger and faster computers than Harold, just so they could keep ahead of him.
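Apparent patterns are exactly what one should expect from random draws. A quick simulation (an invented pick-6 lottery, purely for illustration) counts how often consecutive drawings share a number just by chance:

```python
import random

random.seed(0)

# An invented pick-6 lottery: 6 numbers drawn from 1-49, once a week for a year.
def draw():
    return set(random.sample(range(1, 50), 6))

draws = [draw() for _ in range(52)]

# Count how often consecutive drawings share at least one number -- the kind
# of coincidence that looks like a "pattern" begging for an explanation.
overlaps = sum(1 for a, b in zip(draws, draws[1:]) if a & b)
print(f"{overlaps} of {len(draws) - 1} consecutive pairs share a number")
```

With 6 numbers drawn from 49, a bit more than half of consecutive pairs share a number on average; no intervention from the folks at the lottery is required.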
As another example of the paradigm effect, consider an engineer named William[5]. William was a heavy smoker and had his first heart attack in his mid-forties. He was asked once why he kept smoking, when the statistics were so overwhelming that continuing to smoke would kill him. William replied that his heart attack was due to stress. Smoking was his way of dealing with stress. To deprive himself of this stress relief would surely kill him. Furthermore, stopping smoking is stressful in and of itself.
William's paradigm was that he was a smoker. No amount of evidence could convince him that this was a bad idea. Evidently the paradigm is quite strong. In a recent study, roughly half of bypass patients continue to smoke after the surgery. William had six more heart attacks and died after his third stroke.

The primacy effect and the paradigm effect working together

The primacy effect and the paradigm effect often work together to make us all too willing to settle for inadequate data. My own observation is that people often settle for a few data points, and are often surprised to find out how shaky their observation is, statistically speaking.
A case in point is my belief that young boys are more aggressive than young girls. The first young girls I had the opportunity to observe closely were my own two daughters, whom I would not call aggressive. The first young boy I observed in any detail was the neighbor's, whom I would call aggressive. My conclusion was that young boys are aggressive, and young girls are not.
Note that, if three people are picked at random, it is not terribly unlikely that the first person chosen is aggressive, and the other two are not. In other words, I have no need to appeal to a correlation between gender and aggressiveness to explain the data. The simple explanation of chance would suffice.
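To put a number on "not terribly unlikely": if, say, half of all children were aggressive (a base rate assumed purely for illustration), the observed pattern shows up by chance about one time in eight.

```python
# Assume (purely for illustration) that half of all children are "aggressive".
# The observed pattern -- first child aggressive, next two not -- then occurs
# by chance with probability 0.5 * 0.5 * 0.5.
p_aggressive = 0.5
p_pattern = p_aggressive * (1 - p_aggressive) ** 2
print(f"P(pattern by chance) = {p_pattern}")    # 0.125, about 1 in 8
```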
The primacy effect says that these three children were the most influential in shaping my initial beliefs. The paradigm effect says that the future data which I "record" will be the data which supports my initial paradigm.
In terms of evolution, one would be tempted to state that an animal with poor statistical abilities would not be as successful as an animal which was capable of more accurate statistical analysis. Surely the hypothetical Homo statistiens would be able to more accurately assess the odds of finding food or avoiding predators.
Consider the hypothetical Homo statistiens' first encounter with a saber-toothed tiger. Assume that he or she was lucky enough to survive the encounter. On the second encounter, Homo statistiens would reason that not enough statistics had been collected to determine whether saber-toothed tigers were dangerous. Any good statistician knows better than to draw any conclusions from the first data point. Clearly, there is an evolutionary advantage to Homo sapiens, who jumps to conclusions after the first saber-toothed tiger encounter.

In the words of Desmond Morris,
Traumas... show clearly that the human animal is capable of a rather special kind of learning, a kind that is incredibly rapid, difficult to modify, extremely long-lasting and requires no practice to keep perfect.

The effect of peer pressure


When Richard Feynman was investigating the Challenger disaster, he uncovered another fine example of how bad people are at statistics. He was reading reports and asking questions about the reliability of various components of the Challenger, and found some wild discrepancies in the estimated probabilities of failure. In one meeting at NASA, Feynman asked the three engineers and one manager who were present to write down on a piece of paper the probability of the engine failing. They were not to confer, or to let the others see their estimates. The three engineers gave answers in the range of 1 in 200 to 1 in 300. The manager gave an estimate of 1 in 100,000.
This anecdote illustrates the wide gap in judgment which Feynman found between management and engineers. Which estimate is more reasonable? Feynman dug quite deeply into this question. He talked to people with much experience launching unmanned spacecraft. He reviewed reports which analytically assessed the probability of failure based on the probability of failure of each of the subcomponents, and of each of the subcomponents of the subcomponents, and so on. He concludes:
If a reasonable launch schedule is to be maintained, engineering often cannot be done fast enough to keep up with the expectations of the originally conservative certification criteria designed to guarantee a very safe vehicle... The shuttle therefore flies in a relatively unsafe condition, with a chance of failure on the order of a percent.
On the other hand, Feynman is particularly candid about the "official" probability of failure:
If a guy tells me the probability of failure is 1 in 10^5, I know he's full of crap.
How can it be that the bureaucratic estimate of failure disagrees so sharply with the more reasonable engineers' estimate? Feynman speculates that the reason is that these estimates need to be very small in order to ensure continued funding. Would Congress be willing to invest billions of dollars in a program with a one-in-a-hundred chance of failure? As a result, much lower probabilities are specified, and calculations are made to justify that this level of safety can be reached.
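The two estimates imply very different things over the life of a program. A sketch, assuming (unrealistically) independent flights with a fixed per-flight failure probability:

```python
# Probability of at least one loss in n flights, assuming (unrealistically)
# independent flights with a fixed per-flight failure probability.
def p_any_failure(p_per_flight, n_flights):
    return 1 - (1 - p_per_flight) ** n_flights

for label, p in [("engineers' estimate, ~1 in 250", 1 / 250),
                 ("official estimate, 1 in 100,000", 1 / 100_000)]:
    print(f"{label}: P(at least one failure in 100 flights) = {p_any_failure(p, 100):.1%}")
```

Under the engineers' numbers, a hundred flights carry roughly a one-in-three chance of losing a vehicle; under the official number, about one in a thousand.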
I am reminded of another experiment, devised by the psychologist Solomon Asch in 1951. In this experiment, the subject was told that this was an experiment investigating perception. The subject was to sit among four other "subjects", who were actually confederates. The group was shown a set of lines on a piece of paper and asked to state out loud which line was longest. The confederates were called on first, one at a time. They were instructed to give obviously incorrect answers in 12 of 18 trials, but they were to all agree on the incorrect answer.
It was found that 75% of subjects caved in to peer pressure and agreed with the obviously incorrect answers. When asked about their answers later, away from the immediate effects of peer pressure, the subjects held to their original answers, incorrect or not. As far as can be measured with this psychological experiment, the subjects came to believe that a two-inch line was shorter than a one-inch line.
So it is with NASA's reliability data. The data may never have had any shred of credence whatsoever, but simply by repeating "1 in 100,000" often enough, it became truth.
I have included Feynman's example not to put down NASA, or promote the ever-popular game of "manager bashing", but to illustrate this ever-so-human trait that we are all prone to. We believe what others believe, and we believe what we would like to be true.

Summary

The effects mentioned here together support the statement that people do not make good statisticians. The point is not that "mathematically inept people are poor statisticians", or that "people are incapable of performing good statistics". The point is that the natural tendency is for people not to be good at objectively analyzing data. This goes for high school drop-outs as well as engineers, scientists, and managers. In order for people to produce good statistics, they need to rely not on their memory and intuition, but on paper and statistical calculations.

Bibliography

Feynman, Richard P., What Do You Care What Other People Think?, 1988, Penguin Books Canada Ltd.
Flanagan, Dennis, Flanagan's Version, 1989, Random House
Krech, David, Crutchfield, Richard S., and Livson, Norman, Elements of Psychology, Third Edition, 1974, Alfred A. Knopf, Inc.
Morris, Desmond, The Human Zoo, 1969, McGraw-Hill
Worchel, Stephen, and Cooper, Joel, Understanding Social Psychology, Revised Edition, 1979, Dorsey Press



[1] Not his real name.
[2] Tyche was the Greek goddess of luck.
[3] I met one person whose behavior indicated that he was not a good statistician, therefore all people are not good statisticians...[This demonstrates that I am not a good statistician, since I am content with a sample size of one. There are, therefore, two people who are not good statisticians, and this further proves my point.]
[4] Not his real name, either.
[5] You guessed it. Not his real name.
