Wednesday, November 21, 2012

The Full Monty

My good buddy Steve from the UK gave me the idea for this week's blog. He suggested that I write about  the "Monty Hall" problem. My first reaction was that this was yet another dumb British idea, like Twiggy, Monty Python, and double-decker buses. What could I possibly have to say that hasn't already been said by a whole bunch of people who are either really smart, or who claim to be really smart?

We have a lot to thank the Brits for

One person who fits into one of those categories is Marilyn vos Savant [1]. She treated the Monte Hall problem in her Parade magazine columns many years ago. As I recall, she caught a lot of flak from a lot of really smart people for giving the wrong answer. So, I know that whatever I say, I will get flak from someone who claims to be smart. 

Monroe or vos Savant? Ginger or Mary Ann?

Then I stopped to think. I was a teenage nerd once, so I kinda like Monty Python. Maybe the Brits do have some good ideas once in a while. Maybe it wasn't such a bad idea for us Americans to send settlers to colonize England a few centuries ago. And maybe it's time for a blog post on the Monte Hall problem. 

The Monte Hall problem

Years ago (oh no, not another "back when I was a kid" lecture) there was a show called Let's Make a Deal, hosted by Monte Hall. For reasons that I was never able to understand, people who showed up in the audience dressed like they were going to a showing of Rocky Horror. Only there weren't quite so many drag queens. Some of the folks with the most outlandish outfits became contestants.And at the end of every show, there was one contestant who got to chose from three doors "behind where Carol Merrill is standing". [2]  

Vanna White's role model

There was always a fabulous prize behind one door, like a boat the size of Lake Michigan or new kitchen filled with a lifetime supply of Spam and some baked beans. The other doors? Goats. Maybe it wasn't always a goat, but that's what I remember. I remember thinking it might be kinda cool to have a goat. Although the lifetime supply of Spam sounds pretty alluring.

What? You don't want me?!?!?

Now we're coming to the fun part. No matter which door the contestant picked, Monty would open a different door, showing a goat, and ask if they wanted to change their choice. What's the best strategy for the contestant? Change or hang on? One line of reasoning says that there are two doors, one with something good, the other with something bad. Fifty-fifty. Why bother changing? Monty is just trying to trick you into changing to the goat.

The other line of reasoning is long and arduous and you have to think and it's hard and ... well, it says that you're chances are better if you change doors. Like... for some reason your probability of picking the fabulous prize has changed from 1 in 3? Or something. Really?

Wikipedia, the world's largest source of misinformation, has an entry on this famous problem. They say that it makes sense to change. They explain this non-intuitive answer in several non-intuitive ways. I am going to add yet another explanation, but first I am going to appear to change the subject.

Monte Carlo methods

Monte Carlo. It brings to mind "shaken, not stirred" and casinos and baccarat. And numerical methods, of course. Well ,maybe not for most people, but certainly for me.

Did you say Monte Carlo methods?

In college, I took a class in Monte Carlo methods. Basically, this is a way of solving numerical problems using random numbers. The first use of this method was back in the days of the Manhattan Project. The guys [3] were working on shielding from neutrons. They could easily explain the path of any particular neutron, but they had trouble finding the equation that would explain what happens to the neutrons in the aggregate as a function of the material and its thickness. They eventually solved the problem by using random numbers to simulate the path of thousands or millions of neutrons - start each simulated neutron out from a random position, moving in a random direction and see where it goes.

The professor for my class, John Halton [4], started us out with a bit of a refresher on probability. He asked us to compute the odds of all the basic poker hands: three of a kind, flush, straight, etc. This is a bit of a tricky problem, not impossible, but it's easy for a kitten like me to get balled up in the combinatorics.

I knew that I was likely to make some sort of mistake, so I decided to double check my answers by doing a little computer simulation. I wrote the code to deal hands at random, and then decide what the hand was. I let my computer play poker overnight, and then checked in the morning to get the results. Sure enough, my simulation revealed that one of my calculations was off by a factor of two. I found my error, and I turned in both the combinatoric solution and the program for my simulation.

This was to be (perhaps) the shining moment of my otherwise drab and wretched college years, since I was never elected class president or homecoming queen. Dr. Halton was tickled that I used a Monte Carlo method for an assignment in a Monte Carlo methods class, and he took part of a lecture to explain how I had double checked my homework.

Actual photo of me basking in the glory of my well-deserved accolades
I was so handsome in those days

I have used this same approach numerous times. Sure, I can derive equations and write them out and solve them. But I have also learned that, despite meticulous double checking, I can still have bone-headed mistakes interspersed with my usual absolutely brilliant analysis. When a problem lends itself to Monte Carlo simulation, I will often use the simulation to double check my answer. 

Finally getting to the point

For those readers who may have forgotten what this Full Monty blog post is about, I have been making Monty Python references to lead up to a discussion about using a Monte Carlo method to solve the Monty Hall problem.

This problem is ideal for Monte Carlo double checking. It is complicated enough that it is easy to talk yourself into some wrong assumptions, but at the heart, it is quite simple to simulate.

1. Randomly decide which of the three doors will conceal the fabulous prize.
2. Randomly select a door for the contestant.
3. Decide which door Monty will show.
4. Switch the contestant to the remaining door - not the one chosen, and not the one opened.
5. Record results.
6. Repeat a zillion times.

It's hard to get balled up in this simulation. Or then again... Some clever mathematical chap might try to get all clever on the algorithm that I have described. For example, someone could "clever this algorithm up" when it comes to step #4. It's a bit hard to put that step into code. But here is something clever. Let's say that the door picked is 2, and the door opened is 3. Subtract those from 6, and what do you get? 6 - 2 - 3 = 1. This is the remaining door. This always works. Clever, eh? I'm rather proud of it.

This cute image has little to do with the rest of this blog

A clever mathematical chap might be tempted to use this trick on #3 as well. Unfortunately this would be a bug. It works when the contestant picks a door with a goat behind it, but not when he picks a door with the yaught.

If you feel tempted to simplify the problem in this way, then please slap your face. The idea of trying to apply clever mathematical analysis completely defeats the purpose of using Monte Carlo to check your work. The whole point to using this technique is to avoid falling into the trap of outclevering yourself.

Results

Here is the test. I have assumed (just for the sake of simplifying this discussion) that the probability of winning a yacht is one in three if the contestant does not switch. I wrote some code to simulate what would happen if the contestant switched every time.

The program that I wrote is listed at the end of this blog post. I make no claims that this is the most efficient way to write the code. In fact, I intentionally focused on making the code as straightforward as possible. Oh... also... I do most of my programming in Mathematica. I'm not going to apologize for that. I realize there are probably some Matlab users reading this. Well... I have been using Mathematica since 1985. So there.

I ran the program once, with 100 iterations, and got 66 yacht and 34 goats. This sounds convincing. The number of yachts is much closer to 2/3 than 1/3. But, this is statistics, so I could easily get thrown off. I could solve that by upping the number of iterations to one thousand or one million. This should give me a better estimate of the odds, but I really don't know much about the dependability of my estimate.

I have a cleverer trick. How about I run this program with 100 iterations ten times. The experiment will have been run with a total of 1,000 contestants, but I would have them partitioned off in groups of 100. This gives me an idea of the range of the estimates. 

Here are the number of yachts that were won in each batch of 100: 67, 71, 60, 66, 65, 61, 68, 61, 61, and 66. The mean is 64.70 and the standard deviation is 3.59. I am going to take a big leap now, and propose that this distribution is roughly normal [5]. If that is the case, then the number of yachts won per hundred has a 95% chance of being between 57.52 and 71.88. (I have gone two standard deviation units to either side.) I have no idea where I am going to park all those yachts.

Now I need to be careful in how I say this next bit. The range (57.52 to 71.88) is the range for the number of goats per hundred. It is not the range for my overall estimate of the mean. The mean was based on a total of 1,000 trials, so clearly the range must have gotten smaller. 

The rule is that when you run n trials, the standard deviation goes down by a factor of square root of n. I ran ten trials, so the range has been reduced by a factor of just over 3. I know then that the average number of yachts won per hundred trials is then  62.43 to 66.97, with a 95% confidence. 

Based on that, I can safely exclude the possibility that there is no advantage to switching. Also, I can't exclude the possibility that the probability of winning when you switch is 2 in 3. 

Another application

This is a technique that I use frequently to double check my algebra. I plan on using it shortly when I do some further investigation on the distribution of ΔE values. I am wondering about the theoretical distribution. Some have called it a chi-squared function. I think that the distribution of  ΔE squared might be chi-squared, but I can test this. And I should, because I know I often get myself balled up.

--------------------------
[1] What kind of a name is that anyway? Who calls themselves "savant"? I mean really, I would never call myself John the Savant Guy!
[2] Carol Merrill was born in Wisconsin. John the Math Guy was born in Wisconsin. Draw your own conclusions.

[3] You know, the same old guys. Stanislaw Ulam, Nicholas Metropolis, Enrico Fermi, and my idol, John von Neumann. I have all their baseball cards.

[4] John Halton was from England, by the way. Oxford. Cambridge. All that rot. I do not know whether he knew John Cleese or Eric Idle. He did come into class one day, telling us how just a sprinkle of water mixed in with the beaten eggs made the fluffiest of omelets, though.

[5] This is another important discussion - whether you can assume that a distribution is normal - but I will leave that for another blog post.

The program (in Mathematica)
yaughts = 0;
goats = 0;
Do [
  (* pick door for yaught and contestant pick, randomly and independently *)
  yaughtdoor = Random  [Integer, {1, 3}];
  pickdoor = Random  [Integer, {1, 3}];
  
  (* assemble list of doors that Monty can open *)
  freetoopen = {};
  Do [
    If [
      (door ≠ pickdoor) && (door ≠  yaughtdoor),
      AppendTo [freetoopen, door]
      ],
    {door, 1, 3}];
  
  (* let Monty pick from the door(s) available *)
  If [Length [freetoopen] == 1,
    opendoor = freetoopen [[1]],
    opendoor = freetoopen [[Random [Integer, {1, 2}]]]
    ];
  
  (* determine the remaining door that contestant can pick *)
  Do [
    If [
      (door ≠ pickdoor) && (door ≠  opendoor),
      newpick = door
      ],
    {door, 1, 3}];
  
  (* evaluate results *)
  If [
    newpick == yaughtdoor,
    yaughts++,
    goats++
    ],
  {100}]
Print ["Yaughts and goats: ", {yaughts, goats}]

4 comments:

  1. the explanation for why you should choose the switch door strategy is really not hard to explain. If your strategy is to not switch, the only way to win the prize is to pick that door outright, which is 1/3. If your strategy is to switch, the way to win the prize is to first pick one of the doors with a goat, 2/3, then when Monte Hall opens the door with the other goat, you switch and get the prize. done. the switching strategy gives you the prize 2 out of 3 times, the other strategy gives you the prize only 1 out of 3 times.

    ReplyDelete
  2. These simulations have been available for school children through whatever age from the Shodor organization in Durham, NC

    http://www.shodor.org/interactivate/activities/SimpleMontyHall/

    http://www.shodor.org/interactivate/activities/AdvancedMontyHall/

    and

    http://www.shodor.org/interactivate/activities/GeneralizedMontyHall/

    There are also lesson plans and discussions. Coming soon to mobile platforms!!

    Hope this helps!

    ReplyDelete
  3. Excellent! Thank you Robby. I played 100 games and won 63. I haven't decided what to do with my winnings.

    ReplyDelete
  4. Actually, if I remember correctly, Marilyn caught a lot of flak from a lot of really smart people for getting the answer right.

    ReplyDelete