John the Math Guy: Lousy weather every weekend

It was a Monday after a rainy weekend. Naturally, since I had to go to work, the weather was excellent. I was particularly downcast since the weather had put a damper on the previous weekend as well. I mentioned it at work, and got a surprising response.

Actual photo of me enjoying the lovely weekend weather [1]

I should tell you that this was quite a long time ago, back before Al Gore invented the internet. I was working at the University of Wisconsin Space Science and Engineering Center. You would think that with a name like that, we would all be working on something dull like a space telescope, a Mars rover, or a space shuttle porta-potty. But it was far more exciting. We were crunching data from weather satellites.

Getting back to my story about a forlorn math guy complaining about the weather: when I made my comment about two weekends lost to lousy weather, I just happened to be surrounded by meteorologists. I was told that everybody knew that weather was cyclical, and that it tends to follow a weekly pattern. If you know the weather on one particular Saturday, then you have a darn good guess at the weather the following Saturday.

I have used this factoid frequently. It always sounds impressive when I tell people what next weekend's weather is going to be like. I should say, it sounds impressive when they believe me. Whether my prediction comes true is actually irrelevant. No one remembers my predictions.

Is the "weekly weather pattern" factoid for real? I put it to the test.

The data

It isn't hard to find information on the current weather conditions. But yesterday's weather? That's a bit harder to find. And historical data? I found a place to dig up old weather. [2]

I downloaded four years' worth of weather data from Milwaukee, from July 22, 2009 to July 22, 2013. Why four years? Well... there is a story there. Once upon a time, there was a computer magazine called Byte. It was the magazine for computer geeks. They published an article with a BASIC program for doing a fast Fourier transform. I dug into the code and found it wasn't remotely the FFT (fast Fourier transform) algorithm that Cooley and Tukey made famous. Later, they had an article that looked at sales data that spanned many years. They said that a "four year analysis" was used to look for trends. I dropped my subscription. [3]

Mean temperature, Fourier analysis

Let's look first at the temperature (see plot below). It is apparent from the sine wave kinda thingie in the plot that I have collected four years of data, and that the start of the data is kinda mid-summer.

Average daily temperature in Milwaukee over the past four years

That graph in and of itself is pretty cool, but playing with the data? I don't know about other math guys, but the first thing I like to do when I get a new fresh pile of data is to start taking Fourier transforms. It took a bit of massaging of the raw FFT output, but here is a graph showing a section of the frequency data. This graph shows the strength of the periodicity going from a four day period (on the left) to a 10 day period (on the right). There is no discernible peak or bump near the once-per-week mark.

FFT of the mean daily temperature
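For anyone who wants to poke at the same question, here is a minimal sketch of the idea in Python. It is not the code behind the graph above. Instead of a full FFT it uses a direct Fourier sum evaluated only at periods of 4 to 10 days, which is slower but needs nothing beyond the standard library. The function name amplitude_at_period and the synthetic temps series are made up for illustration. The weekly ripple is planted on purpose, just to show what a seven-day peak would look like if there were one (the real Milwaukee temperatures showed none).

```python
import cmath
import math

def amplitude_at_period(series, period_days):
    """Amplitude of the sinusoid with the given period, via a direct Fourier sum."""
    n = len(series)
    mean = sum(series) / n
    omega = 2 * math.pi / period_days  # angular frequency in radians per day
    total = sum((x - mean) * cmath.exp(-1j * omega * t)
                for t, x in enumerate(series))
    return 2 * abs(total) / n

# Synthetic stand-in for four years of daily temperatures (NOT the real data):
# an annual cycle plus a deliberately planted 3-degree weekly ripple.
temps = [50
         + 25 * math.cos(2 * math.pi * t / 365.25)
         + 3 * math.sin(2 * math.pi * t / 7)
         for t in range(1462)]

# Strength of the periodicity for periods from 4 to 10 days.
spectrum = {p: amplitude_at_period(temps, p) for p in range(4, 11)}
peak_period = max(spectrum, key=spectrum.get)

for period, amplitude in spectrum.items():
    print(f"{period:2d} days: {amplitude:.3f}")
```

Swap in a real list of daily mean temperatures for temps and the same loop gives the four-to-ten-day slice of the spectrum.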

Correlation

Here is another look at that same data. In this case, instead of doing the whole Fourier bit, I did some correlations. I checked to see if the daily temperature on a given day correlated with the daily temperature on the previous day. So... an array of four years of data, 1462 data points. I computed the correlation coefficient between two sub arrays, one going from day 1 to day 1461, and the other going from day 2 to day 1462. Then I computed the same thing, only with a lag of two days, and three days, and so on.
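The sub-array trick just described can be sketched in a few lines of Python. This is an illustration under assumptions, not the original calculation: the helper names pearson and lagged_correlation are invented here, and the temps series is synthetic (a seasonal curve plus some day-to-day persistence from a seeded random generator) standing in for the downloaded data.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(series, lag):
    """Correlate the series with a copy of itself shifted by `lag` days."""
    n = len(series)
    return pearson(series[:n - lag], series[lag:])

# Synthetic stand-in for 1462 days of temperatures (NOT the real data).
rng = random.Random(0)
noise = 0.0
temps = []
for day in range(1462):
    noise = 0.7 * noise + rng.gauss(0, 5)  # yesterday's weather lingers a bit
    temps.append(50 + 25 * math.cos(2 * math.pi * day / 365.25) + noise)

# Lag 0 compares each day with itself; lag 1 is day 1..1461 vs day 2..1462.
correlations = {lag: lagged_correlation(temps, lag) for lag in range(0, 15)}
for lag, r in correlations.items():
    print(f"lag {lag:2d}: r = {r:.3f}")
```

Note that the two sub arrays shrink by one day for each extra day of lag, so a lag of seven days compares 1455 pairs.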

The graph below shows the results. At the very, very left, the correlation between each day's temperature and that same day's temperature is 1.0. Well, duh. The temperature on a given day looks a heckuva lot like the temperature on that same day. The plot below shows that today's temperature also looks a lot like yesterday's, with a correlation coefficient of close to 0.95. In the middle of the chart we see the correlation of today with one week ago. According to the hoity-toity meteorologists, I should see a big spike there, but no. Using the temperature a week ago to predict today's temperature is good (r = 0.85), but it is no better or worse than using the temperature six days or eight days ago.

Correlation coefficient of temperature as a function of spacing (in days)

So, I would say this myth is pretty well busted. That's the way I like my myths: pretty and well-busted. Temperature does not follow a hebdomadal pattern. That's a pity. It would have explained why a week isn't six days, or (shudder the thought!) eight days.

But wait! What about rain?!?!?

I missed something here, didn't I? This blog post started out talking about rain, not temperature. Maybe I should have a look-see at the rain data? Luckily, this database contains precipitation data, as well as wind speed and direction, barometric pressure, humidity, ... but all I want is the rain data.

I present below another correlation chart, this one showing the correlation of precipitation. There it is, as plain as the nose on Owen Wilson's face, a spike at lucky seven. There is a correlation (r = 0.1). This is roughly at the 99.95% significance level. So. Maybe my meteorologist friends weren't all that dumb after all?

Correlation coefficient of precipitation as a function of spacing (in days)
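How does a puny r = 0.1 get to be significant? Because there are a lot of days. Here is one hedged way to check, using the textbook t statistic for a correlation coefficient. The post does not say how its significance figure was computed, so treat this as an assumption-laden sketch: it pretends the 1455 pairs of days (1462 minus a lag of 7) are independent, which rainy days are not, and it uses the large-sample normal approximation for the p-value. The function name correlation_significance is made up for this example.

```python
import math

def correlation_significance(r, n_pairs):
    """t statistic and two-sided p-value (normal approximation) for a correlation r."""
    t = r * math.sqrt(n_pairs - 2) / math.sqrt(1 - r * r)
    # For large n the t distribution is close to a standard normal,
    # so the two-sided tail probability is erfc(|t| / sqrt(2)).
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))
    return t, p_two_sided

n_pairs = 1462 - 7  # days that have a partner one week later
t_stat, p_value = correlation_significance(0.1, n_pairs)
confidence_percent = 100 * (1 - p_value)

print(f"t = {t_stat:.2f}, p = {p_value:.2e}, confidence = {confidence_percent:.3f}%")
```

The same r from only a few dozen days would not come close to clearing the bar; the sample size is doing the heavy lifting.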

I should make a comment here about how big the number 0.1 is when it comes to correlation coefficients. Let's just say that Norm decided to make some money on these results. Every night, for four years, he would sit at a bar in Milwaukee, and take bets on whether it would rain the following week. I know, ideal job, right? If Norm follows the advice in this blog, betting that the rain today can predict that of one week from today, I guarantee that he would come out ahead at the end of the four years.

Norm, pondering this money-making scheme

On the other hand, the number 0.1 is very tiny. The predictive strength when r = 0.1 is on the order of 0.005. That's an indication of how much the odds are in Norm's favor. He is pretty sure of making money, but he needs to make a lot of bets to get there.
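Where might a number like 0.005 come from? One common reading, and it is only an assumption about what is meant here, is the fractional reduction in prediction error you get from a linear fit: the leftover scatter shrinks by a factor of sqrt(1 - r^2), so the improvement is 1 - sqrt(1 - r^2). The function name error_reduction is invented for this sketch.

```python
import math

def error_reduction(r):
    """Fractional drop in the standard error of prediction for a correlation r."""
    return 1 - math.sqrt(1 - r * r)

# Compare the rain correlation with the two temperature correlations from earlier.
for r in (0.1, 0.85, 0.95):
    print(f"r = {r:.2f}: prediction error shrinks by {100 * error_reduction(r):.1f}%")
```

By this measure, knowing last week's rain trims about half a percent off the uncertainty, which is a very slim edge for Norm.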

This may sound like a bad business model, but this is how casinos work. They want to run games where the table is tilted ever so slightly in their favor. Too much of a tilt (too low a chance of the patron winning), and people will walk away. Too little of a tilt, and the casino won't make a buck.

Another comment on the plot above. There are also peaks out at 29 and 31 days. Hmmmm... Maybe this is the effect of the moon?

---------------------------------

[1] If truth be told, although I have sung before, and have been in the rain, this is actually a picture of Gene Kelly, not me. If more truth be told, Jean Kelly is my sister. Here is my favorite painting of hers.

[2] If truth be told, I didn't find this website on my own. Nate Silver told me about it. If even more truth be told, he didn't actually tell me in person. I read it in one of his books: The Signal and the Noise: Why So Many Predictions Fail — but Some Don't. He and I are good buddies. At least we would be if we ever met. That's my prediction.

[3] I had a house that I was getting ready to sell. The realtor told me that the entry way needed work - first impressions and all that. Updating this would give the whole house a new image. So, I gave it a fresh coat of paint, updated the light fixture, and polished up the handle of the big front door. I guess you could say that I performed image enhancement through a fast foyer transform. Some of you will find this joke incredibly funny.

John the Math Guy

Wednesday, August 7, 2013
