Yogi Berra once said that “predictions are hard, especially when they are about
the future”. Or maybe it was Niels Bohr who said it? Or Casey Stengel, or
Mark Twain, or Sam Goldwyn, or Dan Quayle? Nobody is sure who
first said it, because the past is also hard to predict.


I offer here one explanation of what makes prediction hard. It has to do with
finding the right underlying model.


Which is the right model?


The population growth problem from 6th grade


The year was 1970. I was in sixth grade. The teacher gave us an exercise in
looking for patterns which would (as a side effect) increase our awareness of
the global population problem. We were given the data in the first two
columns of the table at the right. Note that the years listed are for
successive doublings of the world population. Our assignment was to compute
the doubling time, shown in column three, and then predict the world
population in the year 2020.


Sixth grade assignment


We were supposed to notice that each successive doubling time is half the
previous doubling time. The world population doubled between 1870 and 1970
(in 100 years), so the next doubling to 6.4 billion would require 50 years.
According to the rule we determined in class, the world population would
reach 6.4 billion in 2020.


Well, the estimate was a bit off. We reached 6.4 billion in 2005. Clearly the
prediction was not drastic enough. But wait…


Taking this a step further, we would expect that the world population would hit 12.8
billion in 2045, and 25.6 billion in July of 2057. In sixth grade, there was
something unsettling about this. Of course, the idea of unrestrained
population growth was alarming, but somehow I couldn’t help but think that
there was something wrong there.
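
As a rough sketch, the classroom rule can be played forward in a few lines of Python. The function name `doubling_schedule` and its parameters are made up for illustration; the starting point of 3.2 billion in 1970 is simply half of the 6.4 billion quoted above.

```python
# Sixth-grade rule: each doubling takes half as long as the previous one.
def doubling_schedule(start_year, start_pop_billions, first_doubling_years, steps):
    """Return (year, population) pairs for successive doublings under the halving rule."""
    year = float(start_year)
    pop = start_pop_billions
    span = float(first_doubling_years)
    schedule = []
    for _ in range(steps):
        year += span
        pop *= 2
        schedule.append((year, pop))
        span /= 2  # the next doubling takes half as long
    return schedule

# Start from 3.2 billion in 1970, with the next doubling taking 50 years.
schedule = doubling_schedule(1970, 3.2, 50, 3)
for year, pop in schedule:
    print(f"{year:.1f}: {pop:.1f} billion")
```

The third entry lands halfway through 2057, which is where the "July of 2057" figure comes from.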


The ludicrousness of this prediction only occurred to me years later. According
to the model, the doubling period would eventually reach the very short time
span of nine months. Now, in order for the population to double in this
amount of time, every woman on the planet, aged 1 to 101, would need to be
pregnant, and must give birth to twins[2]!
The next doubling would occur in only four and a half months, so all the
twins would need to be pregnant when they are born... What a curious world
our descendants will live in!
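
The nine-month figure can be checked with a short loop, sketched here under the same halving rule. The variable names are illustrative only. The loop keeps halving the doubling time until it falls below one year.

```python
# Follow the halving rule until the doubling time drops below a year.
span_years = 100.0   # doubling time for the 1870 -> 1970 doubling
end_year = 1970.0    # year in which that doubling completed
history = []
while span_years >= 1.0:
    span_years /= 2
    end_year += span_years
    history.append((end_year, span_years * 12))  # (year, doubling time in months)

last_year, last_months = history[-1]
print(f"Doubling time is {last_months:.1f} months by {last_year:.1f}")

# The spans 50 + 25 + 12.5 + ... form a geometric series that sums to 100,
# so the rule squeezes every remaining doubling in before this year.
limit_year = 1970 + 100
print(f"All further doublings happen before {limit_year}")
```

Taken literally, the rule packs infinitely many doublings into the century after 1970, which is another way of seeing that the model cannot be right.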


In the year 2081


In his book 2081: A Hopeful View of the Human Future, Gerard K. O'Neill made
an interesting historical observation. He looked at the typical speed at which
people travelled in each era. Here is his chart.




He noted that every century, there has been a tenfold increase in speed, so it
would follow logically that in another century we will hop into our mass
transit vehicles (whatever they may be) and speed off at eight times the
speed of sound. Maybe that’s not unreasonable. If we are taking pleasure
trips to the moon, then that might actually be rather slow.


Now if we take his formula further forward, we can see that in the year 2581,
people will be regularly making trips at about ten times the speed of light.
This stretches my credulity a little bit. I prefer to obey speed limits,
especially when it comes to the speed of light. Clearly science fiction
writers disagree with me on that.


If we go the other direction, we can see that O’Neill’s formula falls into
ridiculousville almost immediately. His formula would predict that a typical
speed of travel would be 0.6 MPH, about one-fifth of a person’s normal
walking speed.


What we have here is another example of an inappropriate mathematical model being
used to make predictions. We might as well play off the fact that the word “train”
has half as many letters as “stagecoach”, and that “jet” has half as many
letters as “train”. (Well… kinda.) The next big leap forward in travel will
have 1.25 letters.


The exponential growth of energy usage


I had a sense of déjà vu when I took
Environmental Geology in college. There was a homework question that was
aimed at impressing on us the disastrous implications of unbridled
exponential growth.


The problem
stated that the annual worldwide energy usage had been increasing throughout
the century at an annual rate of 5%. The problem went on to state that there
is a theoretical upper bound on the total energy available if all the matter
in the entire Earth is converted into energy. This upper bound is stated in
Einstein’s formula E = mc^{2}.
Figures were given for the mass of the Earth and current energy usage, and
the question was asked: In what year would our annual energy requirements
equal the total available energy?
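
The calculation the problem asks for can be sketched as follows. The figures handed out in class are not reproduced here, so the numbers below are stand-ins: the mass of the Earth and the speed of light are standard values, while `ANNUAL_USAGE_J` is only an assumed, round present-day figure.

```python
import math

EARTH_MASS_KG = 5.972e24            # standard value for the mass of the Earth
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact, by definition of the metre
ANNUAL_USAGE_J = 6e20               # ASSUMED: rough present-day world energy use per year
GROWTH_RATE = 0.05                  # 5% per year, as stated in the problem

total_energy_j = EARTH_MASS_KG * SPEED_OF_LIGHT_M_S ** 2  # E = m c^2

# Solve ANNUAL_USAGE_J * (1 + GROWTH_RATE) ** years == total_energy_j for years.
years = math.log(total_energy_j / ANNUAL_USAGE_J) / math.log(1 + GROWTH_RATE)
print(f"Annual usage equals the Earth's mass-energy after about {years:.0f} years")
```

Because the unknown sits inside a logarithm, the answer is not very sensitive to the assumed usage figure: a tenfold change in `ANNUAL_USAGE_J` shifts the result by only about 47 years.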


The answer
we arrived at was only a few millennia away, perhaps the year 3500, or maybe
10,000, I honestly don’t remember. I do remember the next question, and the
answer that I gave. The question was “What do you conclude?” I knew that the
correct answer was “It is high time for us to get off our lazy butts and do
something about the energy crisis, because in fewer than 100 generations
there will be a million billion trillion people standing on the last crumb
that is left of the Earth.”


I knew that,
but I was stubborn. My answer was “Nothing”. For some reason, I did not
receive full credit for that answer. The university environment does not
favor the creative mind. Or the lazy one, either.


My comeback


I had lost one point on a homework assignment that was worth 1% of my grade in
the class. I should have just stopped right there, but I had a point to make.
It was all about the principle of the thing. I spent hours and hours
defending my short answer.


First, I amended my answer a little, from “nothing” to “nothing, because the answer
depends a great deal on the underlying mathematical model that is assumed for
the growth of energy usage”. Then, I got out my slide rule and did some curve
fitting. I don’t have the original work. I am sure that the paper I wrote it
out on has long since crumbled to dust. I will reproduce the salient aspects
of it.


In this first graph, I show some data that I cooked up. The data is an
exponential curve with some noise added to it. In the graph, I also show the
least squares fit of an exponential curve. This shows that the data can be
approximated with an exponential curve that is increasing at 5.33% per year
for 50 years.


Hypothetical energy usage, with exponential curve


I will take this to be a reasonable approximation of what energy usage data
might look like. And, based on this data, a scientist or policymaker might
come to the conclusion that energy usage is going up at a rate of about 5%
per year. (Actually, the data I have dug up looks more chaotic than this.
Wars and other catastrophes do a good job of being unpredictable.)
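
For readers who want to try this themselves, here is a minimal sketch of cooking up noisy exponential data and fitting an exponential to it. It is not the original data or method: the 5% growth rate, the plus-or-minus 10% multiplicative noise, the seed, and the helper name `fit_exponential` are all assumptions, and the fit is done the simple way, as a least-squares straight line through the logarithm of the data.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# ASSUMED data: 5% continuous growth with up to +/-10% multiplicative noise.
years = list(range(51))
usage = [math.exp(0.05 * t) * (1 + random.uniform(-0.1, 0.1)) for t in years]

def fit_exponential(ts, ys):
    """Least-squares line through (t, ln y); returns (a, b) for y = a * exp(b * t)."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean = sum(ts) / n
    log_mean = sum(logs) / n
    slope = (sum((t - t_mean) * (v - log_mean) for t, v in zip(ts, logs))
             / sum((t - t_mean) ** 2 for t in ts))
    intercept = log_mean - slope * t_mean
    return math.exp(intercept), slope

a, b = fit_exponential(years, usage)
annual_growth = math.exp(b) - 1
print(f"fitted growth: {annual_growth:.2%} per year")
```

The fitted rate comes out close to, but not exactly, the rate used to generate the data, which is the same situation as the 5.33% fit described above.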


In this next graph, I show that same data, but this time it is approximated by a
parabola. Looking at this curve fit, I don’t think I could testify in court
that the data must be an exponential. A parabola doesn’t do too badly at
fitting the data.


Hypothetical energy usage, with parabolic fit curve


Although, maybe the fit is not so good at the far left side? The parabola dips downward
just a tiny bit in the first few years, and the data seems to be going
upward. I can fix this fairly easily by using a cubic parabola, a third order
polynomial. Just looking at the graphs, I can see no reason why someone would
reject this fit in favor of the exponential fit.


Hypothetical energy usage, with third order polynomial fit


Why stop at third order? Just for
grins, I had a look at fitting the data with a fifth order polynomial. Once
again, the fit looks pretty good.


Hypothetical energy usage, with fifth order polynomial fit


Why not try seventh order? Well, I did try it, and I think I will reject this
one, maybe just for aesthetic reasons. The curve is a little bumpy. I am not
sure the data has enough evidence to support those bumps.


Hypothetical energy usage, with seventh order polynomial fit


Taking a brief excursion back to ridiculousville, I
tried using a 20th order polynomial to fit the data. Clearly the wiggles in
this polynomial are not a true feature of the real data, but a feature of the
noise. (Those who read my post on “When regression goes bad” will
understand why this failed.)


Hypothetical energy usage, with twentieth order polynomial fit


I did one last curve fit. This one starts with some reasonable assumptions
about growth. An exponential curve is a reasonable approximation for growth
of most physical things at the onset, but eventually in any real system,
there has to be some saturation. Bunnies multiply, but eventually they run
out of food.


For anything real, there has to be constrained growth. One commonly used model
for this is the logistic curve. This curve shows initial exponential growth, but the growth
gradually slows down as it approaches an asymptote. Once again, the fit looks
fairly reasonable.
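
The shape of this curve can be sketched in a few lines. The parameter names (`ceiling`, `rate`, `midpoint`) and the values chosen are illustrative assumptions, not the fitted values behind the graph.

```python
import math

def logistic(t, ceiling, rate, midpoint):
    """Logistic curve: near-exponential growth at first, levelling off at `ceiling`."""
    return ceiling / (1 + math.exp(-rate * (t - midpoint)))

# ASSUMED illustrative parameters, not fitted to any real data.
CEILING, RATE, MIDPOINT = 100.0, 0.05, 100.0

early_ratio = logistic(1, CEILING, RATE, MIDPOINT) / logistic(0, CEILING, RATE, MIDPOINT)
late_value = logistic(500, CEILING, RATE, MIDPOINT)
print(f"year-over-year growth near t = 0: {early_ratio - 1:.2%}")
print(f"value at t = 500: {late_value:.4f} (ceiling {CEILING})")
```

Well before the midpoint the curve grows by roughly 5% per year, just like an exponential with the same rate, while far past the midpoint it sits essentially at the ceiling.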


Hypothetical energy usage, with logistic curve fit


The punch line


So far, all I have done is demonstrate that a number of curves can be bent
around to look like a noisy exponential growth curve. At arm’s length, they
all do a modest job at approximating the data that I provided. While some
curves are somewhat better than others, there is no slam dunk best curve.
Hang on to that thought, because the punch line is coming.


Each of the curves that I have fit to the data can be used to predict what the
energy usage will be at some later date. I have gone through that exercise
with each of these curves to yield a prediction about the energy usage in
year 100 (50 years beyond the end of the data), and at year 500.




I hope you are saying wow. These
equations all looked kind of similar from t = 0 to t = 50. Even at t = 100, we have megaginormous disagreements
on what the energy usage will be: anywhere from the silly value of 6 X 10^{13}
up to the tremendous value of 4 X 10^{10}. I think I can safely say
that I have made my point. The underlying choice of model might not matter
much for interpolation, but for extrapolation, the choice of model can change
an estimate by many orders of magnitude.
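
The effect is easy to reproduce on a small scale. The sketch below covers only two of the models, an exponential and a parabola, fit to made-up data, so its numbers will not match the figures quoted above. The data, the seed, and the helper names `fit_exponential` and `fit_parabola` are assumptions for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# ASSUMED data: 5% continuous growth with up to +/-10% multiplicative noise.
ts = list(range(51))
ys = [math.exp(0.05 * t) * (1 + random.uniform(-0.1, 0.1)) for t in ts]

def fit_exponential(ts, ys):
    """Least-squares line through (t, ln y); returns a predictor for y = a * exp(b * t)."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean, log_mean = sum(ts) / n, sum(logs) / n
    slope = (sum((t - t_mean) * (v - log_mean) for t, v in zip(ts, logs))
             / sum((t - t_mean) ** 2 for t in ts))
    intercept = log_mean - slope * t_mean
    return lambda t: math.exp(intercept + slope * t)

def fit_parabola(ts, ys, scale=50.0):
    """Least-squares quadratic via the 3x3 normal equations, solved by elimination."""
    us = [t / scale for t in ts]  # rescale time to [0, 1] to keep the system well behaved
    # Augmented normal-equation matrix: row i holds sum(u^(i+j)) for j = 0..2, then sum(u^i * y).
    m = [[sum(u ** (i + j) for u in us) for j in range(3)]
         + [sum(u ** i * y for u, y in zip(us, ys))] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    c0, c1, c2 = (m[i][3] / m[i][i] for i in range(3))
    return lambda t: c0 + c1 * (t / scale) + c2 * (t / scale) ** 2

exp_model = fit_exponential(ts, ys)
quad_model = fit_parabola(ts, ys)

for t in (50, 100, 500):
    print(f"t = {t:3d}: exponential {exp_model(t):.3g}, parabola {quad_model(t):.3g}")
```

At the end of the data the two fits are close to each other, but by t = 500 they differ by many orders of magnitude, even though both were fit to the same 51 points.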


Actually, the only model that did not
give huge answers for the 500 year estimate is the logistic model, the
mathematical equation that was designed to model constrained growth. Hmmm…


Conclusion


Considerable effort usually goes into finding an equation that fits the existing data.
Often, a variety of equations are tried and the one that best fits the
existing data is chosen.


This typical process omits a crucial step. That step is getting to know the data,
understanding the natural constraints, and looking at the forces that drive
the values up or down. This knowledge should drive the choice of mathematical
model.

[1] Little bit of trivia here... the year before 1 AD was 1 BC, rather than 0. The
correct difference between these two years is thus 1600, rather than 1601.

[2] I assume that there are an equal number of men and women. If there is only a
single, very busy man, the women can be spared having to have twins.