I once found a bug in Excel. This was a long time ago, and they have
since fixed it. I also found that same bug in every calculator I could find.
I found this ubiquitous bug because I was looking for it. You see, I had
found that bug in my own software. Somehow, finding that bug elsewhere made
me feel better.

I was computing the standard deviation of all the pixels in an image,
so there were umptyleven thousand data points in the computation^{1}.
I used the recommended method, which seemed like a good thing. Boy! Was I
ever wrong! All of a sudden, I found I was asking the computer to find the
square root of a negative number. And it wasn’t liking me.

No
imagination allowed

I found the cause of the error, and realized it was bum advice from
my college statistics book. Dumb old college, anyway!

Computation
of standard deviation

There are two popular formulae for the computation of standard
deviation. The first formula is the definition, given below. This formula is
moderately understandable and is invariably the first formula given.

Another formula often follows immediately on the footsteps of the
first:


Why
this formula should be a measure of deviation from the mean is not
immediately obvious, but the leading statistics texts generally devote some
ink to showing that the two formulae are equivalent^{2}. They will
then point out that the second formula is more convenient to use, and
requires less calculations. This rephrasing of the original equation seems to
be the formula of choice.

I took a quick poll of the introductory statistics books at hand. The
following eight books were sampled. (If a page number is given in
parentheses, this is the page where the book introduces the second form of
the standard deviation formula.)

Bevington
(p. 14)
Freedman,
Pisani and Purves (p. 65)
Langley (p.
58)
Mandel (not
given)
Mendenhall
(p. 39)
Meyer (not
given)
Miller and Freund
(p. 1567)
Snedecor and
Cochran (p. 323)

Two of the texts are clearly in love with this second formula.

[The second formula] “has the dual advantage of requiring less labor,
and giving better accuracy.”
Miller, et al.

[The second formula] “tends to give better computational accuracy
than the method utilizing the deviations.”
Mendenhall

There seems to be a strong consensus here. This would seem to be fact
then, right?

Difficulty with standard formula

The
second formula is often chosen because it requires only a single pass through
the data, without having to remember the whole array. As you go through the
data points, you sum the values (in order to compute the average), and you
sum the squares of the values, and you count the number of data points. These
three numbers plug into the second formula to give the standard deviation.

On a calculator, this is a very practical concern, especially back in
my day when we used to have to make our own calculators with paper clicks,
Elmer’s Glue, and used Popsicle sticks. It is (or was) considered a luxury to
be able to store more than a few numbers in the memory. The second formulas
managed to get by with storing only three numbers.

So, this would seem to be a recipe for paradise. Six out of eight
stats textbooks recommend brushing with the second formula. On top of that,
the second formula seems tailor made for a calculator with limited memory.

But, alas, there is trouble in paradise.

I reproduce here a quote from a book which I have found to be very practicalminded,
and which has agreed with me every time that I have felt qualified to have an
opinion.

[The second formula] “is generally unjustifiable in terms of
computing speed and / or roundoff error.”
Press, et al., p. 458

Even more to the point is the following quote

“Novice programmers who calculate the standard deviation of some
observations by using the [second] formula...often find themselves taking the
square root of a negative number! A much better way to calculate means and
standard deviations is to use the recurrence formulas...”
Knuth^{3}, p 216

I believe we have some disagreement here. These two texts are clearly
in the minority. But since neither one of these books is actually a
statistics books, maybe we should just ignore them? Except for the fact that
the minority is right. I became a believer because I was once a novice
programmer who found himself taking the square root of a negative number^{4}.

The little difficulty of the second formula comes from a recurring
issue that causes numerical analysts to wake up in the middle of the night in
a cold sweat: the loss of precision caused by subtracting two numbers that
are close in size. If the average of the data is quite large compared to the
standard deviation, or if n is
quite large, then the second formula will require the difference to be
computed between two large and nearly equal numbers. You need a whole lot of
precision to pull this off.

To illustrate, consider the following data set of three points:
(1,000,001, 1,000,002, and 1,000,003). The average of the data is obviously
1,000,002. We can easily compute the correct standard deviation from the
first equation:

errata  the third term should be (1000003  1000002)^2


Using the second equation for computing the data set, we first
compute the sum of the squares of the data: 3,000,012,000,014. (Gosh, that’s
a big number!) Then we compute n times (the average of the data squared):
3,000,012,000,012.

We next take the difference between these two numbers. Provided we
have at least 13 digits of precision, we get the correct difference of 2, and
subsequently the correct standard deviation of 1. But with less than 13
digits of precision, the difference could be negative or positive, but
probably isn’t 2. This is what Press et al., and Knuth were warning of.

Two pass
algorithm

The most obvious way to avoid this bug is to go back to the original
formula. This algorithm requires two passes through the data: the first pass
to compute the mean, and the second pass to compute the variance.

This method has the disadvantage of requiring the data to be retained
for a second pass. You just can’t do that in a calculator with limited
memory. Knuth (p. 216) suggested a different option.

The
recursive algorithm

Now we are ready for the fun
part. Let’s say that we have the average of 171 numbers, and we want to
average one more number in. Do we have to start over at the beginning and add
up all 172 numbers? Or is there some shortcut? Is there ever an end to
rhetorical questions?

The answers are no, yes, and
“why do you ask so many questions?”

The really fun part is that the
sum of the first 171 numbers is just 171 times the average. To get the
average of the 172 numbers, you multiply the average of the first 171 numbers
by 171. Then you add the 172^{nd} number, and divide by 172.
Diabolically clever, eh?

Put
in algebraic form, if the average of the first j numbers is m_{j,}

Then
the average of the first j+1
numbers is given by this formula

with m_{1}
= x_{1.}

This
is known as a recursive formula, one that builds on previous calculations
rather than one that starts from scratch. The recursive formula for the
average is pretty simple to understand intuitively. There is a bit more
algebra in between, but it is possible to calculate the standard deviation in
a recursive manner as well. If you know the standard deviation and mean of
the first 171 data points, you can calculate the standard deviation of the
set of 172 data points from this.

Here
is the recursive formula for the variance, which is the square of the
standard deviation:


with

This
is the formula for the variance. The standard deviation is the square root of
this.

Summary

This
has been a little lesson in numerical analysis called “don’t subtract one big
number from another”. And a little lesson in “don’t believe everything you
read”. And lastly, a lesson in “recursion can be your friend”.

References

Bevington, Phillip, Data Reduction and Error Analysis for the Physical
Sciences, McGrawHill, 1969
Freedman, Pisani and Purves, Statistics, 1978, W.W. Norton
& Co.
Hamming, Richard, Numerical Methods for Scientists and Engineers,
1962, McGrawHill
Knuth, Donald E., Art of Computer Programming, Vol. 2,
SemiNumerical Algorithms, 2nd edition, 1981, Addison Wesley
Langley, Russel, Practical Statistics Simply Explained, 1971,
Dover
Mandel, John, The Statistical Analysis of Experimental Data,
1964, Dover
Mendenhall, William, Introduction to Probability and Statistics,
4th ed. 1975, Duxbury Press
Meyer, Stuart, Data Analysis for Scientists and Engineers,
1975, Wiley and Sons
Miller, Irwin, Freund, John, Probability and Statistics for
Engineers, 3rd ed. 1985, Prentice Hall
Press, William, Flannery, Brian, Teukolsky, Saul, Vetterling,
William, Numerical Recipes, the Art of Scientific Computing, Cambridge
University Press, 1986
Snedecor,
George, Cochran, William, Statistical Methods, 7th ed. 1980, Iowa
State University Press

1)
To be honest, these were the days when a big image was 640 by 480, so there
weren’t quite that many data points.
2)
Since the derivation is so widely available, it will not be reproduced here.
Suffice it to say that the squared quantity in the first formula is expanded,
and the summation is broken into three summations, some of which can be
simplified by replacement with the definition of the average.
3)
Donald Knuth, was born, by the way, in Milwaukee, WI. I currently live in
Milwaukee. His father ran a printing business in Milwaukee. I work for a
little mom and pop printing company in Milwaukee. Need I go on?
4)
I was once a novice programmer, but I don't spend much time programming
anymore. I remain, however, a novice!

Wednesday, August 15, 2012
Deviant ways to compute the deviation
Subscribe to:
Post Comments (Atom)
There's a typo below "To illustrate, consider the following data set of three points: (1,000,001, 1,000,002, and 1,000,003). The average of the data is obviously 1,000,002. We can easily compute the correct standard deviation from the first equation:"
ReplyDeleteLook at the third term in the numerator under the square root.
Whoops!!! I'll fix that!
ReplyDeleteThe square root of 1 can't be detected by calculator as it is i which is called as iota.It equals to the square root of 1.And its square is 1.
ReplyDeleteFluid Dynamics
Greetings and Salutations John!
ReplyDeleteVery much enjoyed your article "Deviant ways to compute the deviation".
Would it be possible to have a .pdf copy of the article?
CarlosLuis Santana.