I haven't been following this thread that carefully, so the following point may already have been made. The computation of the variance below involves the addition/subtraction of very large values, which can result in massive cancellations. Thus, though the formulae work algebraically, you may have to carry very high precision.

In fact, even the simple summation of the values themselves is a serious problem, as I discovered in a real application many years ago; I passed it on to my students in the following form:

    for i = 0 to 1 by 0.000000001 do <loop>

I believe that if i is a single-precision IEEE float, the increment is washed out well before i reaches 1.0 (once i is large enough that 0.000000001 is smaller than half a unit in the last place of i, the addition rounds right back to i), which means that this loop is actually an infinite loop. (In the old days, computers were so slow that loops of this sort appeared to be infinite in any case! Large scientific calculations need the precision to go up with the speed of the computer, so that naive computations like this still "work".) A sketch demonstrating the stall appears below.

A very good summation method is to remember all the values, sort them by absolute value, and then add them in order of increasing magnitude. An almost-as-good method is to keep log2(n) levels of partial sums and add as a binary tree; a sketch of that follows as well.

I seem to recall that the CACM (Communications of the ACM) ran a number of articles about the "real-time" standard deviation problem in the late 1960s to early 1970s time frame. The index of this publication is online and freely accessible at acm.org.
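Here is a minimal C sketch of that stalled loop, assuming strict IEEE single-precision arithmetic (no extended-precision intermediates, no fast-math reassociation):

    #include <stdio.h>

    int main(void) {
        float i = 0.0f;
        while (i < 1.0f) {
            float next = i + 0.000000001f;
            if (next == i) {
                /* The increment is now below half an ulp of i, so the
                   addition rounds back to i and the loop stalls. */
                printf("stalled at i = %.9g\n", i);
                return 0;
            }
            i = next;
        }
        printf("reached 1.0\n");
        return 0;
    }

On a conforming implementation the stall is reported around i = 0.03125 (2^-5), nowhere near 1.0; the original loop, which has no stall check, would simply spin forever.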
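And a sketch of the binary-tree ("pairwise") summation; the function name pairwise_sum is mine:

    #include <stddef.h>

    /* Sum x[0..n-1] by recursively summing each half and adding the two
       half-sums, so intermediate sums stay comparable in magnitude rather
       than one huge running total swallowing each small addend. */
    double pairwise_sum(const double *x, size_t n) {
        if (n == 0) return 0.0;
        if (n == 1) return x[0];
        size_t half = n / 2;
        return pairwise_sum(x, half) + pairwise_sum(x + half, n - half);
    }

The recursion depth is log2(n), which is also why only log2(n) levels of partial sums are needed if you want to maintain the tree incrementally as values arrive.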
At 10:17 PM 7/16/03 -0600, Roland Silver wrote:

Bernie Cosell,
All you need to do is compute on the fly the sum of squares S = x[1]^2 + ... + x[n]^2 of the empirical values, as well as the sum of values T = x[1] + ... + x[n]. The mean is M = T/n, and the variance is V = ((x[1] - M)^2 + ... + (x[n] - M)^2)/(n-1). The SD is the square root of V. If you expand the terms in the equation for V, you get V = (S - 2*M*T + n*M^2)/(n-1), and since T = n*M this reduces to (S - n*M^2)/(n-1), which you can compute in one pass.
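For concreteness, a minimal C sketch of the one-pass computation described in the quote (the function name one_pass_sd is mine; it assumes n >= 2). Note that the final subtraction S - n*M^2 is exactly where the cancellation warned about at the top can bite, when the mean is large relative to the spread of the values:

    #include <math.h>
    #include <stddef.h>

    double one_pass_sd(const double *x, size_t n) {
        double T = 0.0;   /* running sum of values  */
        double S = 0.0;   /* running sum of squares */
        for (size_t i = 0; i < n; i++) {
            T += x[i];
            S += x[i] * x[i];
        }
        double M = T / n;                      /* mean */
        double V = (S - n * M * M) / (n - 1);  /* variance */
        return sqrt(V);                        /* standard deviation */
    }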