I haven't been following this thread that carefully, so the following point may already have been made. The computation of the variance below involves the addition/subtraction of very large values, which can result in massive cancellations. Thus, though the formulae work algebraically, you may have to carry very high precision.

In fact, even the simple summation of the values themselves is a serious problem, as I discovered in a real application many years ago; I passed it on to my students in the following form:

    for i = 0 to 1 by 0.000000001 do <loop>

I believe that if i is a single-precision IEEE float, the increment is washed out well before i reaches 1.0 (once i is large enough that 0.000000001 is smaller than half a unit in the last place of i, the addition rounds right back to i), which means that this loop is actually an infinite loop. (In the old days, computers were so slow that loops of this sort appeared to be infinite in any case! Large scientific calculations need the precision to go up with the speed of the computer, so that naive computations like this still "work".) A sketch demonstrating the stall appears below.

A very good summation method is to remember all the values, sort them by absolute value, and then add them in order of increasing magnitude. An almost-as-good method is to keep log2(n) levels of partial sums and add as a binary tree; a sketch of that follows as well.

I seem to recall that the CACM (Communications of the ACM) ran a number of articles about the "real-time" standard deviation problem in the late 1960s to early 1970s time frame. The index of this publication is online and freely accessible at acm.org.
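Here is a minimal C sketch of that stalled loop, assuming strict IEEE single-precision arithmetic (no extended-precision intermediates, no fast-math reassociation):

    #include <stdio.h>

    int main(void) {
        float i = 0.0f;
        while (i < 1.0f) {
            float next = i + 0.000000001f;
            if (next == i) {
                /* The increment is now below half an ulp of i, so the
                   addition rounds back to i and the loop stalls. */
                printf("stalled at i = %.9g\n", i);
                return 0;
            }
            i = next;
        }
        printf("reached 1.0\n");
        return 0;
    }

On a conforming implementation the stall is reported around i = 0.03125 (2^-5), nowhere near 1.0; the original loop, which has no stall check, would simply spin forever.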
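And a sketch of the binary-tree ("pairwise") summation; the function name pairwise_sum is mine:

    #include <stddef.h>

    /* Sum x[0..n-1] by recursively summing each half and adding the two
       half-sums, so intermediate sums stay comparable in magnitude rather
       than one huge running total swallowing each small addend. */
    double pairwise_sum(const double *x, size_t n) {
        if (n == 0) return 0.0;
        if (n == 1) return x[0];
        size_t half = n / 2;
        return pairwise_sum(x, half) + pairwise_sum(x + half, n - half);
    }

The recursion depth is log2(n), which is also why only log2(n) levels of partial sums are needed if you want to maintain the tree incrementally as values arrive.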
At 10:17 PM 7/16/03 -0600, Roland Silver wrote:

Bernie Cosell,
All you need to do is compute on the fly the sum of squares S = x[1]^2 + ... + x[n]^2 of the empirical values, as well as the sum of values T = x[1] + ... + x[n]. The mean is M = T/n, and the variance is V = ((x[1] - M)^2 + ... + (x[n] - M)^2)/(n-1). The SD is the square root of V. If you expand the terms in the equation for V, you get V = (S - 2*M*T + n*M^2)/(n-1), and since T = n*M this reduces to (S - n*M^2)/(n-1), which you can compute in one pass.
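For concreteness, a minimal C sketch of the one-pass computation described in the quote (the function name one_pass_sd is mine; it assumes n >= 2). Note that the final subtraction S - n*M^2 is exactly where the cancellation warned about at the top can bite, when the mean is large relative to the spread of the values:

    #include <math.h>
    #include <stddef.h>

    double one_pass_sd(const double *x, size_t n) {
        double T = 0.0;   /* running sum of values  */
        double S = 0.0;   /* running sum of squares */
        for (size_t i = 0; i < n; i++) {
            T += x[i];
            S += x[i] * x[i];
        }
        double M = T / n;                      /* mean */
        double V = (S - n * M * M) / (n - 1);  /* variance */
        return sqrt(V);                        /* standard deviation */
    }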