Concept

Algorithms for calculating variance

Summary
Algorithms for calculating variance play a major role in computational statistics. A key difficulty in the design of good algorithms for this problem is that formulas for the variance may involve sums of squares, which can lead to numerical instability as well as to arithmetic overflow when dealing with large values. A formula for calculating the variance of an entire population of size N is: Using Bessel's correction to calculate an unbiased estimate of the population variance from a finite sample of n observations, the formula is: Therefore, a naïve algorithm to calculate the estimated variance is given by the following: Let n ← 0, Sum ← 0, SumSq ← 0 For each datum x: n ← n + 1 Sum ← Sum + x SumSq ← SumSq + x × x Var = (SumSq − (Sum × Sum) / n) / (n − 1) This algorithm can easily be adapted to compute the variance of a finite population: simply divide by n instead of n − 1 on the last line. Because SumSq and (Sum×Sum)/n can be very similar numbers, cancellation can lead to the precision of the result to be much less than the inherent precision of the floating-point arithmetic used to perform the computation. Thus this algorithm should not be used in practice, and several alternate, numerically stable, algorithms have been proposed. This is particularly bad if the standard deviation is small relative to the mean. The variance is invariant with respect to changes in a location parameter, a property which can be used to avoid the catastrophic cancellation in this formula. with any constant, which leads to the new formula the closer is to the mean value the more accurate the result will be, but just choosing a value inside the samples range will guarantee the desired stability. If the values are small then there are no problems with the sum of its squares, on the contrary, if they are large it necessarily means that the variance is large as well. In any case the second term in the formula is always smaller than the first one therefore no cancellation may occur.
About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.