A winsorized mean is a statistical measure of central tendency, much like the mean and median, and most similar to the truncated mean. It is the mean calculated after winsorizing, that is, replacing given parts of a probability distribution or sample at the high and low ends with the most extreme remaining values. This is typically done for an equal amount at both extremes; often 10 to 25 percent of the ends are replaced. The winsorized mean can equivalently be expressed as a weighted average of the truncated mean and the quantiles at which it is limited, which corresponds to replacing the tails with those quantiles.
The winsorized mean is a useful estimator because by retaining the outliers without taking them too literally, it is less sensitive to observations at the extremes than the straightforward mean, and will still generate a reasonable estimate of central tendency or mean for almost all statistical models. In this regard it is referred to as a robust estimator.
The winsorized mean uses more information from the distribution or sample than the median. However, unless the underlying distribution is symmetric, the winsorized mean of a sample is unlikely to be an unbiased estimator of either the mean or the median.
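The robustness described above can be illustrated with a small sketch. The helper `winsorized_mean` and both samples are hypothetical, chosen only for illustration; the parameter `k` (an assumption of this sketch) is the number of values replaced at each end.

```python
from statistics import mean, median


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values
    with the nearest remaining value (symmetric winsorizing)."""
    xs = sorted(values)
    n = len(xs)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2*k < len(values)")
    if k > 0:
        xs[:k] = [xs[k]] * k              # low end: copy the smallest kept value
        xs[n - k:] = [xs[n - k - 1]] * k  # high end: copy the largest kept value
    return mean(xs)


# Hypothetical sample, and the same sample with its largest value
# replaced by a gross outlier.
clean = [2, 3, 4, 5, 5, 6, 6, 7, 8, 9]
contaminated = clean[:-1] + [900]

for label, sample in (("clean", clean), ("contaminated", contaminated)):
    print(label, mean(sample), winsorized_mean(sample, 1), median(sample))
```

In this example the ordinary mean moves substantially once the outlier is introduced, while the winsorized mean and the median do not change.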
For a sample of 10 numbers (from x(1), the smallest, to x(10), the largest; order statistic notation), the 10% winsorized mean is
(x(2) + x(2) + x(3) + x(4) + x(5) + x(6) + x(7) + x(8) + x(9) + x(9)) / 10.
The key is in the repetition of x(2) and x(9): the extras substitute for the original values x(1) and x(10) which have been discarded and replaced.
This is equivalent to a weighted average of 0.1 times the 5th percentile (x(2)), 0.8 times the 10% trimmed mean, and 0.1 times the 95th percentile (x(9)).
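As a quick numerical check of this equivalence, the following sketch computes the 10% winsorized mean of a 10-value sample both directly and as the weighted average. The sample values are hypothetical and the variable names are chosen only for this illustration.

```python
import math

# Hypothetical sample of 10 numbers, sorted so that xs[0] is x(1)
# and xs[9] is x(10).
xs = sorted([14, 3, 7, 40, 9, 5, 8, 6, 10, 1])
n = len(xs)

# Direct form: x(1) is replaced by x(2) and x(10) by x(9).
winsorized = [xs[1]] + xs[1:-1] + [xs[-2]]
winsorized_mean = sum(winsorized) / n

# 10% trimmed mean: drop x(1) and x(10), average the remaining 8 values.
trimmed_mean = sum(xs[1:-1]) / (n - 2)

# Weighted-average form: 0.1 * x(2) + 0.8 * trimmed mean + 0.1 * x(9).
weighted = 0.1 * xs[1] + 0.8 * trimmed_mean + 0.1 * xs[-2]

# Compare with a tolerance because of floating-point rounding.
print(math.isclose(winsorized_mean, weighted))
```

The two forms agree because 0.8 times the trimmed mean contributes each of x(2) through x(9) once with weight 1/10, and the two extra 0.1 terms supply the repeated x(2) and x(9).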
The sample mean (sample average) or empirical mean (empirical average), and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables. The sample mean is the average value (or mean value) of a sample of numbers taken from a larger population of numbers, where "population" indicates not number of people but the entirety of relevant data, whether collected or not. A sample of 40 companies' sales from the Fortune 500 might be used for convenience instead of looking at the population, all 500 companies' sales.
In ordinary language, an average is a single number taken as representative of a list of numbers, usually the sum of the numbers divided by how many numbers are in the list (the arithmetic mean). For example, the average of the numbers 2, 3, 4, 7, and 9 (summing to 25) is 5. Depending on the context, an average might be another statistic such as the median, or mode. For example, the average personal income is often given as the median—the number below which are 50% of personal incomes and above which are 50% of personal incomes—because the mean would be higher by including personal incomes from a few billionaires.
Covers the estimation of extreme quantiles using empirical quantiles and sample data.
Covers methods for identifying and handling extreme values in data, including statistical procedures for outlier detection.
Explores OLS regressions for house prices, covering outliers, influential observations, model specifications, and selection strategies.