Robust measures of scale

In statistics, robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the interquartile range (IQR) and the median absolute deviation (MAD). They contrast with conventional, non-robust measures of scale such as the sample standard deviation, which are greatly influenced by outliers.

These robust statistics are used in particular as estimators of a scale parameter. They offer both robustness and superior efficiency on contaminated data, at the cost of inferior efficiency on clean data from distributions such as the normal distribution. To illustrate robustness, the standard deviation can be made arbitrarily large by increasing exactly one observation (it has a breakdown point of 0, since a single point can contaminate it), a defect that robust statistics do not share.

The interquartile range is the difference between the 75th percentile and the 25th percentile of a sample; this is the 25% trimmed range, an example of an L-estimator. Other trimmed ranges, such as the interdecile range (10% trimmed range), can also be used. For a Gaussian distribution, the IQR is related to the standard deviation $\sigma$ as $\sigma \approx 0.7413\,\mathrm{IQR} = \mathrm{IQR}/1.349$.

The median absolute deviation is the median of the absolute values of the differences between the data values and the overall median of the data set. For a Gaussian distribution, the MAD is related to $\sigma$ as $\sigma \approx 1.4826\,\mathrm{MAD} \approx \mathrm{MAD}/0.6745$.

Robust measures of scale can be used as estimators of properties of the population, either for parameter estimation or as estimators of their own expected value. For example, robust estimators of scale are used to estimate the population standard deviation, generally by multiplying by a scale factor to make the estimator consistent; see scale parameter: estimation.
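
The relations between the standard deviation and the IQR and MAD for Gaussian data follow from the quantile function of the normal distribution. The following is a short sketch of that derivation, assuming $X \sim N(\mu, \sigma^2)$ and writing $\Phi$ for the standard normal cumulative distribution function and $Q_p$ for the $p$-quantile of $X$ (notation introduced here for illustration). It uses the symmetry $\Phi^{-1}(1/4) = -\Phi^{-1}(3/4)$.

```latex
\begin{align*}
\mathrm{IQR} = Q_{3/4} - Q_{1/4}
  &= \bigl(\mu + \sigma\,\Phi^{-1}(3/4)\bigr) - \bigl(\mu + \sigma\,\Phi^{-1}(1/4)\bigr)
   = 2\,\sigma\,\Phi^{-1}(3/4), \\
\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1)
  &\;\Longrightarrow\; \mathrm{IQR} = 2\sqrt{2}\,\operatorname{erf}^{-1}(1/2)\,\sigma \approx 1.349\,\sigma, \\
\Pr\bigl(|X - \mu| \le \mathrm{MAD}\bigr) = \tfrac{1}{2}
  &\;\Longrightarrow\; \mathrm{MAD} = \sigma\,\Phi^{-1}(3/4) \approx 0.6745\,\sigma .
\end{align*}
```

Inverting these population relations gives $\sigma \approx \mathrm{IQR}/1.349 \approx 0.7413\,\mathrm{IQR}$ and $\sigma \approx \mathrm{MAD}/0.6745 \approx 1.4826\,\mathrm{MAD}$.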
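For example, dividing the IQR by $2\sqrt{2}\,\operatorname{erf}^{-1}(1/2)$ (approximately 1.349) gives a consistent estimator of the population standard deviation when the data follow a normal distribution.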
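
The scaled estimators described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names (`iqr`, `mad`, `sigma_from_iqr`, `sigma_from_mad`) and the sample data are hypothetical, and the quartiles use the "inclusive" interpolation rule of `statistics.quantiles`, which is only one of several common percentile conventions.

```python
import statistics
from statistics import NormalDist

# 75th percentile of the standard normal distribution (about 0.6745).
Z75 = NormalDist().inv_cdf(0.75)


def iqr(data):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


def mad(data):
    """Median absolute deviation from the sample median."""
    med = statistics.median(data)
    return statistics.median([abs(x - med) for x in data])


def sigma_from_iqr(data):
    """Estimate sigma as IQR / 1.349, assuming normally distributed data."""
    return iqr(data) / (2 * Z75)


def sigma_from_mad(data):
    """Estimate sigma as 1.4826 * MAD, assuming normally distributed data."""
    return mad(data) / Z75


if __name__ == "__main__":
    # Hypothetical sample; the second copy replaces one value with a gross outlier.
    clean = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 3.8]
    contaminated = clean[:-1] + [1e6]

    for name, sample in (("clean", clean), ("contaminated", contaminated)):
        print(
            name,
            "stdev:", statistics.stdev(sample),
            "IQR-based:", sigma_from_iqr(sample),
            "MAD-based:", sigma_from_mad(sample),
        )
```

Running the script on the two samples shows the sample standard deviation growing by several orders of magnitude after a single observation is replaced, while the IQR-based and MAD-based estimates stay the same, because the altered point lies outside the quartiles and does not move the median.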
