In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample.

For a univariate data set $X_1, X_2, \ldots, X_n$, the MAD is defined as the median of the absolute deviations from the data's median $\tilde{X} = \operatorname{median}(X)$:

$$\operatorname{MAD} = \operatorname{median}\left(|X_i - \tilde{X}|\right),$$

that is, starting with the residuals (deviations) from the data's median, the MAD is the median of their absolute values.

Consider the data (1, 1, 2, 2, 4, 6, 9). It has a median value of 2. The absolute deviations about 2 are (1, 1, 0, 0, 2, 4, 7), which in turn have a median value of 1 (because the sorted absolute deviations are (0, 0, 1, 1, 2, 4, 7)). So the median absolute deviation for this data is 1.

The median absolute deviation is a measure of statistical dispersion. Moreover, the MAD is a robust statistic, being more resilient to outliers in a data set than the standard deviation. In the standard deviation, the distances from the mean are squared, so large deviations are weighted more heavily, and thus outliers can heavily influence it. In the MAD, the deviations of a small number of outliers are irrelevant. Because the MAD is a more robust estimator of scale than the sample variance or standard deviation, it works better with distributions without a mean or variance, such as the Cauchy distribution.

The MAD may be used similarly to how one would use the deviation for the average. In order to use the MAD as a consistent estimator of the standard deviation $\sigma$, one takes

$$\hat{\sigma} = k \cdot \operatorname{MAD},$$

where $k$ is a constant scale factor, which depends on the distribution. For normally distributed data, $k$ is taken to be

$$k = \frac{1}{\Phi^{-1}(3/4)} \approx 1.4826,$$

i.e., the reciprocal of the quantile function $\Phi^{-1}$ (also known as the inverse of the cumulative distribution function) for the standard normal distribution $Z = (X - \mu)/\sigma$. The argument 3/4 is such that $\pm\operatorname{MAD}$ covers 50% (between 1/4 and 3/4) of the standard normal cumulative distribution function, i.e.

$$\frac{1}{2} = P\left(|X - \mu| \le \operatorname{MAD}\right) = P\left(\left|\frac{X - \mu}{\sigma}\right| \le \frac{\operatorname{MAD}}{\sigma}\right) = P\left(|Z| \le \frac{\operatorname{MAD}}{\sigma}\right).$$

Therefore, we must have that

$$\Phi\left(\frac{\operatorname{MAD}}{\sigma}\right) - \Phi\left(-\frac{\operatorname{MAD}}{\sigma}\right) = \frac{1}{2}.$$

Noticing that

$$\Phi\left(-\frac{\operatorname{MAD}}{\sigma}\right) = 1 - \Phi\left(\frac{\operatorname{MAD}}{\sigma}\right),$$

we have that $\operatorname{MAD}/\sigma = \Phi^{-1}(3/4) \approx 0.67449$, from which we obtain the scale factor $k = 1/\Phi^{-1}(3/4) \approx 1.4826$.
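The definition and the normal-distribution scale factor described above can be checked numerically. The following is a minimal sketch using only Python's standard library; the function names `mad`, `normal_scale_factor`, and `mad_std_estimate` are illustrative choices, not part of any established API.

```python
from statistics import NormalDist, median


def mad(data):
    """Median of the absolute deviations from the data's median."""
    center = median(data)
    return median([abs(x - center) for x in data])


def normal_scale_factor():
    """Reciprocal of the standard normal quantile function at 3/4."""
    return 1.0 / NormalDist().inv_cdf(0.75)


def mad_std_estimate(data):
    """Estimate of the standard deviation, assuming normally distributed data."""
    return normal_scale_factor() * mad(data)


sample = [1, 1, 2, 2, 4, 6, 9]
print(mad(sample))                      # 1, as in the worked example
print(round(normal_scale_factor(), 4))  # 1.4826

# Replacing the largest value with an extreme outlier leaves the MAD unchanged.
print(mad([1, 1, 2, 2, 4, 6, 9000]))    # 1
```

Note that `mad_std_estimate` is only meaningful as an estimate of the standard deviation under the normality assumption; for other distributions a different scale factor would be needed.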