# Median absolute deviation

Summary

In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample.
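For a univariate data set $X_1, X_2, \ldots, X_n$, the MAD is defined as the median of the absolute deviations from the data's median $\tilde{X} = \operatorname{median}(X)$:

$$\operatorname{MAD} = \operatorname{median}\left(\left|X_i - \tilde{X}\right|\right),$$

that is, starting with the residuals (deviations) from the data's median, the MAD is the median of their absolute values.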
Consider the data (1, 1, 2, 2, 4, 6, 9). It has a median value of 2. The absolute deviations about 2 are (1, 1, 0, 0, 2, 4, 7) which in turn have a median value of 1 (because the sorted absolute deviations are (0, 0, 1, 1, 2, 4, 7)). So the median absolute deviation for this data is 1.
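The definition translates directly into code: take the median, form the absolute residuals, then take their median. Below is a minimal Python sketch that relies only on the standard library's `statistics.median`. The helper name `mad` is illustrative, not a standard API.

```python
from statistics import median

def mad(data):
    """Median absolute deviation: the median of |x - median(data)|."""
    center = median(data)                        # the data's median
    abs_devs = [abs(x - center) for x in data]   # absolute residuals
    return median(abs_devs)                      # median of those residuals

sample = [1, 1, 2, 2, 4, 6, 9]
print(median(sample))  # 2
print(mad(sample))     # 1
```

For an even number of values, `statistics.median` averages the two middle values. That is one common convention, and other tools may use different ones.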
The median absolute deviation is a measure of statistical dispersion. Moreover, the MAD is a robust statistic, being more resilient to outliers in a data set than the standard deviation. In the standard deviation, the distances from the mean are squared, so large deviations are weighted more heavily, and thus outliers can heavily influence it. In the MAD, the magnitudes of the deviations of a small number of outliers are irrelevant.
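To see the difference in sensitivity, replace the largest value of the example data with an extreme outlier and compare the sample standard deviation with the MAD. The value 900 is an arbitrary choice for illustration, and `mad` is a hypothetical helper defined here so the snippet runs on its own.

```python
from statistics import median, stdev

def mad(data):
    """Median absolute deviation: the median of |x - median(data)|."""
    center = median(data)
    return median([abs(x - center) for x in data])

clean = [1, 1, 2, 2, 4, 6, 9]
dirty = [1, 1, 2, 2, 4, 6, 900]  # one value pushed far out

for name, data in (("clean", clean), ("with outlier", dirty)):
    print(f"{name:>12}: stdev = {stdev(data):8.2f}, MAD = {mad(data)}")
```

The single outlier increases the standard deviation by a factor of more than 100. The MAD stays at 1 because the sorted absolute deviations still have 1 as their middle value.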
Because the MAD is a more robust estimator of scale than the sample variance or standard deviation, it works better with distributions without a mean or variance, such as the Cauchy distribution.
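A quick simulation illustrates this for the standard Cauchy distribution. Its population MAD is exactly 1, since half of its probability mass lies in $[-1, 1]$. The sketch draws samples by inverse-transform sampling, $\tan(\pi(u - 1/2))$ for uniform $u$. The seed and sample sizes are arbitrary, and `mad` and `cauchy_sample` are hypothetical helper names.

```python
import math
import random
from statistics import median, stdev

def mad(data):
    """Median absolute deviation: the median of |x - median(data)|."""
    center = median(data)
    return median([abs(x - center) for x in data])

def cauchy_sample(n, rng):
    # Inverse-CDF sampling: if U ~ Uniform(0, 1), tan(pi*(U - 1/2)) is standard Cauchy.
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

rng = random.Random(42)
for n in (1_000, 10_000, 100_000):
    xs = cauchy_sample(n, rng)
    # The MAD settles near 1 as n grows. The sample standard deviation does not
    # settle, because the Cauchy distribution has no finite variance.
    print(f"n = {n:>7}: MAD = {mad(xs):.3f}, stdev = {stdev(xs):.1f}")
```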
The MAD can play the same role as a measure of spread around the median that a deviation-based measure, such as the standard deviation, plays around the mean.
To use the MAD as a consistent estimator of the standard deviation $\sigma$, one takes

$$\hat{\sigma} = k \cdot \operatorname{MAD},$$

where $k$ is a constant scale factor, which depends on the distribution.
For normally distributed data, $k$ is taken to be

$$k = \frac{1}{\Phi^{-1}(3/4)} \approx \frac{1}{0.67449} \approx 1.4826,$$

i.e., the reciprocal of the quantile function $\Phi^{-1}$ (also known as the inverse of the cumulative distribution function) for the standard normal distribution $Z = (X - \mu)/\sigma$.
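Below is a minimal sketch of the scaled estimator, assuming normally distributed data. The scale factor is the reciprocal of the standard normal quantile at 3/4, and `statistics.NormalDist().inv_cdf(0.75)` computes that quantile. The true $\sigma = 3$, the mean 10, the seed and the sample size are illustration choices. `mad`, `scaled_mad` and `K_NORMAL` are hypothetical names.

```python
import random
from statistics import NormalDist, median

def mad(data):
    """Median absolute deviation: the median of |x - median(data)|."""
    center = median(data)
    return median([abs(x - center) for x in data])

# Scale factor for normal data: k = 1 / Phi^{-1}(3/4)
K_NORMAL = 1 / NormalDist().inv_cdf(0.75)

def scaled_mad(data, k=K_NORMAL):
    """Estimate the standard deviation as k * MAD (consistent for normal data)."""
    return k * mad(data)

rng = random.Random(0)
true_sigma = 3.0
xs = [rng.gauss(10.0, true_sigma) for _ in range(100_000)]

print(f"k       = {K_NORMAL:.4f}")         # 1.4826
print(f"MAD     = {mad(xs):.3f}")          # roughly 0.6745 * 3
print(f"k * MAD = {scaled_mad(xs):.3f}")   # roughly 3
```

Because $k$ depends on the distribution, the default `k=K_NORMAL` is only appropriate under the normality assumption. For other distributions, pass a different `k`.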
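The argument 3/4 is such that $\pm\operatorname{MAD}$ covers 50% (between 1/4 and 3/4) of the standard normal cumulative distribution function. For a normal distribution the median equals the mean $\mu$, so

$$\frac{1}{2} = P\left(|X - \mu| \le \operatorname{MAD}\right) = P\left(\left|\frac{X - \mu}{\sigma}\right| \le \frac{\operatorname{MAD}}{\sigma}\right) = P\left(|Z| \le \frac{\operatorname{MAD}}{\sigma}\right).$$

Therefore, we must have that

$$\Phi\left(\frac{\operatorname{MAD}}{\sigma}\right) - \Phi\left(-\frac{\operatorname{MAD}}{\sigma}\right) = \frac{1}{2}.$$

Noticing that

$$\Phi\left(-\frac{\operatorname{MAD}}{\sigma}\right) = 1 - \Phi\left(\frac{\operatorname{MAD}}{\sigma}\right),$$

we have $2\,\Phi(\operatorname{MAD}/\sigma) - 1 = 1/2$, i.e. $\Phi(\operatorname{MAD}/\sigma) = 3/4$. Hence $\operatorname{MAD}/\sigma = \Phi^{-1}(3/4) \approx 0.67449$, from which we obtain the scale factor $k = 1/\Phi^{-1}(3/4) \approx 1.4826$.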
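The derivation can be checked numerically with Python's `statistics.NormalDist`. With $a = \Phi^{-1}(3/4)$, the interval $[-a, a]$ should hold half of the standard normal mass, and $1/a$ should reproduce the scale factor. This is a sanity check under the normal model, not a proof.

```python
from statistics import NormalDist

std_normal = NormalDist()       # standard normal: mean 0, sigma 1
a = std_normal.inv_cdf(0.75)    # MAD / sigma for normal data
coverage = std_normal.cdf(a) - std_normal.cdf(-a)

print(f"Phi^-1(3/4)         = {a:.5f}")         # 0.67449
print(f"Phi(a) - Phi(-a)    = {coverage:.6f}")  # 0.500000
print(f"k = 1 / Phi^-1(3/4) = {1 / a:.4f}")     # 1.4826
```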
