In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value. The basic feature of the median in describing data compared to the mean (often simply described as the "average") is that it is not skewed by a small proportion of extremely large or small values, and therefore provides a better representation of the center. Median income, for example, may be a better way to describe center of the income distribution because increases in the largest incomes alone have no effect on median. For this reason, the median is of central importance in robust statistics.
The median of a finite list of numbers is the "middle" number, when those numbers are listed in order from smallest to greatest.
If the data set has an odd number of observations, the middle one is selected. For example, the following list of seven numbers,
1, 3, 3, 6, 7, 8, 9
has the median of 6, which is the fourth value.
If the data set has an even number of observations, there is no distinct middle value and the median is usually defined to be the arithmetic mean of the two middle values. For example, this data set of 8 numbers
1, 2, 3, 4, 5, 6, 8, 9
has a median value of 4.5, that is . (In more technical terms, this interprets the median as the fully trimmed mid-range).
In general, with this convention, the median can be defined as follows: For a data set of elements, ordered from smallest to greatest,
if is odd,
if is even,
Formally, a median of a population is any value such that at least half of the population is less than or equal to the proposed median and at least half is greater than or equal to the proposed median. As seen above, medians may not be unique. If each set contains more than half the population, then some of the population is exactly equal to the unique median.
The median is well-defined for any ordered (one-dimensional) data, and is independent of any distance metric.