The Pareto distribution, named after the Italian civil engineer, economist, and sociologist Vilfredo Pareto, is a power-law probability distribution used to describe social, quality-control, scientific, geophysical, actuarial, and many other types of observable phenomena. It was originally applied to the distribution of wealth in a society, fitting the trend that a large portion of wealth is held by a small fraction of the population. The Pareto principle or "80-20 rule", stating that 80% of outcomes are due to 20% of causes, was named in honour of Pareto, but the concepts are distinct, and only Pareto distributions with shape value $\alpha = \log_4 5 \approx 1.16$ precisely reflect it. Empirical observation has shown that this 80-20 distribution fits a wide range of cases, including natural phenomena and human activities.

If $X$ is a random variable with a Pareto (Type I) distribution, then the probability that $X$ is greater than some number $x$, i.e. the survival function (also called tail function), is given by

$$\overline{F}(x) = \Pr(X > x) = \begin{cases} \left(\dfrac{x_m}{x}\right)^{\alpha} & x \ge x_m, \\ 1 & x < x_m, \end{cases}$$

where $x_m$ is the (necessarily positive) minimum possible value of $X$, and $\alpha$ is a positive parameter. The Pareto Type I distribution is characterized by a scale parameter $x_m$ and a shape parameter $\alpha$, which is known as the tail index. When this distribution is used to model the distribution of wealth, the parameter $\alpha$ is called the Pareto index.

From the definition, the cumulative distribution function of a Pareto random variable with parameters $\alpha$ and $x_m$ is

$$F_X(x) = \begin{cases} 1 - \left(\dfrac{x_m}{x}\right)^{\alpha} & x \ge x_m, \\ 0 & x < x_m. \end{cases}$$

It follows (by differentiation) that the probability density function is

$$f_X(x) = \begin{cases} \dfrac{\alpha\, x_m^{\alpha}}{x^{\alpha + 1}} & x \ge x_m, \\ 0 & x < x_m. \end{cases}$$

When plotted on linear axes, the distribution assumes the familiar J-shaped curve, which approaches each of the orthogonal axes asymptotically. All segments of the curve are self-similar (subject to appropriate scaling factors). When plotted in a log-log plot, the distribution is represented by a straight line.
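As a rough illustration of the definitions above, the following Python sketch evaluates the survival function (x_m/x)^α, the cumulative distribution function, and the density of a Pareto Type I distribution, and then checks the 80-20 claim for α = log_4 5. The function names are hypothetical, and `top_share` relies on an assumption not stated in the text: the standard Lorenz-curve result that, for α > 1, the top fraction p of the population holds a share p^(1 - 1/α) of the total.

```python
import math

def pareto_sf(x, x_m, alpha):
    """Survival function P(X > x): (x_m / x)^alpha for x >= x_m, else 1."""
    return (x_m / x) ** alpha if x >= x_m else 1.0

def pareto_cdf(x, x_m, alpha):
    """Cumulative distribution function: 1 minus the survival function."""
    return 1.0 - pareto_sf(x, x_m, alpha)

def pareto_pdf(x, x_m, alpha):
    """Density obtained by differentiating the CDF; zero below x_m."""
    return alpha * x_m ** alpha / x ** (alpha + 1) if x >= x_m else 0.0

def top_share(p, alpha):
    """Share of the total held by the top fraction p.

    Assumption (not from the text): Lorenz-curve result, valid for alpha > 1.
    """
    return p ** (1.0 - 1.0 / alpha)

# Shape value for which the 80-20 rule holds exactly: log base 4 of 5.
alpha_80_20 = math.log(5) / math.log(4)

print(alpha_80_20)                  # roughly 1.16
print(top_share(0.2, alpha_80_20))  # close to 0.8, up to floating-point error
```

With this shape value the top 20% hold about 80% of the total; any other α gives a different split, which is why the Pareto principle and the Pareto distribution are distinct concepts.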
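The expected value of a random variable following a Pareto distribution is

$$\operatorname{E}(X) = \begin{cases} \infty & \alpha \le 1, \\ \dfrac{\alpha\, x_m}{\alpha - 1} & \alpha > 1. \end{cases}$$

The variance of a random variable following a Pareto distribution is

$$\operatorname{Var}(X) = \begin{cases} \infty & \alpha \in (1, 2], \\ \left(\dfrac{x_m}{\alpha - 1}\right)^{2} \dfrac{\alpha}{\alpha - 2} & \alpha > 2. \end{cases}$$

(If $\alpha \le 1$, the variance does not exist.)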
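The moment conditions can be sketched in Python as follows. The formulas used are the standard Pareto Type I results (mean α·x_m/(α - 1) for α > 1, variance (x_m/(α - 1))²·α/(α - 2) for α > 2); the function names are hypothetical, and returning NaN for a variance that does not exist is a convention chosen for this example only.

```python
import math

def pareto_mean(x_m, alpha):
    """Expected value: infinite for alpha <= 1, else alpha * x_m / (alpha - 1)."""
    if alpha <= 1:
        return math.inf
    return alpha * x_m / (alpha - 1)

def pareto_variance(x_m, alpha):
    """Variance: undefined for alpha <= 1, infinite for 1 < alpha <= 2."""
    if alpha <= 1:
        return math.nan  # the mean is infinite, so the variance does not exist
    if alpha <= 2:
        return math.inf
    return (x_m / (alpha - 1)) ** 2 * alpha / (alpha - 2)

print(pareto_mean(1.0, 3.0))      # 1.5
print(pareto_variance(1.0, 3.0))  # 0.75
print(pareto_variance(1.0, 1.5))  # inf
```

Note that the 80-20 shape value α ≈ 1.16 falls in the range where the mean is finite but the variance is infinite.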