# Heavy-tailed distribution

Summary

In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.
There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions, and the subexponential distributions. In practice, all commonly used heavy-tailed distributions belong to the subexponential class, introduced by Jozef Teugels.
There is still some discrepancy over the use of the term heavy-tailed. Two other definitions are in use: some authors use the term for distributions that do not have all their power moments finite, and others for distributions without a finite variance. The definition given in this article is the most general in use.
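The defining property above can be checked numerically: a distribution is heavy-tailed when $e^{\lambda x}\,P(X > x) \to \infty$ for every $\lambda > 0$. A minimal sketch (not from this page; the closed-form survival functions are standard) contrasting the exponential distribution with a Pareto distribution of tail index 2:

```python
import math

# Survival functions P(X > x) in closed form.
def sf_exponential(x):
    return math.exp(-x)          # exponential, rate 1

def sf_pareto(x, alpha=2.0):
    return x ** (-alpha)         # Pareto, x >= 1, tail index alpha

lam = 0.5  # any positive rate works in the definition

# e^{lam x} * P(X > x) at growing thresholds:
ratios_light = [math.exp(lam * x) * sf_exponential(x) for x in (5, 10, 20)]
ratios_heavy = [math.exp(lam * x) * sf_pareto(x) for x in (5, 10, 20)]

# exponential: e^{lam x} e^{-x} = e^{-(1-lam)x} -> 0, so the tail is
#   exponentially bounded (not heavy);
# Pareto: e^{lam x} / x^alpha -> infinity, so the tail is heavy.
```

The same comparison with any $\lambda \in (0, 1)$ gives a vanishing sequence for the exponential and a diverging one for the Pareto, which is exactly the dichotomy in the definition.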

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.


Related publications (16)


Related courses (5)

CS-401: Applied data analysis

This course teaches the basic techniques, methodologies, and practical skills required to draw meaningful insights from a variety of data, with the help of the most acclaimed software tools in the data science world (pandas, scikit-learn, Spark, etc.)

FIN-525: Financial big data

The course's first part introduces modern methods to acquire, clean, and analyze large quantities of financial data efficiently. The second part expands on how to apply these techniques to financial analysis, in particular to intraday data and investment strategies.

MGT-581: Introduction to econometrics

The course provides an introduction to econometrics. The objective is to learn how to make valid (i.e., causal) inference from economic data. It explains the main estimators and presents methods to deal with endogeneity issues.

Related lectures (22)

In multiple testing problems where the components come from a mixture model of noise and true effect, we seek first to test for the existence of non-zero components, and then to identify the true alternatives under a fixed significance level $\alpha$. Two parameters, the fraction of non-null components $\varepsilon$ and the size of the effects $\mu$, characterise the two-point mixture model under the global alternative. When the number of hypotheses $m$ goes to infinity, we are interested in an asymptotic framework where the fraction of non-null components is vanishing and the true effects need to be sizable to be detected. Donoho and Jin give an explicit form of the asymptotic detection boundary for the Gaussian mixture model under the classic calibration of the mixture-model parameters. We prove the analogous results for the Cauchy mixture distribution as an example of the heavy-tailed case. This requires a different formulation of the parameters, which reflects the added difficulties.
We also propose a multiple testing procedure based on a filtering approach that can discover the true alternatives.
Benjamini and Hochberg (BH) compare the observed $p$-values to a linear threshold curve and reject the null hypotheses from the minimum up to the last up-crossing, and prove the false discovery rate (FDR) is controlled.
However, there is an intrinsic difference in heavy-tailed settings. Were we to use the BH procedure we would get a highly variable positive false discovery rate (pFDR). In our study we analyse the distribution of the $p$-values and devise a new multiple testing procedure to combine the usual case and the heavy-tailed case based on the empirical properties of the $p$-values. The filtering approach is designed to eliminate most $p$-values that are more likely to be uniform, while preserving most of the true alternatives. Based on the filtered $p$-values, we estimate the mode $\vartheta$ and define the rejection region $\mathscr{R}(\vartheta, \delta)=\left[ \vartheta -\delta/2, \vartheta +\delta/2 \right]$ such that the most informative $p$-values are included. The length $\delta$ is chosen by controlling the data-dependent estimation of FDR at a desired level.
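The BH step-up rule mentioned above (compare sorted $p$-values to the linear threshold $i\alpha/m$ and reject up to the last up-crossing) can be sketched in a few lines. This is a generic illustration of the standard procedure, not code from the thesis:

```python
# Benjamini-Hochberg step-up procedure: sort the p-values, compare the
# i-th smallest to i * alpha / m, and reject every hypothesis up to the
# last index where the p-value falls below that linear threshold.
def benjamini_hochberg(pvalues, alpha=0.05):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # rank (1-based) of the last up-crossing
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])  # indices of rejected null hypotheses

# Example: two clearly small p-values among uniform-looking ones.
rejected = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20, 0.74],
                              alpha=0.05)
```

Here the first two hypotheses are rejected: $0.039$ exceeds its threshold $3 \cdot 0.05/6 = 0.025$, so the last up-crossing is at rank 2.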

Related concepts (26)

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.

In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate.

We show that for a voter model on $\{0,1\}^{\mathbb{Z}}$ corresponding to a random walk with kernel $p(\cdot)$ and starting from unanimity to the right and opposing unanimity to the left, a tight interface between 0's and 1's exists if $p(\cdot)$ has second moments, but does not if $p(\cdot)$ fails to have a finite $\alpha$th moment for some $\alpha < 2$. We study the evolution of the interface for the one-dimensional voter model. We show that if the random walk kernel associated with the voter model has a finite $\gamma$th moment for some $\gamma > 3$, then the evolution of the interface boundaries converges weakly to a Brownian motion. This extends recent work of Newman, Ravishankar and Sun. Our result is optimal in the sense that a finite $\gamma$th moment is necessary for this convergence for all $\gamma \in (0, 3)$. We also obtain relatively sharp estimates for the tail distribution of the size of the equilibrium interface, extending earlier results of Cox and Durrett.

In finite-sample studies, redescending M-estimators outperform bounded M-estimators (see, for example, Andrews et al. [1972. Robust Estimates of Location. Princeton University Press, Princeton]). Even though redescenders arise naturally out of the maximum likelihood approach if one uses very heavy-tailed models, the commonly used redescenders have been derived from purely heuristic considerations. Using a recent approach proposed by Shurygin, we study the optimality of redescending M-estimators. We show that redescending M-estimators can be designed by applying a global minimax criterion to locally robust estimators, namely maximizing over a class of densities the minimum variance sensitivity over a class of estimators. As a particular result, we prove that Smith's estimator, which is a compromise between Huber's skipped mean and Tukey's biweight, provides a guaranteed level of an estimator's variance sensitivity over the class of densities with a bounded variance.
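The distinction the abstract draws between bounded and redescending M-estimators can be illustrated with their $\psi$ functions. The sketch below (our own illustration, not from the paper; the cutoff constants are conventional tuning values) shows that the Huber $\psi$ caps an outlier's influence while a redescending $\psi$, such as Tukey's biweight, drives it to zero:

```python
# Bounded psi: Huber's function clips the residual at +/- c, so even a
# gross outlier retains influence c on the estimate.
def psi_huber(x, c=1.345):
    return max(-c, min(c, x))

# Redescending psi: Tukey's biweight returns to zero beyond |x| >= c,
# so gross outliers get exactly zero influence.
def psi_tukey_biweight(x, c=4.685):
    if abs(x) >= c:
        return 0.0
    return x * (1 - (x / c) ** 2) ** 2

# A far outlier (residual 100): clipped influence under Huber,
# no influence at all under the redescending biweight.
huber_influence = psi_huber(100.0)          # equals the cutoff c
biweight_influence = psi_tukey_biweight(100.0)  # exactly 0
```

This zero-influence property for extreme residuals is what makes redescenders attractive under very heavy-tailed models, at the cost of a non-convex objective.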