Summary
In statistics, regression toward the mean (also called reversion to the mean or reversion to mediocrity) is the phenomenon where, if one sample of a random variable is extreme, the next sample of the same random variable is likely to be closer to its mean. When many random variables are sampled and the most extreme results are deliberately picked out, the term also refers to the fact that (in many cases) a second sampling of these picked-out variables will give less extreme results, closer to the initial mean of all of the variables.
Mathematically, the strength of this regression effect depends on whether all of the random variables are drawn from the same distribution or whether there are genuine differences in the underlying distributions. In the first case, the effect is statistically likely to occur; in the second, it may occur less strongly or not at all. Regression toward the mean is thus a useful concept to consider when designing any scientific experiment, data analysis, or test that intentionally selects the most extreme events. It indicates that follow-up checks may be useful to avoid jumping to false conclusions about these events: they may be genuine extreme events, a completely meaningless selection due to statistical noise, or a mix of the two.
Consider a class of students taking a 100-item true/false test on a subject. Suppose that all students choose randomly on all questions. Then each student's score is a realization of one of a set of independent and identically distributed random variables, with an expected value of 50. Naturally, some students will score substantially above 50 and some substantially below 50 just by chance. If one selects only the top-scoring 10% of the students and gives them a second test on which they again choose randomly on all items, their mean score would again be expected to be close to 50. Thus the mean score of these students would "regress" all the way back to the mean of all students who took the original test.
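The true/false test example can be checked with a small simulation. The sketch below is illustrative only: the class size of 2000, the random seeds, the function names, and the second scenario (students whose per-item success probability is spread uniformly between 0.3 and 0.7) are assumptions made for this example and are not part of the description above.

```python
import random
from statistics import mean


def take_test(p_correct, n_items, rng):
    """Score on a true/false test where each item is right with probability p_correct."""
    return sum(rng.random() < p_correct for _ in range(n_items))


def retest_top_scorers(p_by_student, n_items=100, top_fraction=0.10, seed=0):
    """Give everyone a first test, then retest only the top scorers.

    Returns (mean of all first scores,
             mean first score of the selected top group,
             mean second score of that same group).
    """
    rng = random.Random(seed)
    first = [take_test(p, n_items, rng) for p in p_by_student]
    n_top = max(1, int(len(first) * top_fraction))
    # Indices of the students with the highest first-test scores.
    top = sorted(range(len(first)), key=first.__getitem__, reverse=True)[:n_top]
    second = [take_test(p_by_student[i], n_items, rng) for i in top]
    return mean(first), mean(first[i] for i in top), mean(second)


n_students = 2000  # assumed class size, chosen only to reduce sampling noise

# Case 1: every student guesses, so all scores come from the same distribution.
guessers = [0.5] * n_students
all_mean, top_first, top_second = retest_top_scorers(guessers)
print(f"pure guessing: all={all_mean:.1f} top first={top_first:.1f} top second={top_second:.1f}")

# Case 2 (assumption): students genuinely differ in ability.
skill_rng = random.Random(1)
mixed = [skill_rng.uniform(0.3, 0.7) for _ in range(n_students)]
mixed_all, mixed_top_first, mixed_top_second = retest_top_scorers(mixed)
print(f"mixed ability: all={mixed_all:.1f} top first={mixed_top_first:.1f} top second={mixed_top_second:.1f}")
```

Under pure guessing, the retest mean of the selected group should fall back to roughly 50. When students differ in ability, the selected group should still score lower on the retest than on the first test, but it should remain well above the class mean, because part of its first-test advantage was genuine.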
Related courses (32)
MATH-408: Regression methods
General graduate course on regression methods
DH-406: Machine learning for DH
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple
PHYS-467: Machine learning for physicists
Machine learning and data analysis are becoming increasingly central in sciences including physics. In this course, fundamental principles and methods of machine learning will be introduced and practi
Related lectures (124)
Neural Networks: Multilayer Learning
Covers the fundamentals of multilayer neural networks and deep learning, including back-propagation and network architectures like LeNet, AlexNet, and VGG-16.
Multilayer Neural Networks: Deep Learning
Covers the fundamentals of multilayer neural networks and deep learning.
Probabilistic Models for Linear Regression
Covers the probabilistic model for linear regression and its applications in nuclear magnetic resonance and X-ray imaging.
Related publications (76)

Bayes-optimal Learning of Deep Random Networks of Extensive-width

Florent Gérard Krzakala, Lenka Zdeborová, Hugo Chao Cui

We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width ...
2023
Related concepts (5)
Linear regression
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
Ordinary least squares
In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model (with fixed level-one effects of a linear function of a set of explanatory variables) by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable (values of the variable being observed) in the input dataset and the output of the (linear) function of the independent variable.
Gambler's fallacy
The gambler's fallacy, also known as the Monte Carlo fallacy or the fallacy of the maturity of chances, is the incorrect belief that, if a particular event occurs more frequently than normal during the past, it is less likely to happen in the future (or vice versa), when it has otherwise been established that the probability of such events does not depend on what has happened in the past. Such events, having the quality of historical independence, are referred to as statistically independent.