Regression dilution, also known as regression attenuation, is the biasing of the linear regression slope towards zero (the underestimation of its absolute value), caused by errors in the independent variable.
Consider fitting a straight line for the relationship of an outcome variable y to a predictor variable x, and estimating the slope of the line. Statistical variability, measurement error or random noise in the y variable causes uncertainty in the estimated slope, but not bias: on average, the procedure calculates the right slope. However, variability, measurement error or random noise in the x variable causes bias in the estimated slope (as well as imprecision). The greater the variance in the x measurement, the closer the estimated slope must approach zero instead of the true value.
It may seem counter-intuitive that noise in the predictor variable x induces a bias, but noise in the outcome variable y does not. Recall that linear regression is not symmetric: the line of best fit for predicting y from x (the usual linear regression) is not the same as the line of best fit for predicting x from y.
Regression slope and other regression coefficients can be disattenuated as follows.
The case that x is fixed, but measured with noise, is known as the functional model or functional relationship.
It can be corrected using total least squares and errors-in-variables models in general.
The case that the x variable arises randomly is known as the structural model or structural relationship. For example, in a medical study patients are recruited as a sample from a population, and their characteristics such as blood pressure may be viewed as arising from a random sample.
Under certain assumptions (typically, normal distribution assumptions) there is a known ratio between the true slope, and the expected estimated slope. Frost and Thompson (2000) review several methods for estimating this ratio and hence correcting the estimated slope.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Machine learning methods are becoming increasingly central in many sciences and applications. In this course, fundamental principles and methods of machine learning will be introduced, analyzed and pr
This course provides in-depth understanding of the most fundamental algorithms in statistical pattern recognition or machine learning (including Deep Learning) as well as concrete tools (as Python sou
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses. In the case when some regressors have been measured with errors, estimation based on the standard assumption leads to inconsistent estimates, meaning that the parameter estimates do not tend to the true values even in very large samples.
In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the x and y coordinates in a Cartesian coordinate system) and finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.
Explores supervised learning in financial econometrics, covering linear regression, model fitting, potential problems, basis functions, subset selection, cross-validation, regularization, and random forests.
We report the successful acquisition of surface electromyography (sEMG) signals from an intelligent armband and its application in localized muscle fatigue monitoring with a costume-designed, small-scale front-end readout circuitry. The correlation coeffic ...
We probe the accuracy of linear ridge regression employing a three-body local density representation derived from the atomic cluster expansion. We benchmark the accuracy of this framework in the prediction of formation energies and atomic forces in molecul ...
Here we discuss "hidden variables", which are typically introduced during an experiment as a consequence of the application of two independent variables together to create a stimulus. With increased sophistication in modern chemical biology tools and relat ...