In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses.
When some regressors have been measured with errors, estimation based on the standard assumption leads to inconsistent estimates: the parameter estimates do not tend to the true values even in very large samples. For simple linear regression the effect is an underestimate of the coefficient, known as attenuation bias. In non-linear models the direction of the bias is likely to be more complicated.
Consider a simple linear regression model of the form
$$ y_t = \alpha + \beta x_t^{*} + \varepsilon_t, \qquad t = 1, \ldots, T, $$
where $x_t^{*}$ denotes the true but unobserved regressor. Instead we observe this value with an error:
$$ x_t = x_t^{*} + \eta_t, $$
where the measurement error $\eta_t$ is assumed to be independent of the true value $x_t^{*}$.
If the $y_t$'s are simply regressed on the $x_t$'s (see simple linear regression), then the estimator for the slope coefficient is
$$ \hat\beta_x = \frac{\tfrac{1}{T}\sum_{t=1}^{T}(x_t - \bar x)(y_t - \bar y)}{\tfrac{1}{T}\sum_{t=1}^{T}(x_t - \bar x)^2}, $$
which converges as the sample size $T$ increases without bound:
$$ \hat\beta_x \;\xrightarrow{p}\; \frac{\operatorname{Cov}[x_t, y_t]}{\operatorname{Var}[x_t]} = \frac{\beta\,\sigma^2_{x^*}}{\sigma^2_{x^*} + \sigma^2_{\eta}} = \frac{\beta}{1 + \sigma^2_{\eta}/\sigma^2_{x^*}}. $$
This is in contrast to the "true" effect of $\beta$, estimated using the $x_t^{*}$:
$$ \hat\beta \;\xrightarrow{p}\; \frac{\operatorname{Cov}[x_t^{*}, y_t]}{\operatorname{Var}[x_t^{*}]} = \beta. $$
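As a sanity check, the attenuation can be reproduced in a short NumPy simulation. The parameter values below ($\beta = 2$, $\sigma^2_{x^*} = 4$, $\sigma^2_\eta = 1$) are illustrative choices, not taken from the text; with them the naive slope should converge to $\beta/(1 + \sigma^2_\eta/\sigma^2_{x^*}) = 1.6$ rather than to $\beta = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 200_000           # large sample, so the estimate sits near its probability limit
alpha, beta = 1.0, 2.0
sigma2_xstar = 4.0    # Var[x*], variance of the true regressor (illustrative value)
sigma2_eta = 1.0      # Var[eta], variance of the measurement error (illustrative value)

x_star = rng.normal(0.0, np.sqrt(sigma2_xstar), T)   # true but unobserved regressor
eps = rng.normal(0.0, 1.0, T)                        # regression error
y = alpha + beta * x_star + eps

eta = rng.normal(0.0, np.sqrt(sigma2_eta), T)        # measurement error
x = x_star + eta                                     # observed, error-contaminated regressor

# naive OLS slope from regressing y on the observed x
beta_hat_x = np.polyfit(x, y, 1)[0]

# predicted probability limit: beta / (1 + sigma2_eta / sigma2_xstar)
plim = beta / (1.0 + sigma2_eta / sigma2_xstar)
print(beta_hat_x, plim)   # both close to 1.6, well below beta = 2.0
```

Regressing on the true `x_star` instead would recover a slope near 2.0, matching the contrast drawn above.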
Variances are non-negative, so that in the limit the estimated $\hat\beta_x$ is smaller than $\beta$, an effect which statisticians call attenuation or regression dilution. Thus the 'naïve' least squares estimator $\hat\beta_x$ is an inconsistent estimator for $\beta$. However, $\hat\beta_x$ is a consistent estimator of the parameter required for a best linear predictor of $y$ given the observed $x$: in some applications this may be what is required, rather than an estimate of the 'true' regression coefficient $\beta$, although that would assume that the variance of the errors in estimation and in prediction is identical. This follows directly from the result quoted immediately above, and from the fact that the regression coefficient relating the $y_t$'s to the actually observed $x_t$'s, in a simple linear regression, is given by
$$ \beta_x = \frac{\operatorname{Cov}[x_t, y_t]}{\operatorname{Var}[x_t]}. $$
It is this coefficient, rather than $\beta$, that would be required for constructing a predictor of $y$ based on an observed $x$ which is subject to noise.
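The prediction claim can also be checked numerically: when forecasting $y$ from a noisy $x$, the attenuated coefficient yields a lower mean squared error than the 'true' $\beta$. The sketch below reuses the same hypothetical parameter values as before ($\beta = 2$, $\sigma^2_{x^*} = 4$, $\sigma^2_\eta = 1$), so the attenuated slope is $\beta_x = 1.6$.

```python
import numpy as np

rng = np.random.default_rng(1)

T = 100_000
alpha, beta = 1.0, 2.0
sigma2_xstar, sigma2_eta = 4.0, 1.0          # illustrative variances, as above

x_star = rng.normal(0.0, np.sqrt(sigma2_xstar), T)
y = alpha + beta * x_star + rng.normal(0.0, 1.0, T)
x = x_star + rng.normal(0.0, np.sqrt(sigma2_eta), T)   # noisy observation of x*

# attenuated coefficient beta_x = beta * Var[x*] / (Var[x*] + Var[eta])
beta_x = beta * sigma2_xstar / (sigma2_xstar + sigma2_eta)

# Both predictors use intercept alpha, which is optimal here since E[x] = E[x*] = 0.
mse_attenuated = np.mean((y - (alpha + beta_x * x)) ** 2)
mse_true_beta = np.mean((y - (alpha + beta * x)) ** 2)
print(mse_attenuated, mse_true_beta)   # the attenuated slope gives the smaller MSE
```

Using the true $\beta$ over-reacts to the measurement noise in $x$: the prediction error variance works out to $(\beta - b)^2\sigma^2_{x^*} + \sigma^2_\varepsilon + b^2\sigma^2_\eta$ for a slope $b$, which is minimised at $b = \beta_x$, not at $b = \beta$.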