Summary
In statistics, multicollinearity (also collinearity) is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. In this situation, the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors. That is, a multivariable regression model with collinear predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.

Note that in statements of the assumptions underlying regression analyses such as ordinary least squares, the phrase "no multicollinearity" usually refers to the absence of perfect multicollinearity, which is an exact (non-stochastic) linear relation among the predictors. In such a case, the design matrix $X$ has less than full rank, and therefore the moment matrix $X^{\top}X$ cannot be inverted. Under these circumstances, for a general linear model $y = X\beta + \varepsilon$, the ordinary least squares estimator $\hat{\beta}_{OLS} = (X^{\top}X)^{-1}X^{\top}y$ does not exist. In any case, multicollinearity is a characteristic of the design matrix, not the underlying statistical model. Perfect multicollinearity leads to non-identifiable parameters.

Collinearity is a linear association between explanatory variables. Two variables are perfectly collinear if there is an exact linear relationship between them. For example, $X_1$ and $X_2$ are perfectly collinear if there exist parameters $\lambda_0$ and $\lambda_1$ such that, for all observations $i$, $X_{2i} = \lambda_0 + \lambda_1 X_{1i}$. Multicollinearity refers to a situation in which explanatory variables in a multiple regression model are highly linearly related. There is perfect multicollinearity if, for example as in the equation above, the correlation between two independent variables equals 1 or −1.
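Both regimes described above can be seen directly in a small simulation. The following is a minimal sketch (not part of the original page), assuming only numpy: two nearly collinear predictors yield individually erratic coefficients but stable joint predictions, while an exactly duplicated predictor makes the design matrix rank-deficient, so $(X^{\top}X)^{-1}$ does not exist.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Two nearly collinear predictors: x2 is x1 plus tiny noise.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-3, size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

# A tiny perturbation of y moves the x1/x2 coefficients by far more
# than it moves the fitted values: individually erratic, jointly stable.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_pert, *_ = np.linalg.lstsq(X, y + rng.normal(scale=1e-2, size=n), rcond=None)
print("original coefficients :", beta)
print("perturbed coefficients:", beta_pert)
print("max change in fitted values:", np.max(np.abs(X @ (beta_pert - beta))))

# Perfect collinearity: duplicating x1 leaves X with rank 2 < 3 columns,
# so the moment matrix X'X is singular and the textbook OLS formula fails.
X_perfect = np.column_stack([np.ones(n), x1, x1])
print("rank of design matrix:", np.linalg.matrix_rank(X_perfect))
```

Note that `np.linalg.lstsq` still returns a minimum-norm solution in the rank-deficient case; it is the explicit inverse $(X^{\top}X)^{-1}$ that breaks down.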
Related courses (7)
MATH-341: Linear models
Regression modelling is a fundamental tool of statistics, because it describes how the law of a random variable of interest may depend on other variables. This course aims to familiarize students with …
MATH-412: Statistical machine learning
A course on statistical machine learning for supervised and unsupervised learning
MATH-413: Statistics for data science
Statistics lies at the foundation of data science, providing a unifying theoretical and methodological backbone for the diverse tasks encountered in this emerging field. This course rigorously develops …
Related lectures (32)
Practical Aspects of Gaussian Linear Model
Explores practical aspects of the Gaussian linear model, focusing on variable selection and regularization methods.
Regularization in Machine Learning
Explores Ridge and Lasso Regression for regularization in machine learning models, emphasizing hyperparameter tuning and visualization of parameter coefficients.
Model Selection Methods in Biostatistics
Explores model selection methods in biostatistics, emphasizing the importance of starting with a sensible model.
Related publications (32)

Acute TNF alpha levels predict cognitive impairment 6-9 months after COVID-19 infection
Dimitri Nestor Alice Van De Ville, Alessandra Griffa, Idris Guessous, Alexandre Cionca
Background: A neurocognitive phenotype of post-COVID-19 infection has recently been described that is characterized by a lack of awareness of memory impairment (i.e., anosognosia), altered functional connectivity in the brain's default mode and limbic networks ...
Pergamon-Elsevier Science Ltd, 2023
Related concepts (9)
Linear regression
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
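As a small illustration (not from the page; it assumes numpy, and the data are made up), the same least-squares machinery covers both cases: one column of regressors for simple linear regression, several for multiple linear regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Simple linear regression: intercept plus one explanatory variable.
X_simple = np.column_stack([np.ones(n), x1])
b_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

# Multiple linear regression: intercept plus several explanatory variables.
X_multi = np.column_stack([np.ones(n), x1, x2])
b_multi, *_ = np.linalg.lstsq(X_multi, y, rcond=None)

print(b_simple)  # roughly [1.0, 2.0]; x2's effect is left in the error term
print(b_multi)   # roughly [1.0, 2.0, -0.5]
```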
Ordinary least squares
In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters of a linear regression model by the principle of least squares: minimizing the sum of the squared differences between the observed values of the dependent variable in the input dataset and the values predicted by the linear function of the explanatory variables.
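In symbols (a standard formulation, added here for concreteness): with design matrix $X$ and observed response vector $y$, OLS solves

$$\hat{\beta}_{OLS} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2 = (X^{\top}X)^{-1}X^{\top}y,$$

where the closed form on the right exists only when $X^{\top}X$ is invertible, i.e. when there is no perfect multicollinearity among the columns of $X$.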
Coefficient of determination
In statistics, the coefficient of determination, denoted R2 or r2 and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s). It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.
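A quick sketch of the definition above (the function name `r_squared` is ours, not from the page; numpy assumed):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)        # unexplained (residual) variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot
```

For a least-squares fit that includes an intercept, this quantity equals the squared correlation between the observed and fitted values.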