Corrélation (statistiques)En probabilités et en statistique, la corrélation entre plusieurs variables aléatoires ou statistiques est une notion de liaison qui contredit leur indépendance. Cette corrélation est très souvent réduite à la corrélation linéaire entre variables quantitatives, c’est-à-dire l’ajustement d’une variable par rapport à l’autre par une relation affine obtenue par régression linéaire. Pour cela, on calcule un coefficient de corrélation linéaire, quotient de leur covariance par le produit de leurs écarts types.
Explained sum of squaresIn statistics, the explained sum of squares (ESS), alternatively known as the model sum of squares or sum of squares due to regression (SSR – not to be confused with the residual sum of squares (RSS) or sum of squares of errors), is a quantity used in describing how well a model, often a regression model, represents the data being modelled.
Ajustement affinevignette|Nuage de points et sa droite d'ajustement En mathématiques, un ajustement affine est la détermination d’une droite approchant au mieux un nuage de points dans le plan. Il est utilisé notamment en analyse de données pour évaluer la pertinence d’une relation affine entre deux variables statistiques, et pour estimer les coefficients d’une telle relation. Il permet aussi de produire une droite de tendance pour formuler des prévisions sur un comportement futur proche ou une interpolation entre deux mesures effectuées.
Méthode des variables instrumentalesEn statistique et en économétrie, la méthode des variables instrumentales est une méthode permettant d'identifier et d'estimer des relations causales entre des variables. Cette méthode est très souvent utilisée en économétrie. Le modèle de régression linéaire simple fait l'hypothèse que les variables explicatives sont statistiquement indépendantes du terme d'erreur. Par exemple, si on pose le modèle avec x la variable explicative et u le terme d'erreur, on suppose généralement que x est exogène, c'est-à-dire que .
Weighted least squaresWeighted least squares (WLS), also known as weighted linear regression, is a generalization of ordinary least squares and linear regression in which knowledge of the unequal variance of observations (heteroscedasticity) is incorporated into the regression. WLS is also a specialization of generalized least squares, when all the off-diagonal entries of the covariance matrix of the errors, are null.
Shrinkage (statistics)In statistics, shrinkage is the reduction in the effects of sampling variation. In regression analysis, a fitted relationship appears to perform less well on a new data set than on the data set used for fitting. In particular the value of the coefficient of determination 'shrinks'. This idea is complementary to overfitting and, separately, to the standard adjustment made in the coefficient of determination to compensate for the subjunctive effects of further sampling, like controlling for the potential of new explanatory terms improving the model by chance: that is, the adjustment formula itself provides "shrinkage.
MulticollinearityIn statistics, multicollinearity (also collinearity) is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. In this situation, the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors.
Statistical model specificationIn statistics, model specification is part of the process of building a statistical model: specification consists of selecting an appropriate functional form for the model and choosing which variables to include. For example, given personal income together with years of schooling and on-the-job experience , we might specify a functional relationship as follows: where is the unexplained error term that is supposed to comprise independent and identically distributed Gaussian variables.
Segmented regressionSegmented regression, also known as piecewise regression or broken-stick regression, is a method in regression analysis in which the independent variable is partitioned into intervals and a separate line segment is fit to each interval. Segmented regression analysis can also be performed on multivariate data by partitioning the various independent variables. Segmented regression is useful when the independent variables, clustered into different groups, exhibit different relationships between the variables in these regions.
Omitted-variable biasIn statistics, omitted-variable bias (OVB) occurs when a statistical model leaves out one or more relevant variables. The bias results in the model attributing the effect of the missing variables to those that were included. More specifically, OVB is the bias that appears in the estimates of parameters in a regression analysis, when the assumed specification is incorrect in that it omits an independent variable that is a determinant of the dependent variable and correlated with one or more of the included independent variables.