Summary
In statistics and in particular in regression analysis, leverage is a measure of how far away the independent variable values of an observation are from those of the other observations. High-leverage points, if any, are outliers with respect to the independent variables. That is, high-leverage points have no neighboring points in $\mathbb{R}^{p}$ space, where $p$ is the number of independent variables in a regression model. This makes the fitted model likely to pass close to a high-leverage observation. Hence high-leverage points have the potential to cause large changes in the parameter estimates when they are deleted, i.e., to be influential points. Although an influential point will typically have high leverage, a high-leverage point is not necessarily an influential point. The leverage is typically defined as the diagonal elements of the hat matrix.

Consider the linear regression model $y_i = \boldsymbol{x}_i^{\top} \boldsymbol{\beta} + \varepsilon_i$, $i = 1, \ldots, n$. That is, $\boldsymbol{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is the $n \times p$ design matrix whose rows correspond to the observations and whose columns correspond to the independent or explanatory variables. The leverage score for the $i$-th independent observation $\boldsymbol{x}_i$ is given as $h_{ii} = [\mathbf{H}]_{ii}$, the $i$-th diagonal element of the ortho-projection matrix (a.k.a. hat matrix) $\mathbf{H} = \mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}$.

Thus the leverage score can be viewed as the 'weighted' distance between $\boldsymbol{x}_i$ and the mean of the $\boldsymbol{x}_i$'s (see its relation with the Mahalanobis distance). It can also be interpreted as the degree by which the $i$-th measured (dependent) value (i.e., $y_i$) influences the $i$-th fitted (predicted) value (i.e., $\hat{y}_i$): mathematically,

$$h_{ii} = \frac{\partial \hat{y}_i}{\partial y_i}.$$

Hence, the leverage score is also known as the observation self-sensitivity or self-influence. Using the fact that $\hat{\boldsymbol{y}} = \mathbf{H}\boldsymbol{y}$ (i.e., the prediction is the ortho-projection of $\boldsymbol{y}$ onto the range space of $\mathbf{X}$) in the above expression, we get $h_{ii} = [\mathbf{H}]_{ii}$. Note that this leverage depends on the values of the explanatory variables $\mathbf{X}$ of all observations but not on any of the values of the dependent variables $y_i$.

The leverage $h_{ii}$ is a number between 0 and 1. Proof: Note that $\mathbf{H}$ is an idempotent matrix ($\mathbf{H}^2 = \mathbf{H}$) and symmetric ($h_{ij} = h_{ji}$). Thus, by using the fact that $[\mathbf{H}^2]_{ii} = [\mathbf{H}]_{ii}$, we have $h_{ii} = h_{ii}^2 + \sum_{j \neq i} h_{ij}^2 \geq 0$.
Since we know that $h_{ii} = h_{ii}^2 + \sum_{j \neq i} h_{ij}^2 \geq h_{ii}^2$, we have $0 \leq h_{ii} \leq 1$.
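These properties can be checked numerically. Below is a minimal NumPy sketch with made-up data (not from the text): the last $x$-value lies far from the others, so it should receive a leverage close to 1, while all leverages stay in $[0, 1]$ and sum to $p = \operatorname{tr}(\mathbf{H})$.

```python
import numpy as np

# Illustrative data: the last x-value (20.0) is far from the others,
# so its leverage should be close to 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
X = np.column_stack([np.ones_like(x), x])  # n x p design matrix, p = 2 (intercept + slope)

# Hat matrix H = X (X^T X)^{-1} X^T; leverages are its diagonal.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

print(leverage)        # last entry is ~0.98, the others are much smaller
print(leverage.sum())  # ~2.0, i.e. tr(H) = p
```

Note that `leverage` is computed from `X` alone, matching the remark above that leverage does not depend on the response values.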
About this result
This page is generated automatically and may contain information that is not correct, complete, up to date, or relevant to your search. The same goes for all other pages on this site. Please verify the information against the official EPFL sources.
Related courses (6)
MATH-341: Linear models
Regression modelling is a fundamental tool of statistics, because it describes how the law of a random variable of interest may depend on other variables. This course aims to familiarize students with
MATH-413: Statistics for data science
Statistics lies at the foundation of data science, providing a unifying theoretical and methodological backbone for the diverse tasks encountered in this emerging field. This course rigorously develops
FIN-403: Econometrics
The course covers basic econometric models and methods that are routinely applied to obtain inference results in economic and financial applications.
Related lectures (32)
Regression analysis: model selection and diagnostic tools
Explores regression analysis tools, model selection, influential points, outliers, and diagnostic plots.
Model diagnostics: outlying, leverage, and influential observations
Explores outlying, high-leverage, and influential observations in statistical models, including methods for their detection and assessment.
Model checking and residuals
Explores model checking and residuals in regression analysis, emphasizing the importance of diagnostics for ensuring model validity.
Related publications (35)
Related people (2)
Related concepts (7)
Projection matrix
In statistics, the projection matrix $(\mathbf{P})$, sometimes also called the influence matrix or hat matrix $(\mathbf{H})$, maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value. The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.
Influential observation
In statistics, an influential observation is an observation for a statistical calculation whose deletion from the dataset would noticeably change the result of the calculation. In particular, in regression analysis an influential observation is one whose deletion has a large effect on the parameter estimates. Various methods have been proposed for measuring influence. Assume an estimated regression $\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$, where $\mathbf{y}$ is an $n \times 1$ column vector for the response variable, $\mathbf{X}$ is the $n \times k$ design matrix of explanatory variables (including a constant), $\mathbf{e}$ is the $n \times 1$ residual vector, and $\mathbf{b}$ is a $k \times 1$ vector of estimates of some population parameter $\boldsymbol{\beta}$.
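The deletion idea can be sketched directly: refit the model without one observation and compare the parameter estimates. The NumPy example below uses made-up data (not from the text) in which the last point pulls the slope up, so dropping it changes the estimate noticeably.

```python
import numpy as np

# Illustrative data: the last point is far from the others in both x and y,
# so it drags the fitted slope upward.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 30.0])
X = np.column_stack([np.ones_like(x), x])  # design matrix with a constant

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)             # fit on all n points
beta_drop, *_ = np.linalg.lstsq(X[:-1], y[:-1], rcond=None)   # refit without the last point

# The slope estimate changes substantially when the last observation is
# deleted, so that observation is influential.
print(beta_full[1], beta_drop[1])
```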
Cook's distance
In statistics, Cook's distance is commonly used to estimate the influence of a data point when using least-squares methods. In the general least-squares setting, Cook's distance can be used in several ways: to indicate data points that would be worth checking for validity, or to indicate regions of the design space where it would be good to be able to obtain more data points. It is named after the American statistician R. Dennis Cook.
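As a sketch of how Cook's distance is computed, assuming the standard least-squares formula $D_i = \frac{e_i^2}{p\,s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$ (with residual $e_i$, mean squared error $s^2$, $p$ parameters, and leverage $h_{ii}$) and illustrative made-up data:

```python
import numpy as np

# Illustrative data: the last point is a high-leverage point, so it should
# stand out with a very large Cook's distance.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 30.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix
h = np.diag(H)                        # leverages
e = y - H @ y                         # least-squares residuals
s2 = e @ e / (n - p)                  # mean squared error

# Cook's distance D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
D = e**2 / (p * s2) * h / (1 - h) ** 2
print(D)  # the last observation dominates
```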