In statistics, the projection matrix $\mathbf{P}$, sometimes also called the influence matrix or hat matrix $\mathbf{H}$, maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value. The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.

If the vector of response values is denoted by $\mathbf{y}$ and the vector of fitted values by $\hat{\mathbf{y}}$, then $\hat{\mathbf{y}} = \mathbf{P}\mathbf{y}$. As $\hat{\mathbf{y}}$ is usually pronounced "y-hat", the projection matrix is also named the hat matrix, as it "puts a hat on $\mathbf{y}$". The element in the $i$th row and $j$th column of $\mathbf{P}$ is equal to the covariance between the $j$th response value and the $i$th fitted value, divided by the variance of the former:

$$p_{ij} = \frac{\operatorname{Cov}[\hat{y}_i, y_j]}{\operatorname{Var}[y_j]}.$$

The formula for the vector of residuals $\mathbf{r}$ can also be expressed compactly using the projection matrix:

$$\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{P})\mathbf{y},$$

where $\mathbf{I}$ is the identity matrix. The matrix $\mathbf{M} = \mathbf{I} - \mathbf{P}$ is sometimes referred to as the residual maker matrix or the annihilator matrix. The covariance matrix of the residuals, by error propagation, equals

$$\mathbf{\Sigma}_{\mathbf{r}} = (\mathbf{I} - \mathbf{P})^{\mathsf{T}} \mathbf{\Sigma} (\mathbf{I} - \mathbf{P}),$$

where $\mathbf{\Sigma}$ is the covariance matrix of the error vector (and, by extension, the response vector as well). For the case of linear models with independent and identically distributed errors, in which $\mathbf{\Sigma} = \sigma^2 \mathbf{I}$, this reduces to

$$\mathbf{\Sigma}_{\mathbf{r}} = (\mathbf{I} - \mathbf{P})\sigma^2,$$

using the fact that $\mathbf{I} - \mathbf{P}$ is symmetric and idempotent.

Geometrically, the closest point to a vector $\mathbf{b}$ in the column space of a matrix $\mathbf{A}$ is the point $\mathbf{A}\mathbf{x}$ for which the segment to $\mathbf{b}$ is orthogonal to the column space of $\mathbf{A}$. A vector that is orthogonal to the column space of a matrix lies in the null space of the matrix transpose, so

$$\mathbf{A}^{\mathsf{T}}(\mathbf{b} - \mathbf{A}\mathbf{x}) = \mathbf{0}.$$

Rearranging gives $\mathbf{A}^{\mathsf{T}}\mathbf{b} = \mathbf{A}^{\mathsf{T}}\mathbf{A}\mathbf{x}$, so

$$\mathbf{x} = (\mathbf{A}^{\mathsf{T}}\mathbf{A})^{-1}\mathbf{A}^{\mathsf{T}}\mathbf{b}.$$

Therefore, since $\mathbf{A}\mathbf{x}$ is in the column space of $\mathbf{A}$, the projection matrix that maps $\mathbf{b}$ onto $\mathbf{A}\mathbf{x}$ is

$$\mathbf{P} = \mathbf{A}(\mathbf{A}^{\mathsf{T}}\mathbf{A})^{-1}\mathbf{A}^{\mathsf{T}}.$$

Suppose that we wish to estimate a linear model using linear least squares. The model can be written as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$

where $\mathbf{X}$ is a matrix of explanatory variables (the design matrix), $\boldsymbol{\beta}$ is a vector of unknown parameters to be estimated, and $\boldsymbol{\varepsilon}$ is the error vector. Many types of models and techniques are subject to this formulation. In this setting, taking $\mathbf{A} = \mathbf{X}$ above gives the projection matrix $\mathbf{P} = \mathbf{X}(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}$.
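To make the definitions above concrete, here is a minimal NumPy sketch (the data and variable names are illustrative, not from the source) that forms the projection matrix for a small design matrix, reads the leverages off its diagonal, and computes the residuals via the annihilator matrix:

```python
import numpy as np

# Illustrative design matrix (intercept plus one regressor) and response.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Projection (hat) matrix P = X (X^T X)^{-1} X^T; solve() avoids an explicit inverse.
P = X @ np.linalg.solve(X.T @ X, X.T)

leverages = np.diag(P)        # diagonal elements: leverage of each observation
M = np.eye(len(y)) - P        # residual maker / annihilator matrix I - P
r = M @ y                     # residuals r = (I - P) y

# P is symmetric and idempotent, as a projection matrix must be.
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
print("leverages:", leverages)
print("residuals:", r)
```

A useful sanity check: the leverages sum to the number of parameters, since the trace of $\mathbf{P}$ equals the rank of $\mathbf{X}$.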

Related courses (13)
MATH-413: Statistics for data science
Statistics lies at the foundation of data science, providing a unifying theoretical and methodological backbone for the diverse tasks encountered in this emerging field. This course rigorously develops ...
CS-233: Introduction to machine learning
Machine learning and data analysis are becoming increasingly central in many sciences and applications. In this course, fundamental principles and methods of machine learning will be introduced, analyzed ...
CS-433: Machine learning
Machine learning methods are becoming increasingly central in many sciences and applications. In this course, fundamental principles and methods of machine learning will be introduced, analyzed and ...
Related lectures (34)
Plackett-Burman Design: Hadamard Matrix
Explores the construction and application of Hadamard matrices for efficient estimation of main effects without interactions in the Plackett-Burman Design.
Harmonic Signals and Spectrum Estimation
Explores harmonic signals, spectrum estimation, and signal analysis methods using MATLAB tools.
Regression Methods: Model Building and Diagnostics
Explores regression methods, covering model building, diagnostics, inference, and analysis of variance.
Related publications (35)

Outlier-free spline spaces for isogeometric discretizations of biharmonic and polyharmonic eigenvalue problems
Espen Sande
We present outlier-free isogeometric Galerkin discretizations of eigenvalue problems related to the biharmonic and the polyharmonic operator in the univariate setting. These are Galerkin discretizations in certain spline subspaces that provide accurate app ...
Lausanne, 2023

Application of optimal spline subspaces for the removal of spurious outliers in isogeometric discretizations
Espen Sande
We show that isogeometric Galerkin discretizations of eigenvalue problems related to the Laplace operator subject to any standard type of homogeneous boundary conditions have no outliers in certain optimal spline subspaces. Roughly speaking, these optimal ...
Elsevier Science SA, 2022

Leverage Point Identification Method for LAV-Based State Estimation
Mario Paolone, Guglielmo Frigo, Ali Abur, Mathias Dorier
In this paper we enunciate and rigorously demonstrate a new lemma that, based on a previously proposed theorem, proves the identifiability of leverage points in state estimation with specific reference to the Least Absolute Value (LAV) estimator. In this c ...
2021
Related concepts (8)
Leverage (statistics)
In statistics and in particular in regression analysis, leverage is a measure of how far away the independent variable values of an observation are from those of the other observations. High-leverage points, if any, are outliers with respect to the independent variables. That is, high-leverage points have no neighboring points in $\mathbb{R}^{p}$ space, where $p$ is the number of independent variables in a regression model. This makes the fitted model likely to pass close to a high-leverage observation.
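As a sketch of this idea (the data and the common $2p/n$ rule-of-thumb threshold are assumptions for the example, not from the text above), leverages can be read off the hat-matrix diagonal and used to flag observations that sit far from the rest in regressor space:

```python
import numpy as np

def leverages(X: np.ndarray) -> np.ndarray:
    """Diagonal of the hat matrix X (X^T X)^{-1} X^T for an n x p design matrix."""
    return np.diag(X @ np.linalg.solve(X.T @ X, X.T))

# Illustrative data: the last observation is far from the others in x.
X = np.array([[1.0, x] for x in [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]])
h = leverages(X)
n, p = X.shape
print(h)
print(h > 2 * p / n)   # rule of thumb flags the distant observation
```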
Cook's distance
In statistics, Cook's distance or Cook's D is a commonly used estimate of the influence of a data point when performing a least-squares regression analysis. In a practical ordinary least squares analysis, Cook's distance can be used in several ways: to indicate influential data points that are particularly worth checking for validity, or to indicate regions of the design space where it would be good to be able to obtain more data points. It is named after the American statistician R. Dennis Cook.
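In terms of the hat-matrix quantities above, Cook's distance for observation $i$ can be written as $D_i = r_i^2 h_{ii} / \big(p\, s^2 (1 - h_{ii})^2\big)$, where $s^2$ is the usual residual variance estimate. A minimal sketch under the same illustrative assumptions as the snippets above:

```python
import numpy as np

def cooks_distance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cook's D for each observation of an OLS fit (illustrative sketch)."""
    n, p = X.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(P)                          # leverages
    r = y - P @ y                           # residuals
    s2 = r @ r / (n - p)                    # estimated error variance
    return r**2 * h / (p * s2 * (1 - h)**2)

# Illustrative usage: the last point is both high-leverage and off-trend.
X = np.array([[1.0, x] for x in [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]])
y = np.array([1.0, 2.1, 2.9, 4.2, 5.0, 14.0])
print(cooks_distance(X, y))
```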
Studentized residual
In statistics, a studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation. It is a form of a Student's t-statistic, with the estimate of error varying between points. This is an important technique in the detection of outliers. It is among several statistics named in honor of William Sealy Gosset, who wrote under the pseudonym Student. Dividing a statistic by a sample standard deviation is called studentizing, in analogy with standardizing and normalizing.
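The (internally) studentized residual follows directly from the leverages, since under i.i.d. errors $\operatorname{Var}[r_i] = \sigma^2(1 - h_{ii})$, giving $t_i = r_i / \big(s \sqrt{1 - h_{ii}}\big)$. A sketch under the same illustrative assumptions as the earlier snippets:

```python
import numpy as np

def studentized_residuals(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Internally studentized residuals r_i / (s * sqrt(1 - h_ii))."""
    n, p = X.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(P)                          # leverages
    r = y - P @ y                           # residuals
    s = np.sqrt(r @ r / (n - p))            # residual standard error
    return r / (s * np.sqrt(1 - h))
```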
