In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model when the residuals of that model are correlated. In such cases, ordinary least squares and weighted least squares can be statistically inefficient and can give misleading inferences. GLS was first described by Alexander Aitken in 1935.
In standard linear regression models one observes data on n statistical units. The response values are placed in a vector $\mathbf{y} = (y_1, \dots, y_n)^\top$, and the predictor values are placed in the design matrix $\mathbf{X}$, whose $i$th row $\mathbf{x}_i^\top$ is a vector of the $k$ predictor variables (including a constant) for the $i$th unit. The model forces the conditional mean of $\mathbf{y}$ given $\mathbf{X}$ to be a linear function of $\mathbf{X}$ and assumes the conditional variance of the error term given $\mathbf{X}$ is a known nonsingular covariance matrix $\boldsymbol{\Omega}$. This is usually written as
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \operatorname{E}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = 0, \qquad \operatorname{Cov}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \boldsymbol{\Omega}.$$
Here $\boldsymbol{\beta} \in \mathbb{R}^k$ is a vector of unknown constants (known as “regression coefficients”) that must be estimated from the data.
Suppose $\mathbf{b}$ is a candidate estimate for $\boldsymbol{\beta}$. Then the residual vector for $\mathbf{b}$ will be $\mathbf{y} - \mathbf{X}\mathbf{b}$. The generalized least squares method estimates $\boldsymbol{\beta}$ by minimizing the squared Mahalanobis length of this residual vector:
$$\hat{\boldsymbol{\beta}} = \underset{\mathbf{b}}{\operatorname{arg\,min}}\,(\mathbf{y}-\mathbf{X}\mathbf{b})^\top \boldsymbol{\Omega}^{-1}(\mathbf{y}-\mathbf{X}\mathbf{b}) = \underset{\mathbf{b}}{\operatorname{arg\,min}}\; \mathbf{y}^\top\boldsymbol{\Omega}^{-1}\mathbf{y} + (\mathbf{X}\mathbf{b})^\top\boldsymbol{\Omega}^{-1}\mathbf{X}\mathbf{b} - \mathbf{y}^\top\boldsymbol{\Omega}^{-1}\mathbf{X}\mathbf{b} - (\mathbf{X}\mathbf{b})^\top\boldsymbol{\Omega}^{-1}\mathbf{y},$$
where the last two terms evaluate to scalars and, being transposes of each other, are equal, resulting in
$$\hat{\boldsymbol{\beta}} = \underset{\mathbf{b}}{\operatorname{arg\,min}}\; \mathbf{y}^\top\boldsymbol{\Omega}^{-1}\mathbf{y} + \mathbf{b}^\top\mathbf{X}^\top\boldsymbol{\Omega}^{-1}\mathbf{X}\mathbf{b} - 2\,\mathbf{b}^\top\mathbf{X}^\top\boldsymbol{\Omega}^{-1}\mathbf{y}.$$
This objective is a quadratic form in $\mathbf{b}$.
Taking the gradient of this quadratic form with respect to $\mathbf{b}$ and equating it to zero (when $\mathbf{b} = \hat{\boldsymbol{\beta}}$) gives
$$2\,\mathbf{X}^\top\boldsymbol{\Omega}^{-1}\mathbf{X}\hat{\boldsymbol{\beta}} - 2\,\mathbf{X}^\top\boldsymbol{\Omega}^{-1}\mathbf{y} = 0.$$
Therefore, the minimum of the objective function can be computed, yielding the explicit formula:
$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^\top\boldsymbol{\Omega}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^\top\boldsymbol{\Omega}^{-1}\mathbf{y}.$$
The quantity $\boldsymbol{\Omega}^{-1}$ is known as the precision matrix (or dispersion matrix), a generalization of the diagonal weight matrix.
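As a concrete, purely illustrative check of this formula, the following Python sketch simulates data with correlated errors and evaluates the GLS estimate; the AR(1) covariance, sample size, and coefficient values are arbitrary assumptions, not part of the text above.

```python
import numpy as np

# Illustrative sketch (assumed AR(1) error covariance, arbitrary coefficients):
# simulate y = X beta + eps with known Omega and evaluate the explicit GLS formula.
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # design matrix with a constant
beta_true = np.array([1.0, 2.0, -0.5])

rho = 0.7
idx = np.arange(n)
Omega = rho ** np.abs(np.subtract.outer(idx, idx))              # AR(1) covariance of the errors

eps = np.linalg.cholesky(Omega) @ rng.normal(size=n)            # eps ~ N(0, Omega)
y = X @ beta_true + eps

# GLS estimate (X' Omega^-1 X)^-1 X' Omega^-1 y; the precision matrix Omega^-1
# is applied through linear solves rather than formed explicitly.
OiX = np.linalg.solve(Omega, X)                                 # Omega^-1 X
beta_gls = np.linalg.solve(X.T @ OiX, OiX.T @ y)

# Ordinary least squares for comparison (ignores the error correlation)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("GLS:", beta_gls)
print("OLS:", beta_ols)
```

Both estimates are unbiased in this setting; the difference between them shows up in sampling variance, which the transformation described next makes explicit.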
The GLS estimator is unbiased, consistent, efficient, and asymptotically normal with $\operatorname{E}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta}$ and $\operatorname{Cov}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \left(\mathbf{X}^\top\boldsymbol{\Omega}^{-1}\mathbf{X}\right)^{-1}$. GLS is equivalent to applying ordinary least squares to a linearly transformed version of the data. To see this, factor $\boldsymbol{\Omega} = \mathbf{C}\mathbf{C}^\top$, for instance using the Cholesky decomposition. Then if one pre-multiplies both sides of the equation $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ by $\mathbf{C}^{-1}$, we get an equivalent linear model $\mathbf{y}^{*} = \mathbf{X}^{*}\boldsymbol{\beta} + \boldsymbol{\varepsilon}^{*}$, where $\mathbf{y}^{*} = \mathbf{C}^{-1}\mathbf{y}$, $\mathbf{X}^{*} = \mathbf{C}^{-1}\mathbf{X}$, and $\boldsymbol{\varepsilon}^{*} = \mathbf{C}^{-1}\boldsymbol{\varepsilon}$. In this model $\operatorname{Var}[\boldsymbol{\varepsilon}^{*} \mid \mathbf{X}] = \mathbf{C}^{-1}\boldsymbol{\Omega}\left(\mathbf{C}^{-1}\right)^\top = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix.
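A minimal sketch of this equivalence, under the same kind of assumed AR(1) covariance as above: the explicit GLS formula and ordinary least squares on the Cholesky-whitened data should agree to numerical precision.

```python
import numpy as np

# Hypothetical check: GLS formula vs. OLS on data whitened by a Cholesky factor of Omega.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
idx = np.arange(n)
Omega = 0.5 ** np.abs(np.subtract.outer(idx, idx))          # assumed AR(1) covariance
C = np.linalg.cholesky(Omega)                               # Omega = C C'
y = X @ np.array([0.3, 1.5]) + C @ rng.normal(size=n)

# Direct GLS formula
OiX = np.linalg.solve(Omega, X)
beta_gls = np.linalg.solve(X.T @ OiX, OiX.T @ y)

# Whitened model: y* = C^-1 y, X* = C^-1 X, then ordinary least squares
y_star = np.linalg.solve(C, y)
X_star = np.linalg.solve(C, X)
beta_whitened, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

assert np.allclose(beta_gls, beta_whitened)
print(beta_gls)
```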
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
In statistics, a sequence (or a vector) of random variables is homoscedastic (/ˌhoʊmoʊskəˈdæstɪk/) if all its random variables have the same finite variance; this is also known as homogeneity of variance. The complementary notion is called heteroscedasticity, also known as heterogeneity of variance. The spellings homoskedasticity and heteroskedasticity are also frequently used.
In statistics, the Gauss–Markov theorem (or simply Gauss theorem for some authors) states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, provided the errors in the linear regression model are uncorrelated, have equal variances, and have expectation zero. The errors do not need to be normal, nor do they need to be independent and identically distributed (only uncorrelated with mean zero and homoscedastic with finite variance).
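This theorem can be contrasted with the GLS discussion above through a small Monte Carlo sketch (my own illustration, with an assumed AR(1) error structure): when the errors are correlated, OLS remains unbiased but is no longer minimum-variance, while GLS built on the known covariance shows visibly smaller sampling variability.

```python
import numpy as np

# Illustrative Monte Carlo under assumed AR(1) errors: compare the spread of
# OLS and GLS slope estimates when the Gauss-Markov conditions are violated.
rng = np.random.default_rng(2)
n, reps, rho = 100, 2000, 0.8
idx = np.arange(n)
Omega = rho ** np.abs(np.subtract.outer(idx, idx))
C = np.linalg.cholesky(Omega)
X = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])   # fixed design
beta_true = np.array([0.0, 1.0])

OiX = np.linalg.solve(Omega, X)
A_gls = np.linalg.solve(X.T @ OiX, OiX.T)        # maps y to the GLS estimate

ols_slopes, gls_slopes = [], []
for _ in range(reps):
    y = X @ beta_true + C @ rng.normal(size=n)
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    ols_slopes.append(b_ols[1])
    gls_slopes.append((A_gls @ y)[1])

print("OLS slope std dev:", np.std(ols_slopes))
print("GLS slope std dev:", np.std(gls_slopes))  # expected to be smaller here
```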
Machine learning and data analysis are becoming increasingly central in sciences including physics. In this course, fundamental principles and methods of machine learning will be introduced and practiced ...
Introduces density matrix and von Neumann entropy in quantum systems.
Explores heteroskedasticity in econometrics, discussing its impact on standard errors, alternative estimators, testing methods, and implications for hypothesis testing.
Explores heteroskedasticity and autocorrelation in econometrics, covering implications, applications, testing methods, and hypothesis testing consequences.
We introduce Neural Network (NN for short) approximation architectures for the numerical solution of Boundary Integral Equations (BIEs for short). We exemplify the proposed NN approach for the boundary reduction of the potential problem in two spatial dime ...
SPRINGER/PLENUM PUBLISHERS, 2023
In this thesis we study stability from several viewpoints. After covering the practical importance, the rich history and the ever-growing list of manifestations of stability, we study the following. (i) (Statistical identification of stable dynamical syste ...
EPFL, 2024
The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity however, which is the norm in practical scenarios, established lower bo ...