In statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared estimate of errors (SSE), is the sum of the squares of residuals (the deviations of the actual empirical values of the data from the values predicted by the model). It is a measure of the discrepancy between the data and an estimation model, such as a linear regression. A small RSS indicates a tight fit of the model to the data. It is used as an optimality criterion in parameter selection and model selection.

In general, total sum of squares = explained sum of squares + residual sum of squares. For a proof of this in the multivariate ordinary least squares (OLS) case, see partitioning in the general OLS model.

In a model with a single explanatory variable, RSS is given by
$$\mathrm{RSS} = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2,$$
where $y_i$ is the $i$th value of the variable to be predicted, $x_i$ is the $i$th value of the explanatory variable, and $f(x_i)$ is the predicted value of $y_i$ (also termed $\hat{y}_i$).

In a standard linear simple regression model, $y_i = \alpha + \beta x_i + \varepsilon_i$, where $\alpha$ and $\beta$ are coefficients, $y$ and $x$ are the regressand and the regressor, respectively, and $\varepsilon$ is the error term. The sum of squares of residuals is the sum of squares of $\hat{\varepsilon}_i = y_i - \hat{\alpha} - \hat{\beta} x_i$; that is,
$$\mathrm{RSS} = \sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \sum_{i=1}^{n} \left(y_i - \hat{\alpha} - \hat{\beta} x_i\right)^2,$$
where $\hat{\alpha}$ is the estimated value of the constant term $\alpha$ and $\hat{\beta}$ is the estimated value of the slope coefficient $\beta$.

The general regression model with $n$ observations and $k$ explanators, the first of which is a constant unit vector whose coefficient is the regression intercept, is
$$y = X\beta + e,$$
where $y$ is an $n \times 1$ vector of dependent variable observations, each column of the $n \times k$ matrix $X$ is a vector of observations on one of the $k$ explanators, $\beta$ is a $k \times 1$ vector of true coefficients, and $e$ is an $n \times 1$ vector of the true underlying errors. The ordinary least squares estimator for $\beta$ is
$$\hat{\beta} = (X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}y.$$
The residual vector is $\hat{e} = y - X\hat{\beta} = y - X(X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}y$, so the residual sum of squares is
$$\mathrm{RSS} = \hat{e}^{\mathsf{T}}\hat{e} = \lVert \hat{e} \rVert^2$$
(equivalent to the square of the norm of residuals). In full:
$$\mathrm{RSS} = y^{\mathsf{T}}y - y^{\mathsf{T}}X(X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}y = y^{\mathsf{T}}\left[I - X(X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}\right]y = y^{\mathsf{T}}\left[I - H\right]y,$$
where $H$ is the hat matrix, or the projection matrix in linear regression.

The least-squares regression line is given by $y = ax + b$, where $b = \bar{y} - a\bar{x}$ and $a = \frac{S_{xy}}{S_{xx}}$, with
$$S_{xy} = \sum_{i=1}^{n} (\bar{x} - x_i)(\bar{y} - y_i) \quad \text{and} \quad S_{xx} = \sum_{i=1}^{n} (\bar{x} - x_i)^2.$$
Therefore,
$$\mathrm{RSS} = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 = \sum_{i=1}^{n} \left(y_i - a x_i - b\right)^2 = a^2 S_{xx} - 2a S_{xy} + S_{yy} = S_{yy} - a S_{xy} = S_{yy}\left(1 - \frac{S_{xy}^2}{S_{xx} S_{yy}}\right),$$
where $S_{yy} = \sum_{i=1}^{n} (\bar{y} - y_i)^2$. The Pearson product-moment correlation is given by $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$; therefore, $\mathrm{RSS} = S_{yy}\,(1 - r^2)$.
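The quantities above can be checked numerically. The following is a minimal sketch, assuming simulated data and NumPy; the variable names are illustrative and not taken from the page. It computes the RSS three ways — from the fitted simple regression line, from the matrix form $y^{\mathsf{T}}(I-H)y$, and from $S_{yy}(1-r^2)$ — and the three values should agree up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from y = alpha + beta*x + noise (alpha and beta chosen arbitrarily here).
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Simple linear regression via the closed-form least-squares estimates.
x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
S_yy = np.sum((y - y_bar) ** 2)
S_xy = np.sum((x - x_bar) * (y - y_bar))

beta_hat = S_xy / S_xx                 # estimated slope
alpha_hat = y_bar - beta_hat * x_bar   # estimated intercept

residuals = y - (alpha_hat + beta_hat * x)
rss_simple = np.sum(residuals ** 2)    # RSS as the sum of squared residuals

# Same RSS from the matrix form RSS = y'(I - H)y, with H = X(X'X)^{-1}X'.
X = np.column_stack([np.ones(n), x])   # first column: constant unit vector (intercept)
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat (projection) matrix
rss_matrix = y @ (np.eye(n) - H) @ y

# Relation with the Pearson correlation: RSS = S_yy * (1 - r^2).
r = S_xy / np.sqrt(S_xx * S_yy)
rss_from_r = S_yy * (1.0 - r ** 2)

print(rss_simple, rss_matrix, rss_from_r)  # all three agree up to floating-point rounding
```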

Related courses (32)
PHYS-467: Machine learning for physicists
Machine learning and data analysis are becoming increasingly central in sciences including physics. In this course, fundamental principles and methods of machine learning will be introduced and practiced.
MATH-413: Statistics for data science
Statistics lies at the foundation of data science, providing a unifying theoretical and methodological backbone for the diverse tasks encountered in this emerging field. This course rigorously develops …
CS-411: Digital education
This course addresses the relationship between specific technological features and the learners' cognitive processes. It also covers the methods and results of empirical studies: do students actually learn …
Related lectures (120)
Linear Regression
Covers linear regression for estimating train speed using least squares and regularization.
Linear Regression: Regularization
Covers linear regression, regularization, and probabilistic models for label generation.
Linear Regression: Basics and Estimation
Covers the basics of linear regression and how to solve estimation problems using least squares and matrix notation.
Related publications (210)
Related concepts (17)
Linear regression
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
Mean squared error
In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive (and not zero) is because of randomness or because the estimator does not account for information that could produce a more accurate estimate.
Errors and residuals
In statistics and optimization, errors and residuals are two closely related and easily confused measures of the deviation of an observed value of an element of a statistical sample from its "true value" (not necessarily observable). The error of an observation is the deviation of the observed value from the true value of a quantity of interest (for example, a population mean). The residual is the difference between the observed value and the estimated value of the quantity of interest (for example, a sample mean).
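To make the distinctions in the concepts above concrete, here is a small sketch using simulated data; the numbers, names, and conventions are assumptions for illustration only and are not taken from the page or from EPFL material. Errors are measured against the unobservable true value, residuals against the estimated value, and the MSE shown here divides the RSS by the sample size, which is one common convention.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean = 5.0                             # the "true value", normally unobservable
sample = true_mean + rng.normal(scale=2.0, size=100)

errors = sample - true_mean                 # deviations from the true value
residuals = sample - sample.mean()          # deviations from the estimated value (sample mean)

rss = np.sum(residuals ** 2)                # residual sum of squares
mse = rss / sample.size                     # mean squared error of the fit, "divide by n" convention

print(rss, mse)
```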
