Summary
In statistics, regression validation is the process of deciding whether the numerical results quantifying hypothesized relationships between variables, obtained from regression analysis, are acceptable as descriptions of the data. The validation process can involve analyzing the goodness of fit of the regression, analyzing whether the regression residuals are random, and checking whether the model's predictive performance deteriorates substantially when applied to data that were not used in model estimation. Goodness of fit One measure of goodness of fit is the R2 (coefficient of determination), which in ordinary least squares with an intercept ranges between 0 and 1. However, an R2 close to 1 does not guarantee that the model fits the data well: as Anscombe's quartet shows, a high R2 can occur in the presence of misspecification of the functional form of a relationship or in the presence of outliers that distort the true relationship. One problem with the R2 as a measure of model validity is that it can always be increased by adding more variables into the model, except in the unlikely event that the additional variables are exactly uncorrelated with the dependent variable in the data sample being used. This problem can be avoided by doing an F-test of the statistical significance of the increase in the R2, or by instead using the adjusted R2. residual analysis The residuals from a fitted model are the differences between the responses observed at each combination of values of the explanatory variables and the corresponding prediction of the response computed using the regression function. Mathematically, the definition of the residual for the ith observation in the data set is written with yi denoting the ith response in the data set and xi the vector of explanatory variables, each set at the corresponding values found in the ith observation in the data set. If the model fit to the data were correct, the residuals would approximate the random errors that make the relationship between the explanatory variables and the response variable a statistical relationship.
About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Ontological neighbourhood
Related courses (32)
ME-443: Hydroacoustique pour aménagements hydroélectriques
Introduction aux phénomènes propagatifs dans les circuits hydrauliques, calculs de coups de béliers, comportement transitoire d'aménagements hydroélectriques, simulation numériques 1D du comportement
DH-406: Machine learning for DH
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple
CS-433: Machine learning
Machine learning methods are becoming increasingly central in many sciences and applications. In this course, fundamental principles and methods of machine learning will be introduced, analyzed and pr
Show more