# Statistical model validation

Summary

In statistics, model validation is the task of evaluating whether a chosen statistical model is appropriate. In statistical inference, conclusions drawn from models that appear to fit their data may be flukes, which can lead researchers to misjudge the actual relevance of their model. To guard against this, model validation tests whether a statistical model holds up under permutations of the data. It should not be confused with the closely related task of model selection, the process of discriminating between multiple candidate models: model validation is concerned less with the conceptual design of a model than with the consistency between a chosen model and its stated outputs.
There are many ways to validate a model. Residual plots show the differences between the observed data and the model's predictions; correlations or other systematic patterns in the residuals may indicate a flaw in the model. Cross-validation iteratively refits the model, each time leaving out a small sample of the data and checking how well the model predicts the held-out points; there are many variants of cross-validation. Predictive simulation compares data simulated from the fitted model with the actual data. External validation evaluates how well the model fits new data. The Akaike information criterion (AIC) estimates the relative quality of a model.
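A residual plot is usually inspected by eye, but the idea that correlated residuals signal a flaw can also be made numeric. The sketch below is a minimal illustration, assuming an ordinary least-squares straight-line model and synthetic data: it fits a line and computes the lag-1 autocorrelation of the residuals ordered by x. The helper names (`fit_line`, `residuals`, `lag1_autocorrelation`) and the choice of this particular statistic are our own, not a standard prescribed by the text.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def residuals(xs, ys, a, b):
    """Observed minus predicted values."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def lag1_autocorrelation(res):
    """Lag-1 autocorrelation of residuals (assumed ordered by x).
    Values far from 0 suggest structure the model failed to capture."""
    m = sum(res) / len(res)
    d = [r - m for r in res]
    num = sum(d[i] * d[i + 1] for i in range(len(d) - 1))
    den = sum(v * v for v in d)
    return num / den

xs = list(range(-50, 51))
# Hypothetical curved data: a straight line is mis-specified here.
ys_curved = [x ** 2 for x in xs]
# Hypothetical linear data plus Gaussian noise: a straight line is adequate.
rng = random.Random(0)
ys_linear = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

for name, ys in [("curved", ys_curved), ("linear", ys_linear)]:
    a, b = fit_line(xs, ys)
    r = lag1_autocorrelation(residuals(xs, ys, a, b))
    print(f"{name}: lag-1 residual autocorrelation = {r:.2f}")
```

For the curved data the residuals trace a smooth U shape, so neighbouring residuals share a sign and the autocorrelation is strongly positive; for the linear data it stays near zero. A value near zero does not prove the model is right; it only fails to flag this particular kind of misfit.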
Model validation comes in many forms, and the specific method a researcher uses is often constrained by the research design: there is no one-size-fits-all method for validating a model. For example, a researcher working with a very limited dataset, but one about which they hold strong prior assumptions, may validate the fit of their model within a Bayesian framework by testing it under various prior distributions. By contrast, a researcher who has a lot of data and is testing multiple nested models may find cross-validation, possibly a leave-one-out test, better suited.
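The second scenario, comparing nested models by leave-one-out cross-validation, can be sketched as follows. This is a minimal illustration that assumes two hypothetical nested candidates (an intercept-only model and a straight line, both fitted by least squares) and synthetic data; each point is held out once, the model is refitted on the remaining points, and the squared prediction error on the held-out point is averaged.

```python
import random

def fit_mean(xs, ys):
    """Nested model 1 (hypothetical): intercept only, predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Nested model 2 (hypothetical): intercept plus slope, least-squares fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def loo_mse(fit, xs, ys):
    """Leave-one-out CV: refit n times, each time predicting the held-out point."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)
        errors.append((ys[i] - predict(xs[i])) ** 2)
    return sum(errors) / len(errors)

# Synthetic data generated with a non-zero slope.
rng = random.Random(1)
xs = [float(i) for i in range(30)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 3) for x in xs]

for name, fit in [("intercept only", fit_mean), ("linear", fit_linear)]:
    print(f"{name}: LOO mean squared error = {loo_mse(fit, xs, ys):.2f}")
```

The candidate with the lower leave-one-out error predicts unseen points better; here the straight line should win clearly because the synthetic data were generated with a slope. Leave-one-out requires n refits, which is why it suits cheap models more than expensive ones.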

Related concepts

Statistical model specification

In statistics, model specification is part of the process of building a statistical model: specification consists of selecting an appropriate functional form for the model and choosing which variables to include. For example, given personal income $y$ together with years of schooling $s$ and on-the-job experience $x$, we might specify a functional relationship $y = f(s, x)$ as follows:

$$\ln y = \ln y_0 + \rho s + \beta_1 x + \beta_2 x^2 + \varepsilon$$

where $\varepsilon$ is the unexplained error term that is supposed to comprise independent and identically distributed Gaussian variables.
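In code, a specification amounts to fixing which (possibly transformed) variables enter the model. The sketch below assumes the log-linear form ln y = ln y0 + ρs + β1x + β2x² + ε (a Mincer-style earnings equation), encodes it in a hypothetical `design_row` helper, and estimates the coefficients by ordinary least squares through the normal equations. The data are noise-free and generated from made-up parameter values, so the fit should recover those values; real income data would of course include the error term ε.

```python
import math

def design_row(s, x):
    """The specification: constant, schooling s, experience x, and x squared."""
    return [1.0, s, x, x * x]

def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta

def fit_log_income(records):
    """OLS fit of ln y on the specified regressors via X'X beta = X'z."""
    rows = [design_row(s, x) for s, x, _ in records]
    z = [math.log(y) for _, _, y in records]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(k)]
    return solve(xtx, xtz)  # [ln y0, rho, beta1, beta2]

# Hypothetical parameter values and noise-free records (s, x, y).
true = [math.log(20000.0), 0.08, 0.04, -0.0006]
records = []
for s in range(8, 17, 2):
    for x in range(0, 31, 5):
        ln_y = sum(b * v for b, v in zip(true, design_row(s, x)))
        records.append((s, x, math.exp(ln_y)))

print([round(b, 4) for b in fit_log_income(records)])
```

Changing the specification, for example dropping the x² term, only requires editing `design_row`; the estimation code stays the same. For real, larger problems a QR-based least-squares routine is numerically safer than forming the normal equations.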

Regression validation

In statistics, regression validation is the process of deciding whether the numerical results quantifying hypothesized relationships between variables, obtained from regression analysis, are acceptable as descriptions of the data. The validation process can involve analyzing the goodness of fit of the regression, analyzing whether the regression residuals are random, and checking whether the model's predictive performance deteriorates substantially when applied to data that were not used in model estimation.
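Two of the checks named here, goodness of fit and out-of-sample deterioration, can be sketched for a simple straight-line regression. This is a minimal illustration on synthetic data, assuming a random split into 40 training and 20 held-out points: R² measures fit on the training data, and comparing training and held-out mean squared error shows whether predictive performance drops on data not used for estimation. All helper names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def mse(xs, ys, a, b):
    """Mean squared prediction error of the line a + b*x on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def r_squared(xs, ys, a, b):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: a linear relationship with Gaussian noise.
rng = random.Random(2)
data = []
for _ in range(60):
    x = rng.uniform(0, 10)
    data.append((x, 3.0 - 0.7 * x + rng.gauss(0, 1)))
rng.shuffle(data)
train, held_out = data[:40], data[40:]  # held-out points are never used for fitting

tx, ty = [p[0] for p in train], [p[1] for p in train]
hx, hy = [p[0] for p in held_out], [p[1] for p in held_out]
a, b = fit_line(tx, ty)

r2 = r_squared(tx, ty, a, b)
train_mse = mse(tx, ty, a, b)
holdout_mse = mse(hx, hy, a, b)
print(f"R^2 (training) = {r2:.3f}")
print(f"MSE training = {train_mse:.3f}, MSE held-out = {holdout_mse:.3f}")
```

A held-out error far above the training error would suggest overfitting or instability; how large a gap counts as "substantial" is a judgment call that the sketch deliberately leaves open.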

Model selection

Model selection is the task of selecting the best model from among various candidates on the basis of a performance criterion. In the context of learning, this may be the selection of a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected are well suited to the problem of model selection.
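One widely used performance criterion is the Akaike information criterion, AIC = 2k − 2 ln L, where k is the number of estimated parameters and L the maximized likelihood; lower values are preferred. The sketch below assumes linear models with i.i.d. Gaussian errors, for which ln L has a closed form in the residual sum of squares, and compares two hypothetical candidates (intercept only versus a straight line) on synthetic data.

```python
import math
import random

def gaussian_aic(rss, n, n_coef):
    """AIC = 2k - 2 ln L for a linear model with i.i.d. Gaussian errors,
    evaluated at the maximum-likelihood variance estimate RSS / n.
    k counts the regression coefficients plus the error variance."""
    k = n_coef + 1
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * log_lik

def rss_mean(xs, ys):
    """Residual sum of squares of the intercept-only model."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def rss_line(xs, ys):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Synthetic data generated with a non-zero slope.
rng = random.Random(3)
xs = [float(i) for i in range(40)]
ys = [0.5 + 1.5 * x + rng.gauss(0, 2) for x in xs]
n = len(xs)

candidates = {
    "intercept only": gaussian_aic(rss_mean(xs, ys), n, n_coef=1),
    "linear": gaussian_aic(rss_line(xs, ys), n, n_coef=2),
}
best = min(candidates, key=candidates.get)
for name, aic in candidates.items():
    print(f"{name}: AIC = {aic:.1f}")
print("lowest AIC:", best)
```

Only AIC differences between candidates fitted to the same data are meaningful, so the constant terms shared by both models cancel; they are kept here so the function returns the full AIC value.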

Related courses

ME-443: Hydroacoustics for hydroelectric schemes

Introduction to wave-propagation phenomena in hydraulic circuits, water-hammer calculations, transient behaviour of hydroelectric schemes, 1D numerical simulation of the behaviour

BIO-645: Introduction to Applied Data Science (I2ADS)

The "Introduction to Applied Data Science" (I2ADS) course is aimed at students of all levels to train them in the core computer science software stack and techniques forming the pillars of open & repr

ENV-524: Hydrological risks and structures

The course is an introduction to extreme value theory and its use in managing hydrological risks (mainly floods). A broader perspective on the management of hazards

Related lectures

Linear Regression and Gradient Descent (PHYS-231: Data analysis for Physics)

Covers linear regression, gradient descent, overfitting, and ridge regression among other concepts.

Bias-Variance Trade-Off

Explores underfitting, overfitting, and the bias-variance trade-off in machine learning models.

Prediction tests (MOOC: Introduction to Discrete Choice Models)

Explores out-of-sample validation and the methodology of cross-validation for testing predictive models.

Olivier Sauter, Federico Alberto Alfredo Felici, Cassandre Ekta Contré, Anna Teplukhina, Simon Van Mulders, Bernhard Sieglin

We discuss how the combination of experimental observations and rapid modeling has made it possible to improve understanding of the tokamak ramp-down phase in ASDEX Upgrade. A series of dedicated experiments has been performed, to disentangle the effect of individu ...

David Atienza Alonso, Alireza Amirshahi, Jonathan Dan, Adriano Bernini, William Cappelletti, Luca Benini, Una Pale

The need for high-quality automated seizure detection algorithms based on electroencephalography (EEG) becomes ever more pressing with the increasing use of ambulatory and long-term EEG monitoring. Heterogeneity in validation methods of these algorithms in ...

2024

Lijing Xin, Yan Li, Yubo Zhao, Yan Lin, Wei Ye

Metabolic changes precede malignant histology. However, it remains unclear whether detectable characteristic metabolome exists in esophageal squamous cell carcinoma (ESCC) tissues and biofluids for early diagnosis. Here, we conduct NMR- and MS-based metabo ...