**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Concept# Model selection

Summary

Model selection is the task of selecting a model from among various candidates on the basis of performance criterion to choose the best one.
In the context of learning, this may be the selection of a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected is well-suited to the problem of model selection. Given candidate models of similar predictive or explanatory power, the simplest model is most likely to be the best choice (Occam's razor).
state, "The majority of the problems in statistical inference can be considered to be problems related to statistical modeling". Relatedly, has said, "How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis".
Model selection may also refer to the problem of selecting a few representative models from a large set of computational models for the purpose of decision making or optimization under uncertainty.
In machine learning, algorithmic approaches to model selection include feature selection, hyperparameter optimization, and statistical learning theory.
In its most basic forms, model selection is one of the fundamental tasks of scientific inquiry. Determining the principle that explains a series of observations is often linked directly to a mathematical model predicting those observations. For example, when Galileo performed his inclined plane experiments, he demonstrated that the motion of the balls fitted the parabola predicted by his model .
Of the countless number of possible mechanisms and processes that could have produced the data, how can one even begin to choose the best model? The mathematical approach commonly taken decides among a set of candidate models; this set must be chosen by the researcher. Often simple models such as polynomials are used, at least initially .

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related publications (271)

Related concepts (23)

Related courses (32)

Related units (13)

Bayesian information criterion

In statistics, the Bayesian information criterion (BIC) or Schwarz information criterion (also SIC, SBC, SBIC) is a criterion for model selection among a finite set of models; models with lower BIC are generally preferred. It is based, in part, on the likelihood function and it is closely related to the Akaike information criterion (AIC). When fitting models, it is possible to increase the maximum likelihood by adding parameters, but doing so may result in overfitting.

Goodness of fit

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test).

Statistical model validation

In statistics, model validation is the task of evaluating whether a chosen statistical model is appropriate or not. Oftentimes in statistical inference, inferences from models that appear to fit their data may be flukes, resulting in a misunderstanding by researchers of the actual relevance of their model. To combat this, model validation is used to test whether a statistical model can hold up to permutations in the data.

EE-512: Applied biomedical signal processing

The goal of this course is twofold: (1) to introduce physiological basis, signal acquisition solutions (sensors) and state-of-the-art signal processing techniques, and (2) to propose concrete examples

MATH-516: Applied statistics

The course will provide an overview of everyday challenges in applied statistics through case studies. Students will learn how to use core statistical methods and their extensions, and will use comput

BIO-322: Introduction to machine learning for bioengineers

Students understand basic concepts and methods of machine learning. They can describe them in mathematical terms and can apply them to data using a high-level programming language (julia/python/R).

Related people (37)

The technological advancements of the past decades have allowed transforming an increasing part of our daily actions and decisions into storable data, leading to a radical change in the scale and scope of available data in relation to virtually any object ...

Supervised machine learning models are receiving increasing attention in electricity theft detection due to their high detection accuracy. However, their performance depends on a massive amount of labeled training data, which comes from time-consuming and ...

Related lectures (139)

We conduct two survey experiments to study which information people choose to consume and how it affects their beliefs. In the first experiment, respondents choose between optimistic and pessimistic article headlines related to the COVID-19 pandemic and ar ...

2024Statistical Inference: Model Selection and Nuisance ParametersMATH-562: Statistical inference

Covers model selection, nuisance parameters, and higher-order inference methods in statistical inference.

Model SelectionMSE-213: Probability and statistics for materials science

Covers model selection, emphasizing the tradeoff between complexity and error in data analysis.

Discrete choice and machine learning: two methodologiesMOOC: Selected Topics on Discrete Choice

Delves into the complementary methodologies of discrete choice and machine learning, covering notations, variables, models, data processes, extrapolation, what-if analysis, and more.