
# Model selection

## Summary

Model selection is the task of choosing the best model from among various candidates on the basis of a performance criterion.
In the context of learning, this may be the selection of a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected is well-suited to the problem of model selection. Given candidate models of similar predictive or explanatory power, the simplest model is most likely to be the best choice (Occam's razor).
One text states, "The majority of the problems in statistical inference can be considered to be problems related to statistical modeling". Relatedly, it has been said, "How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis".
Model selection may also refer to the problem of selecting a few representative models from a large set of computational models for the purpose of decision making or optimization under uncertainty.
In machine learning, algorithmic approaches to model selection include feature selection, hyperparameter optimization, and statistical learning theory.
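As a concrete illustration of hyperparameter optimization as a model selection problem, the sketch below chooses a ridge penalty by grid search with k-fold cross-validation. It is a minimal, self-contained example on synthetic one-dimensional data; the data-generating slope, the penalty grid, and the fold count are all illustrative assumptions, not part of the source text.

```python
import random

random.seed(1)

# Synthetic 1-D regression data: y = 1.5 * x + noise (illustrative only).
data = [(x, 1.5 * x + random.gauss(0, 1.0))
        for x in (random.uniform(-3, 3) for _ in range(60))]

def ridge_fit(points, lam):
    """Closed-form ridge estimate for one coefficient: w = sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + lam)

def cv_error(points, lam, k=5):
    """Mean squared validation error of the ridge fit, averaged over k folds."""
    folds = [points[i::k] for i in range(k)]
    total, count = 0.0, 0
    for i in range(k):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        w = ridge_fit(train, lam)
        for x, y in folds[i]:
            total += (y - w * x) ** 2
            count += 1
    return total / count

# Grid search: the hyperparameter grid itself is a modelling choice
# made by the researcher, just like the candidate-model set.
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = min(grid, key=lambda lam: cv_error(data, lam))
print(best_lam)
```

The same pattern (evaluate each candidate on held-out data, keep the best scorer) underlies most algorithmic model selection, whatever the candidate set contains.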
In its most basic forms, model selection is one of the fundamental tasks of scientific inquiry. Determining the principle that explains a series of observations is often linked directly to a mathematical model predicting those observations. For example, when Galileo performed his inclined plane experiments, he demonstrated that the motion of the balls fit the parabola predicted by his model.
Of the countless possible mechanisms and processes that could have produced the data, how can one even begin to choose the best model? The mathematical approach commonly taken decides among a set of candidate models; this set must be chosen by the researcher. Often simple models such as polynomials are used, at least initially.
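The polynomial case can be sketched end to end: fit candidate polynomials of increasing degree and pick the degree that performs best on held-out data, which tends to penalize needless complexity in the spirit of Occam's razor. Everything below (the quadratic ground truth, the noise level, the degree range) is an illustrative assumption, and the normal-equations solver is a deliberately naive stand-in for a proper least-squares routine.

```python
import random

random.seed(0)

# Synthetic data from a quadratic ground truth with noise
# (a stand-in for Galileo-style position measurements).
xs = [i / 10 for i in range(40)]
ys = [2.0 * x * x - 3.0 * x + 1.0 + random.gauss(0, 0.5) for x in xs]

# Split into interleaved training and validation halves.
train = list(zip(xs[::2], ys[::2]))
valid = list(zip(xs[1::2], ys[1::2]))

def fit_polynomial(data, degree):
    """Least-squares polynomial fit via normal equations (naive Gaussian elimination)."""
    n = degree + 1
    A = [[sum(x ** (i + j) for x, _ in data) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in data) for i in range(n)]
    for col in range(n):  # elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, n))) / A[i][i]
    return coeffs

def mse(coeffs, data):
    return sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
               for x, y in data) / len(data)

# Candidate set: polynomial degrees 1..6; select by held-out error.
scores = {d: mse(fit_polynomial(train, d), valid) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
print(best_degree)
```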


## Related courses (33)

CS-423: Distributed information systems

This course introduces the key concepts and algorithms from the areas of information retrieval, data mining and knowledge bases, which constitute the foundations of today's Web-based distributed information systems.

EE-512: Applied biomedical signal processing

The goal of this course is twofold: (1) to introduce physiological basis, signal acquisition solutions (sensors) and state-of-the-art signal processing techniques, and (2) to propose concrete examples…

MATH-516: Applied statistics

The course will provide an overview of everyday challenges in applied statistics through case studies. Students will learn how to use core statistical methods and their extensions, and will use comput…

## Related concepts (34)

Statistics, like all mathematical disciplines, does not infer valid conclusions from nothing. Inferring interesting conclusions about real statistical populations almost always requires background assumptions, which must be made carefully, because incorrect assumptions can generate wildly inaccurate conclusions. Examples of statistical assumptions include:

- Independence of observations from each other (violations of this assumption are an especially common error).
- Independence of observational error from potential confounding effects.
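A rough, commonly used diagnostic for the independence assumption is the sample lag-1 autocorrelation of the observations: for independent draws it should be near zero, while for serially dependent data it will not be. The sketch below contrasts i.i.d. Gaussian noise with a hypothetical AR(1) process; the coefficient 0.9 and the sample sizes are illustrative choices.

```python
import random

random.seed(2)

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation: a rough diagnostic for independence."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

# Independent draws: autocorrelation should be near zero.
iid = [random.gauss(0, 1) for _ in range(2000)]

# An AR(1) process: each value depends on the previous one,
# so the observations are clearly not independent.
ar = [0.0]
for _ in range(2000):
    ar.append(0.9 * ar[-1] + random.gauss(0, 1))

print(round(lag1_autocorr(iid), 2), round(lag1_autocorr(ar), 2))
```

Fitting a model that assumes independence to data like the AR(1) series is exactly the kind of incorrect background assumption the paragraph above warns about.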

The deviance information criterion (DIC) is a hierarchical-modeling generalization of the Akaike information criterion (AIC). It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. Like AIC, DIC is an asymptotic approximation that holds as the sample size becomes large; it is valid only when the posterior distribution is approximately multivariate normal.
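In the standard formulation (due to Spiegelhalter and co-authors), the criterion is built from the deviance $D(\theta)$, its posterior mean $\bar{D}$, and an effective number of parameters $p_D$:

```latex
D(\theta) = -2 \log p(y \mid \theta), \qquad
\bar{D} = \mathbb{E}_{\theta \mid y}\bigl[ D(\theta) \bigr], \qquad
p_D = \bar{D} - D(\bar{\theta}),
```

```latex
\mathrm{DIC} = p_D + \bar{D} = 2\bar{D} - D(\bar{\theta}),
```

where $\bar{\theta}$ is the posterior mean of $\theta$; both $\bar{D}$ and $D(\bar{\theta})$ are easy to estimate from MCMC draws, which is why DIC pairs naturally with simulation-based posteriors. Models with smaller DIC are preferred.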

In statistics, model validation is the task of evaluating whether a chosen statistical model is appropriate. Oftentimes, models that appear to fit their data well may do so by chance, leading researchers to misjudge their model's actual relevance. To guard against this, model validation is used to test whether a statistical model can hold up to permutations in the data.
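One simple way to make "holds up to permutations in the data" concrete is a permutation check: refit (or re-score) the model on data whose responses have been shuffled, which destroys any real association; if the model scores comparably on shuffled data, its apparent fit is likely a fluke. The sketch below applies this to the $R^2$ of a simple least-squares line on synthetic data; the slope, noise level, and number of permutations are illustrative assumptions.

```python
import random

random.seed(3)

# Paired data with a genuine linear relationship (illustrative only).
xs = [random.uniform(0, 1) for _ in range(200)]
ys = [2.0 * x + random.gauss(0, 0.3) for x in xs]

def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

observed = r_squared(xs, ys)

# Permutation check: shuffling y breaks any real x-y association, so a model
# that scores comparably on shuffled data is likely fitting noise.
perm_scores = []
for _ in range(200):
    shuffled = ys[:]
    random.shuffle(shuffled)
    perm_scores.append(r_squared(xs, shuffled))

p_value = sum(s >= observed for s in perm_scores) / len(perm_scores)
print(round(observed, 2), p_value)
```

A small permutation p-value says the observed fit is unlikely under "no real association", which is evidence the model is not merely a fluke of this sample.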

## Related lectures (235)

Statistical Inference: Model Selection and Nuisance Parameters

Covers model selection, nuisance parameters, and higher-order inference methods in statistical inference.

Model Selection

Covers model selection, emphasizing the tradeoff between complexity and error in data analysis.

Discrete choice and machine learning: two methodologies

Delves into the complementary methodologies of discrete choice and machine learning, covering notations, variables, models, data processes, extrapolation, what-if analysis, and more.

## Related publications (3)

Jean-Philippe Thiran, Finnian Paul Kelly

Many speech technology systems rely on Gaussian Mixture Models (GMMs). The need for a comparison between two GMMs arises in applications such as speaker verification, model selection or parameter estimation.


Model specification is an integral part of any statistical inference problem. Several model selection techniques have been developed in order to determine which model is the best one among a list of p…