
# Statistical model

## Summary

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form, the data-generating process. When referring specifically to probabilities, the corresponding term is probabilistic model.
A statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables. As such, a statistical model is "a formal representation of a theory" (Herman Adèr quoting Kenneth Bollen).
All statistical hypothesis tests and all statistical estimators are derived via statistical models. More generally, statistical models are part of the foundation of statistical inference.
## Introduction
Informally, a statistical model can be thought of as a statistical assumption (or set of statistical assumptions) with a certain property: that the assumption allows us to calculate the probability of any event.
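As a concrete sketch (my own illustration, not part of this page): simple linear regression with Gaussian noise is one of the most familiar statistical models. The parameters (intercept, slope, noise scale) index a family of candidate distributions for the data, and fitting selects the member that best matches the sample. Under the Gaussian-noise assumption, ordinary least squares coincides with maximum likelihood.

```python
import random

# A minimal statistical model: y = a + b*x + eps, eps ~ N(0, sigma^2).
# The triple (a, b, sigma) indexes the family of candidate distributions;
# fitting picks the member that best matches the observed sample.

random.seed(0)
true_a, true_b, sigma = 1.0, 2.0, 0.5

xs = [i / 10 for i in range(50)]
ys = [true_a + true_b * x + random.gauss(0, sigma) for x in xs]

# Ordinary least squares, which is also the maximum-likelihood
# estimator under the Gaussian-noise assumption.
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
b_hat = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a_hat = my - b_hat * mx

print(f"a_hat={a_hat:.2f}, b_hat={b_hat:.2f}")
```

With 50 points the estimates should land close to the generating values, which is exactly the sense in which the model "represents, in idealized form, the data-generating process."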



Related people (21)

Related publications (100)


Related units (15)

Related courses (66)

EE-556: Mathematics of data: from theory to computation

This course provides an overview of key advances in continuous optimization and statistical analysis for machine learning. We review recent learning formulations and models as well as their guarantees, describe scalable solution techniques and algorithms, and illustrate the trade-offs involved.

MATH-234(d): Probability and statistics

This course teaches the elementary notions of probability theory and statistics, such as inference, testing, and regression.

MATH-442: Statistical theory

The course aims at developing certain key aspects of the theory of statistics, providing a common general framework for statistical methodology. While the main emphasis will be on the mathematical aspects of statistics, an effort will be made to balance rigor and intuition.

Extreme value analysis is concerned with the modelling of extreme events such as floods and heatwaves, which can have large impacts. Statistical modelling can be useful to better assess risks even if, due to scarcity of measurements, there is inherently very large residual uncertainty in any analysis. Driven by the increase in environmental databases, spatial modelling of extremes has expanded rapidly in the last decade. This thesis presents contributions to such analysis.
The first chapter is about likelihood-based inference in the univariate setting and investigates the use of bias-correction and higher-order asymptotic methods for extremes, highlighting through examples and illustrations the unique challenge posed by data scarcity. We focus on parametric modelling of extreme values, which relies on limiting distributional results and for which, as a result, uncertainty quantification is complicated. We find that, in certain cases, small-sample asymptotic methods can give improved inference by reducing the error rate of confidence intervals. Two data illustrations, linked to assessment of the frequency of extreme rainfall episodes in Venezuela and the analysis of survival of supercentenarians, illustrate the methods developed.
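The parametric, limit-based approach described here can be sketched in a toy form (my own illustration, not the thesis's method): block maxima of light-tailed data are approximately Gumbel distributed (the generalized extreme value distribution with shape parameter zero), and the location and scale can be estimated by the method of moments.

```python
import math
import random

# Block maxima of light-tailed data are approximately Gumbel distributed
# (a GEV with shape parameter 0).  Sketch: simulate maxima of exponential
# blocks, then fit the Gumbel location/scale by the method of moments.

random.seed(1)
block_size, n_blocks = 100, 2000
maxima = [max(random.expovariate(1.0) for _ in range(block_size))
          for _ in range(n_blocks)]

mean = sum(maxima) / n_blocks
var = sum((m - mean) ** 2 for m in maxima) / (n_blocks - 1)

EULER_GAMMA = 0.5772156649
scale_hat = math.sqrt(6 * var) / math.pi    # sigma = s * sqrt(6) / pi
loc_hat = mean - EULER_GAMMA * scale_hat    # mu = mean - gamma * sigma

# For Exp(1) block maxima, theory gives mu ~ log(block_size), sigma ~ 1.
print(f"loc_hat={loc_hat:.2f} (log m = {math.log(block_size):.2f}), "
      f"scale_hat={scale_hat:.2f}")
```

The gap between the finite-block estimates and the asymptotic values hints at the chapter's theme: with scarce extremes, limiting results leave non-trivial small-sample error, motivating bias-correction and higher-order asymptotics.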
In the second chapter, we review the major methods for the analysis of spatial extremes models. We highlight the similarities and provide a thorough literature review along with novel simulation algorithms. The methods described therein are made available through a statistical software package.
The last chapter focuses on estimation for a Bayesian hierarchical model derived from a multivariate generalized Pareto process. We review approaches for the estimation of censored components in models derived from (log)-elliptical distributions, paying particular attention to the estimation of a high-dimensional Gaussian distribution function via Monte Carlo methods. The impacts of model misspecification and of censoring are explored through extensive simulations and we conclude with a case study of rainfall extremes in Eastern Switzerland.
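The Monte Carlo evaluation of a Gaussian distribution function mentioned above can be illustrated in the bivariate case (a toy version of my own; the thesis works in high dimensions with more sophisticated estimators):

```python
import math
import random

# Toy Monte Carlo estimate of a Gaussian distribution function:
# P(X <= a, Y <= b) for a correlated bivariate normal.  Correlated
# pairs come from the 2x2 Cholesky factor of [[1, rho], [rho, 1]].

random.seed(2)
rho, a, b = 0.5, 0.0, 0.0
n = 200_000

hits = 0
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x = z1
    y = rho * z1 + math.sqrt(1 - rho ** 2) * z2
    if x <= a and y <= b:
        hits += 1

p_hat = hits / n
# Closed form for a = b = 0: 1/4 + arcsin(rho) / (2*pi)
p_exact = 0.25 + math.asin(rho) / (2 * math.pi)
print(f"p_hat={p_hat:.4f}, exact={p_exact:.4f}")
```

Naive hit-counting like this degrades rapidly as the dimension grows, which is why the thesis focuses on specialized Monte Carlo methods for high-dimensional Gaussian distribution functions.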

Related concepts (61)

Normal distribution

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, where $\mu$ is the mean and $\sigma$ the standard deviation.

Statistics

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

Statistical inference

Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates.

Multiple generalized additive models are a class of statistical regression models wherein parameters of probability distributions incorporate information through additive smooth functions of predictors. The functions are represented by basis function expansions, whose coefficients are the regression parameters. The smoothness is induced by a quadratic roughness penalty on the functions' curvature, which is equivalent to a weighted $L_2$ regularization controlled by smoothing parameters. Regression fitting relies on maximum penalized likelihood estimation for the regression coefficients, and smoothness selection relies on maximum marginal likelihood estimation for the smoothing parameters.
Owing to their nonlinearity, flexibility and interpretability, generalized additive models are widely used in statistical modeling, but despite recent advances, reliable and fast methods for automatic smoothing in massive datasets are unavailable. Existing approaches are either reliable, complex and slow, or unreliable, simpler and fast, so a compromise must be made. A bridge between these categories is needed to extend use of multiple generalized additive models to settings beyond those possible in existing software. This thesis is one step in this direction. We adopt the marginal likelihood approach to develop approximate expectation-maximization methods for automatic smoothing, which avoid evaluation of expensive and unstable terms. This results in simpler algorithms that do not sacrifice reliability and achieve state-of-the-art accuracy and computational efficiency.
We extend the proposed approach to big-data settings and produce the first reliable, high-performance and distributed-memory algorithm for fitting massive multiple generalized additive models. Furthermore, we develop the underlying generic software libraries and make them accessible to the open-source community.
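The quadratic roughness penalty at the heart of this abstract can be shown in a stripped-down discrete form (my own sketch, not the thesis's algorithm): penalize squared second differences of the fitted values, which is an $L_2$ penalty on curvature, and solve the resulting penalized least-squares system directly. The smoothing parameter `lam` stands in for the marginal-likelihood-selected smoothing parameters discussed above; here it is fixed by hand.

```python
import math
import random

# Discrete roughness-penalized smoother: minimize
#   sum_i (y_i - f_i)^2 + lam * sum_i (f_{i-1} - 2 f_i + f_{i+1})^2,
# an L2 (ridge-type) penalty on curvature.  Solution: (I + lam*D'D) f = y,
# where D is the (n-2) x n second-difference matrix.

def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

random.seed(3)
n, lam = 40, 50.0
xs = [i / (n - 1) for i in range(n)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0, 0.3) for x in xs]

# Build A = I + lam * D'D from the second-difference rows of D.
A = [[float(i == j) for j in range(n)] for i in range(n)]
for k in range(n - 2):
    d = [0.0] * n
    d[k], d[k + 1], d[k + 2] = 1.0, -2.0, 1.0
    for i in range(n):
        for j in range(n):
            A[i][j] += lam * d[i] * d[j]

f = solve(A, ys)
rough_y = sum((ys[k] - 2 * ys[k+1] + ys[k+2]) ** 2 for k in range(n - 2))
rough_f = sum((f[k] - 2 * f[k+1] + f[k+2]) ** 2 for k in range(n - 2))
print(f"roughness: data={rough_y:.3f}, smoothed={rough_f:.3f}")
```

The dense $O(n^3)$ solve here is exactly what becomes prohibitive at scale, which is the gap the thesis's expectation-maximization and distributed-memory algorithms are designed to close.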

Related lectures (161)