
Statistics

Summary

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.
When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements to determine whether the manipulation has changed the values of the measurements.
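The sample-to-population inference described above can be sketched in a few lines of Python. This is a hypothetical toy example (the population values and sample size are invented for illustration):

```python
import random
import statistics

# Hypothetical "population": 100,000 simulated measurements (e.g. heights in cm).
random.seed(42)
population = [random.gauss(170, 10) for _ in range(100_000)]

# A simple random sample is one way to obtain a representative sample.
sample = random.sample(population, 500)

# The sample mean estimates the population mean; the standard error
# shrinks roughly as 1/sqrt(n), which is what justifies the inference.
pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
std_err = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean ~ {pop_mean:.2f}")
print(f"sample mean     ~ {sample_mean:.2f} +/- {2 * std_err:.2f} (approx. 95% CI)")
```

The point of the sketch is only that a well-designed sample of 500 already pins the population mean down to within about one centimetre here.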

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.


Related publications (100)


Related people (129)

Related units (71)

Related concepts (516)

Normal distribution

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)), where μ is the mean and σ the standard deviation.
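That density can be evaluated directly; a minimal sketch using only the standard library:

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the Gaussian N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# At its mean, the standard normal density equals 1/sqrt(2*pi).
print(round(normal_pdf(0.0), 4))  # → 0.3989
```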

Probability distribution

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.
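A concrete discrete example makes the definition tangible: the distribution of the sum of two fair dice assigns each outcome a probability, and the probabilities sum to 1 (a standard textbook example, sketched here with exact fractions):

```python
from fractions import Fraction

# Distribution of the sum of two fair dice: outcome -> probability.
dist: dict[int, Fraction] = {}
for a in range(1, 7):
    for b in range(1, 7):
        s = a + b
        dist[s] = dist.get(s, Fraction(0)) + Fraction(1, 36)

# A probability distribution must sum to 1 over the sample space.
assert sum(dist.values()) == 1
print(dist[7])  # → 1/6, the most likely sum
```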

Variance

In probability theory and statistics, variance is the expected squared deviation of a random variable from its mean; equivalently, it is the square of the standard deviation. Variance is a measure of dispersion.
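The two equivalent definitions can be checked numerically; a minimal sketch on a classic worked example (these eight values have mean 5 and population variance 4):

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.fmean(data)
var = statistics.fmean((x - mu) ** 2 for x in data)  # E[(X - mu)^2]

# Same definition as the library's population variance.
assert var == statistics.pvariance(data)

# The standard deviation is the square root of the variance.
print(mu, var, math.sqrt(var))  # → 5.0 4.0 2.0
```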

Related courses (240)

MATH-131: Probability and statistics

The course presents the basic notions of probability theory and statistical inference. The emphasis is on the main concepts and the most widely used methods.

ENV-400: Air pollution and climate change

A survey course describing the origins of air pollution and climate change

PHYS-105: Advanced physics II (thermodynamics)

This course presents thermodynamics as a theory that describes a large number of important phenomena in physics, chemistry and engineering, as well as transport effects. An introduction to statistical physics reinforces the acquired notions through microscopic modelling.

Related lectures (550)

During the last twenty years, random matrix theory (RMT) has produced numerous results that allow a better understanding of large random matrices. These advances have enabled interesting applications in the domain of communication. Although this theory can contribute to many other domains, such as brain imaging or genetic research, it has rarely been applied there. The main barrier to the adoption of RMT may be the lack of concrete statistical results from probabilistic random matrix theory. Indeed, the direct generalisation of classical multivariate theory to high-dimensional settings is often difficult, and the proposed procedures often place strong hypotheses on the data matrix, such as normality or overly restrictive independence conditions.
This thesis proposes a statistical procedure for testing the equality of two independent estimated covariance matrices when the number of potentially dependent data vectors is large and proportional to the size of the vectors, i.e. the number of observed variables. Although the existing theory builds a very good intuition of the behaviour of these matrices, it does not provide enough results to build a test that is satisfactory in both power and robustness. Hence, inspired by spike models, we define the residual spikes and prove many theorems describing the behaviour of statistics based on eigenvectors and eigenvalues in very general settings, most notably the two central theorems of this thesis, the Invariant Angle Theorem and the Invariant Dot Product Theorem.
Using numerous generalisations of the theory, the thesis finally describes the behaviour of a statistic under a null hypothesis. This statistic allows the user to test the equality of two populations, but also other null hypotheses such as the independence of two sets of variables. Finally, the robustness of the procedure is demonstrated for different classes of models, and criteria for evaluating robustness are proposed to the reader.
The major contribution of this thesis is therefore a methodology that is both easy to apply and has good properties. In addition, a large number of theoretical results are proved and can readily be used to build other applications.
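The high-dimensional effect motivating such procedures can be illustrated with a toy experiment. This is not the thesis's residual-spike statistic — only the classical observation that when the dimension p is proportional to the sample size n, the eigenvalues of a sample covariance matrix spread out even though the true covariance is the identity (dimensions and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 200                      # p/n = 0.5: the high-dimensional regime
X = rng.standard_normal((n, p))      # rows: i.i.d. observations, true cov = I
S = X.T @ X / n                      # sample covariance matrix

eigs = np.linalg.eigvalsh(S)

# Marchenko-Pastur predicts eigenvalue support of roughly
# [(1 - sqrt(p/n))^2, (1 + sqrt(p/n))^2] instead of a point mass at 1.
lo = (1 - np.sqrt(p / n)) ** 2
hi = (1 + np.sqrt(p / n)) ** 2
print(f"empirical range: [{eigs.min():.2f}, {eigs.max():.2f}]")
print(f"MP prediction:   [{lo:.2f}, {hi:.2f}]")
```

The naive "plug-in" intuition (all eigenvalues near 1) fails badly here, which is why dedicated high-dimensional tests such as the one proposed in the thesis are needed.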

The present work addresses the problem of forecasting the impacts of climate change on future rainfall regimes and their consequences for urban stormwater infrastructures. The research carried out led to an integrated framework for producing high-resolution probabilistic rainfall projections suitable for studying hydrological processes at the scale of urban drainage. Downscaling is at the core of the methodology, as the predictions of the numerical General Circulation Models (GCMs) employed by the climate-science community to model climate evolution are too coarse for hydrological impact studies. The proposed downscaling approach respects the scales of the physical processes characterizing precipitation and consists of three steps: i) daily rainfall series at the location of interest are downscaled from coarse-gridded monthly GCM projections (scale of weather events); ii) the generated daily series are further downscaled to the hourly time step (scale of storm dynamics); iii) finally, hourly series are disaggregated to the sub-hourly level (scale of rain cells). Daily downscaling is achieved by a statistical procedure, based on Generalized Linear Models (GLMs), that relates large-scale atmospheric variables, corresponding to the scale of GCMs, to local daily rainfall series. The proposed methodology is assessed using three contrasting situations in Switzerland (Geneva, Sion and Säntis) and is shown to perform well in reproducing historical rainfall statistics (including extremes and inter-annual variability) in the present-day climate; furthermore, the projections were shown to be consistent with the simulations of physically based dynamical models (i.e. Regional Climate Models). Projections for the second half of the 21st century indicate considerably drier summers, but no significant tendency toward more extreme events was detected except for Säntis.
Finally, extensions of the methodology are presented that allow atmospheric variables other than rainfall to be downscaled. Sub-daily rainfall downscaling is achieved using a stochastic hourly rainfall generator based on a Poisson cluster model, which conceptualizes storm dynamics in a simple way. To provide sensible results, such generators have to be fitted to historical rainfall statistics computed at different levels of temporal aggregation. In the present context this raises a fundamental problem, as the required fitting statistics at the sub-daily time scale are not available for the future. Shortcomings of existing methods led us to develop a novel approach based on Multivariate Adaptive Regression Splines (MARS), which have so far seldom been used in hydrology. The proposed MARS models are conditioned on climate and thus fit particularly well into the general downscaling framework. In addition, atmospheric predictors naturally account for seasonal variations, meaning that a single MARS model holds for the whole year, whereas existing models are specific to each month of the year and are therefore not robust against the seasonal changes that global warming might induce. The methodology was applied to generate hourly rainfall series from daily data simulated by the GLMs at Geneva for the end of the 21st century. Climate change was found to significantly affect summer storm dynamics: rain cells are predicted to be shorter but more intense, and storms are projected to be less frequent. A frequency analysis of the simulated hourly rainfall series revealed a significant increase in hourly rainfall return levels. Hourly rainfall series are further disaggregated to the 10-minute level using a cascade-based model.
Using case studies in the Geneva area, the performance of this sub-hourly rainfall disaggregator (in particular the reproduction of extreme values) was shown to be equivalent whether it was fitted on statistics derived from the 10-minute to 1-hour levels (sub-hourly fitting set) or from the 1-hour to 3-hour levels (supra-hourly fitting set). Consequently, the supra-hourly statistics of the hourly rainfall series generated by the stochastic Poisson cluster model can be used to fit the disaggregator in order to simulate 10-minute rainfall series. Projections of 10-minute rainfall at Geneva for the end of the 21st century indicate an increase in the intensity of extreme events. Uncertainties in the proposed downscaling procedure are addressed by using, whenever feasible, probabilistic models (i.e. GLMs, the hourly rainfall generator, the MARS model and the sub-hourly disaggregator), and by relying on a large number of General Circulation Model projections conditioned on various greenhouse-gas emission scenarios. The present work concludes with a case study illustrating how the developed downscaling methodology may be used to evaluate different strategies of sustainable stormwater management. The Industrial Zone of Plan-les-Ouates (ZIPLO), taken as an example, is a small urbanized area of Geneva. Urban drainage was characterized using a semi-distributed rainfall-runoff model, and climate change (under the higher greenhouse-gas emissions scenario) was shown to significantly increase the peak discharge flows at the ZIPLO outlet. Different sustainable stormwater options were then evaluated in order to limit the peak discharge flows under the joint scenario of climate change and the projected increase in urbanization.
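A Poisson cluster rainfall generator of the kind used for sub-daily downscaling can be caricatured in a few lines: storms arrive as a Poisson process, and each storm spawns a random number of rain cells with exponential durations and intensities that superpose. All parameter values below are invented for illustration and are not those of the thesis's calibrated generator:

```python
import random

random.seed(1)

HOURS = 24 * 30                 # one month of hourly steps
STORM_RATE = 0.02               # expected storms per hour (assumed)
MEAN_CELLS = 4                  # mean rain cells per storm (assumed)
MEAN_DURATION = 3.0             # mean cell duration in hours (assumed)
MEAN_INTENSITY = 1.5            # mean cell intensity in mm/h (assumed)

rain = [0.0] * HOURS
t = random.expovariate(STORM_RATE)                   # first storm arrival
while t < HOURS:
    n_cells = 1 + int(random.expovariate(1 / (MEAN_CELLS - 1)))  # >= 1 cell
    for _ in range(n_cells):
        start = t + random.expovariate(1.0)          # cell lag after storm origin
        dur = random.expovariate(1 / MEAN_DURATION)
        inten = random.expovariate(1 / MEAN_INTENSITY)
        for h in range(int(start), min(HOURS, int(start + dur) + 1)):
            rain[h] += inten                         # overlapping cells add up
    t += random.expovariate(STORM_RATE)              # next storm arrival

wet = sum(1 for r in rain if r > 0)
print(f"wet hours: {wet}/{HOURS}, total rain: {sum(rain):.1f} mm")
```

The clustering is what produces the realistic alternation of dry spells and bursty wet periods; fitting such a model means choosing these rate parameters so that aggregated statistics match observations.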

Modeling the immune system (IS) means putting together a set of assumptions about its components (cells and organs) and their interactions. Simulations of a model show the joint behavior of the components, which for complex, realistic models is often impossible to find analytically. Simulations allow us to experiment on how the initial concentrations and properties of immune cells and viruses affect IS behavior, and to gain better quantitative and qualitative insight into how the IS works and why different behavior patterns occur. A simulation, once created, must be reviewed both statistically and analytically, as well as validated from the biological point of view. We analyzed Chao's immune system simulation [1][2] from a statistical and analytical point of view. We made explicit both the Markov chain that was simulated and the underlying process on which Chao's stage-structured approach was built. Furthermore, we established a test protocol for timestep validation, which Chao's simulator passed. We evaluated the simulator's dependence on the random number generator and showed it to be negligible. Finally, we evaluated the simulator output; our major result is the discovery of a secondary response to a primary infection, an occurrence not shown in Chao's dissertation. A tertiary response to the infection is never possible because of the size of the secondary response caused by memory cells.
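A timestep-validation check of the kind mentioned above might look as follows. This is a hypothetical two-state Markov chain with invented rates, not Chao's simulator: the idea is simply that halving the discretisation step should leave long-run statistics essentially unchanged:

```python
import random

K_ON, K_OFF = 0.2, 0.1        # transition rates per unit time (assumed)

def occupancy(dt: float, steps: int, seed: int) -> float:
    """Fraction of time spent in state 1 when the chain is discretised with step dt."""
    rng = random.Random(seed)
    state, time_on = 0, 0.0
    for _ in range(steps):
        if state == 0 and rng.random() < K_ON * dt:
            state = 1
        elif state == 1 and rng.random() < K_OFF * dt:
            state = 0
        time_on += state * dt
    return time_on / (steps * dt)

# Same total simulated time, two different timesteps.
coarse = occupancy(dt=0.1, steps=500_000, seed=7)
fine = occupancy(dt=0.05, steps=1_000_000, seed=7)

# Analytic stationary occupancy: K_ON / (K_ON + K_OFF) = 2/3.
print(round(coarse, 3), round(fine, 3))
```

If the two estimates disagreed beyond sampling noise, the timestep would be too coarse for the rates being simulated — which is exactly what such a validation protocol is meant to detect.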

2005