**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Category# Statistics

Summary

Statistics (from German: Statistik, () "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.
When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.
Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related startups (5)

Related categories (312)

Related people (239)

Related concepts (574)

Related lectures (1,000)

Related courses (225)

Related MOOCs (29)

Related publications (261)

Related units (9)

Thinkee

Active in automation, data management and IoT. Thinkee provides a comprehensive software platform for automating process monitoring, data analysis, and intervention tracking.

SensiMed

Active in glaucoma management, smart contact lens and continuous ocular monitoring. SensiMed has developed the SENSIMED Triggerfish, a smart contact lens revolutionizing glaucoma management by providing a complete 24-hour picture of the eye, offering valuable insights for personalized treatment programs.

Artiria Medical

Active in neurovascular, medtech and minimally invasive. Artiria Medical revolutionizes minimally invasive stroke treatment with cutting-edge deflectable guidewire technology.

Mathematical statistics

Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory. Statistical data collection is concerned with the planning of studies, especially with the design of randomized experiments and with the planning of surveys using random sampling.

Regression analysis

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features'). The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion.

Statistical hypothesis testing

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters. While hypothesis testing was popularized early in the 20th century, early forms were used in the 1700s. The first use is credited to John Arbuthnot (1710), followed by Pierre-Simon Laplace (1770s), in analyzing the human sex ratio at birth; see .

The sample mean (sample average) or empirical mean (empirical average), and the sample covariance or empirical covariance are statistics computed from a sample of data on one or more random variables. The sample mean is the average value (or mean value) of a sample of numbers taken from a larger population of numbers, where "population" indicates not number of people but the entirety of relevant data, whether collected or not. A sample of 40 companies' sales from the Fortune 500 might be used for convenience instead of looking at the population, all 500 companies' sales.

Evidence-based medicine (EBM) is "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients". The aim of EBM is to integrate the experience of the clinician, the values of the patient, and the best available scientific information to guide decision-making about clinical management. The term was originally used to describe an approach to teaching the practice of medicine and improving decisions by individual physicians about individual patients.

Anscombe's quartet comprises four data sets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data when analyzing it, and the effect of outliers and other influential observations on statistical properties.

Probability Distributions in Environmental Studies

Explores probability distributions for random variables in air pollution and climate change studies, covering descriptive and inferential statistics.

Red bus/Blue bus paradox

Explores the Red bus/Blue bus paradox, nested logit models, and multivariate extreme value models in transportation.

Model Selection Criteria: AIC, BIC, Cp

Explores model selection criteria like AIC, BIC, and Cp in statistics for data science.

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple

MATH-413: Statistics for data science

Statistics lies at the foundation of data science, providing a unifying theoretical and methodological backbone for the diverse tasks enountered in this emerging field. This course rigorously develops

PHYS-467: Machine learning for physicists

Machine learning and data analysis are becoming increasingly central in sciences including physics. In this course, fundamental principles and methods of machine learning will be introduced and practi

Selected Topics on Discrete Choice

Discrete choice models are used extensively in many disciplines where it is important to predict human behavior at a disaggregate level. This course is a follow up of the online course “Introduction t

Selected Topics on Discrete Choice

Discrete choice models are used extensively in many disciplines where it is important to predict human behavior at a disaggregate level. This course is a follow up of the online course “Introduction t

Neuronal Dynamics - Computational Neuroscience of Single Neurons

The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.

This thesis focuses on non-parametric covariance estimation for random surfaces, i.e.~functional data on a two-dimensional domain. Non-parametric covariance estimation lies at the heart of functional

Arthur Ulysse Jacot-Guillarmod

In the recent years, Deep Neural Networks (DNNs) have managed to succeed at tasks that previously appeared impossible, such as human-level object recognition, text synthesis, translation, playing game

Victor Panaretos, Tomas Masák, Tomas Rubin

Nonparametric inference for functional data over two-dimensional domains entails additional computational and statistical challenges, compared to the one-dimensional case. Separability of the covarian