# Independent and identically distributed random variables

Summary

In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usually abbreviated as i.i.d., iid, or IID. IID was first defined in statistics and finds application in different fields such as data mining and signal processing.
Introduction
Statistics commonly deals with random samples. A random sample can be thought of as a set of objects that are chosen randomly. More formally, it is "a sequence of independent, identically distributed (IID) random data points".
In other words, the terms random sample and IID are basically one and the same. In statistics, "random sample" is the typical terminology, but in probability it is more common to say "IID".

- Identically distributed means that there are no overall trends: the distribution doesn't fluctuate, and all items in the sample are taken from the same probability distribution.
- Independent means that the sample items are all independent events; they are not connected to each other in any way.




Related concepts (35)

Statistics

Statistics (from German: Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

Probability distribution

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.

Normal distribution

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)), where μ is the mean and σ the standard deviation.
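As a quick check of the density formula, a minimal standard-library sketch evaluates it at the peak and verifies that it integrates to 1:

```python
import math

# Gaussian density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)).
def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# The density integrates to 1 (crude Riemann sum over [-10, 10]; the tails
# beyond +/-10 standard deviations are negligible).
step = 0.001
total = sum(normal_pdf(-10 + i * step) * step for i in range(20_000))

print(normal_pdf(0.0), total)  # peak value 1/sqrt(2*pi) ~ 0.3989; total ~ 1.0
```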


Related courses (63)

MATH-234(d): Probability and statistics

This course teaches the elementary notions of probability theory and statistics, such as inference, tests, and regression.

MATH-233: Probability and statistics

The course gives an introduction to probability and statistics for physicists.

MATH-234(b): Probability and statistics

The course presents the basic notions of probability theory and statistical inference. The emphasis is on the main concepts and the most widely used methods.

Related publications (7)

This work is a study of three interacting particle systems that are modified versions of the contact process. The contact process is a spin system defined on a graph and is commonly taken as a model for the spread of an infection in a population; transmission of the infection happens by proximity (contact). The first two models we consider – the grass-bushes-trees model and the multitype contact process – are models for competition between species in ecology. The third model – an annealed approximation to boolean networks – approximately describes the transmission of information among genes in a cell.

We consider the grass-bushes-trees model on the set of integers, ℤ. Each point of ℤ is a region of space. In the continuous-time dynamics, at each instant each region can be either empty (state 0) or occupied by an individual of one of two existing species (states 1 and 2). Occupants of both species die at rate 1, leaving their regions empty, and send descendants to neighboring regions at rate λ. An individual of type 1 may be born on a region previously occupied by an individual of type 2, but the converse is forbidden. We take the "Heaviside" initial configuration, in which all sites to the left of the origin are occupied by type 1 individuals and all sites to the right of the origin are occupied by type 2 individuals. If the birth of new individuals is allowed to occur at sites that are not adjacent to the parent, and if the rate λ is supercritical for the usual contact process on ℤ, we see the formation of an interface region in which both types coexist. Addressing a conjecture of Cox and Durrett (1995), we prove that the size of this region is stochastically tight.

The multitype contact process on ℤ is identical to the grass-bushes-trees model in every respect except that no births can occur at previously occupied sites; in particular, the model is symmetric in the two species. We again start the process from the Heaviside configuration and prove that the size of the interface region is tight. In addition, we prove that the position of the interface, when properly rescaled, converges to Brownian motion. Finally, we give necessary and sufficient conditions on the initial configuration so that one of the two species becomes extinct with probability one, and also so that both species are present at all times with positive probability.

Lastly, we consider a model proposed by Derrida and Pomeau (1986) and recently studied by Chatterjee and Durrett (2009); it is defined as an approximation to S. Kauffman's boolean networks (1969). The model starts with the choice of a random directed graph on n vertices; each node has r input nodes pointing at it. A discrete-time threshold contact process is then considered on this graph: at each instant, each site has probability q of choosing to receive input; if it does, and if at least one of its inputs was occupied by a 1 at the previous instant, then it is labeled with a 1; in all other cases, it is labeled with a 0. r and q are kept fixed and n is taken to infinity. Improving a result of Chatterjee and Durrett, we show that if qr > 1, then the time of persistence of the dynamics is exponential in n.
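The discrete-time threshold contact process just described is straightforward to simulate. The sketch below uses illustrative parameter choices (not values from the thesis) and shows the qualitative dichotomy: with qr > 1 the occupied density stays positive for a long time, while with qr < 1 it dies out quickly.

```python
import random

# Minimal simulation sketch of the annealed threshold contact process
# (parameters n, r, q, steps are illustrative, not from the thesis).
def simulate(n=200, r=3, q=0.5, steps=50, seed=1):
    rng = random.Random(seed)
    # Random directed graph: each node receives edges from r uniformly chosen inputs.
    inputs = [[rng.randrange(n) for _ in range(r)] for _ in range(n)]
    state = [1] * n  # start fully occupied
    for _ in range(steps):
        new_state = []
        for v in range(n):
            # With probability q the site chooses to receive input; it becomes 1
            # only if at least one of its inputs was 1 at the previous instant.
            if rng.random() < q and any(state[u] for u in inputs[v]):
                new_state.append(1)
            else:
                new_state.append(0)
        state = new_state
    return sum(state) / n  # fraction of occupied sites after `steps` rounds

print(simulate())         # q * r = 1.5 > 1: density stays positive
print(simulate(q=0.2))    # q * r = 0.6 < 1: the dynamics die out
```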

Daniel Kuhn, Soroosh Shafieezadeh Abadeh, Bahar Taskesen

We study the computational complexity of the optimal transport problem that evaluates the Wasserstein distance between the distributions of two K-dimensional discrete random vectors. The best known algorithms for this problem run in polynomial time in the maximum of the number of atoms of the two distributions. However, if the components of either random vector are independent, then this number can be exponential in K even though the size of the problem description scales linearly with K. We prove that the described optimal transport problem is #P-hard even if all components of the first random vector are independent uniform Bernoulli random variables, while the second random vector has merely two atoms, and even if only approximate solutions are sought. We also develop a dynamic programming-type algorithm that approximates the Wasserstein distance in pseudo-polynomial time when the components of the first random vector follow arbitrary independent discrete distributions, and we identify special problem instances that can be solved exactly in strongly polynomial time.
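For intuition about the underlying object (not the paper's algorithm): in the special case of distributions on the real line, the optimal transport plan simply matches sorted atoms, so the Wasserstein-1 distance between two equally weighted empirical distributions can be computed without solving a general linear program.

```python
# Standard 1-D special case (illustrative, not the method of the paper):
# W1 between two uniform empirical measures of equal size is obtained by
# pairing the sorted atoms and averaging the transport costs.
def wasserstein_1d(xs, ys):
    assert len(xs) == len(ys), "equal-size empirical measures assumed"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Shifting a distribution by a constant c moves it by exactly c in W1.
print(wasserstein_1d([0, 1, 2], [3, 4, 5]))  # 3.0
```

In higher dimensions no such sorting shortcut exists, which is where the complexity questions studied in the paper arise.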

2022

Dynamic optimization problems affected by uncertainty are ubiquitous in many application domains. Decision makers typically model the uncertainty through random variables governed by a probability distribution. If the distribution is precisely known, then the emerging optimization problems constitute stochastic programs or chance-constrained programs. On the other hand, if the distribution is at least partially unknown, then the emanating optimization problems represent robust or distributionally robust optimization problems. In this thesis, we leverage techniques from stochastic and distributionally robust optimization to address complex problems in finance, energy systems management and, more abstractly, applied probability. In particular, we seek to solve uncertain optimization problems where the prior distributional information includes only the first and second moments (and, sometimes, the support). The main objective of the thesis is to solve large instances of practical optimization problems. For this purpose, we develop complexity reduction and decomposition schemes, which exploit structural symmetries or multiscale properties of the problems at hand in order to break them down into smaller and more tractable components.

In the first part of the thesis we study the growth-optimal portfolio, which maximizes the expected log-utility over a single investment period. In a classical stochastic setting, this portfolio is known to outperform any other portfolio with probability 1 in the long run. In the short run, however, it is notoriously volatile, and its performance suffers in the presence of distributional ambiguity. We design fixed-mix strategies that offer similar performance guarantees as the classical growth-optimal portfolio but for a finite investment horizon. Moreover, the proposed performance guarantee remains valid for any asset return distribution with the same mean and covariance matrix. These results rely on a Taylor approximation of the terminal logarithmic wealth that becomes more accurate as the rebalancing frequency increases.

In the second part of the thesis, we demonstrate that such a Taylor approximation is in fact not necessary. Specifically, we derive sharp probability bounds on the tails of a product of non-negative random variables. These generalized Chebyshev bounds can be computed numerically using semidefinite programming, and in some cases even analytically. Similar techniques can also be used to derive multivariate Chebyshev bounds for sums, maxima, and minima of random variables.

In the final part of the thesis, we consider a multi-market reservoir management problem. Eroding peak/off-peak spreads on European electricity spot markets imply reduced profitability for hydropower producers and force them to participate in the balancing markets. This motivates us to propose a two-layer stochastic programming model for the optimal operation of a cascade of hydropower plants selling energy on both spot and balancing markets. The planning problem optimizes the reservoir management over a yearly horizon with weekly granularity, and the trading subproblems optimize the market transactions over a weekly horizon with hourly granularity. We solve both the planning and trading problems in linear decision rules, and we exploit the inherent parallelizability of the trading subproblems to achieve computational tractability.
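The fixed-mix idea can be illustrated with a toy deterministic example (invented two-asset returns, not figures from the thesis): because the portfolio is rebalanced back to constant weights every period, the terminal log-wealth is simply the sum of one-period log-growth terms.

```python
import math

# Hedged illustration with invented numbers: a 60/40 fixed-mix portfolio,
# rebalanced each period, over 20 periods of alternating risky returns.
weights = (0.6, 0.4)              # constant target mix, restored each period
risky_gross = [1.25, 0.85] * 10   # risky asset alternates +25% / -15%
safe_gross = 1.01                 # near-riskless asset returns 1% per period

log_wealth = 0.0
for r in risky_gross:
    # One-period gross growth of the rebalanced portfolio.
    growth = weights[0] * r + weights[1] * safe_gross
    log_wealth += math.log(growth)

print(math.exp(log_wealth))  # terminal wealth of the fixed-mix strategy
```

With i.i.d. returns, this additive structure is what lets the log-wealth be analyzed period by period; the thesis's guarantees replace the exact one-period log with a Taylor approximation that sharpens as rebalancing becomes more frequent.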