
# Posterior predictive distribution

## Summary

In Bayesian statistics, the posterior predictive distribution is the distribution of possible unobserved values conditional on the observed values.
Given a set of $N$ i.i.d. observations $\mathbf{X} = \{x_1, \dots, x_N\}$, a new value $\tilde{x}$ will be drawn from a distribution $p(\tilde{x} \mid \theta)$ that depends on a parameter $\theta \in \Theta$, where $\Theta$ is the parameter space.

It may seem tempting to plug in a single best estimate $\hat\theta$ for $\theta$, but this ignores uncertainty about $\theta$, and because a source of uncertainty is ignored, the predictive distribution will be too narrow. Put another way, predictions of extreme values of $\tilde{x}$ will have a lower probability than if the uncertainty in the parameters as given by their posterior distribution is accounted for.

A posterior predictive distribution accounts for uncertainty about $\theta$. The posterior distribution of possible $\theta$ values depends on $\mathbf{X}$: $p(\theta \mid \mathbf{X})$.

And the posterior predictive distribution of $\tilde{x}$ given $\mathbf{X}$ is calculated by marginalizing the distribution of $\tilde{x}$ given $\theta$ over the posterior distribution of $\theta$ given $\mathbf{X}$:

$$p(\tilde{x} \mid \mathbf{X}) = \int_\Theta p(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X}) \, d\theta$$

Because it accounts for uncertainty about $\theta$, the posterior predictive distribution will in general be wider than a predictive distribution which plugs in a single best estimate for $\theta$.
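
As a concrete illustration of the marginalization (a sketch with invented numbers, not taken from this page): for a Bernoulli likelihood with a uniform Beta(1, 1) prior, the integral above can be approximated on a grid of $\theta$ values and compared with the plug-in estimate.

```python
import numpy as np

# Hypothetical data: N = 10 Bernoulli trials with k = 9 successes,
# uniform Beta(1, 1) prior on the success probability theta.
N, k = 10, 9
a, b = 1 + k, 1 + N - k           # posterior is Beta(a, b)

# Discretize the posterior p(theta | X) on a grid and marginalize the
# likelihood p(x~ = 1 | theta) = theta over it.
theta = np.linspace(0.0, 1.0, 100_001)
w = theta ** (a - 1) * (1.0 - theta) ** (b - 1)
w /= w.sum()                      # normalized grid weights for p(theta | X)
pred = float((theta * w).sum())   # posterior predictive P(x~ = 1 | X)

# The closed form is (1 + k) / (N + 2) ~ 0.833, noticeably less extreme
# than the plug-in maximum-likelihood estimate k / N = 0.9.
print(pred)
```

The plug-in prediction 0.9 is more extreme than the posterior predictive value because it ignores the remaining uncertainty about $\theta$ after only ten observations.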
The prior predictive distribution, in a Bayesian context, is the distribution of a data point marginalized over its prior distribution. That is, if $\tilde{x} \sim F(\tilde{x} \mid \theta)$ and $\theta \sim G(\theta)$, then the prior predictive distribution is the corresponding distribution $H(\tilde{x})$, where

$$p_H(\tilde{x}) = \int_\Theta p_F(\tilde{x} \mid \theta) \, p_G(\theta) \, d\theta$$

This is similar to the posterior predictive distribution except that the marginalization (or equivalently, expectation) is taken with respect to the prior distribution instead of the posterior distribution.
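
The definition can also be read as ancestral sampling: draw $\theta$ from $G$, then $\tilde{x}$ from $F(\tilde{x} \mid \theta)$. A minimal sketch, with an illustrative Beta prior and Binomial likelihood chosen here for convenience:

```python
import numpy as np

# Draw theta from its prior G = Beta(2, 2), then x from the likelihood
# F = Binomial(20, theta); the sampled x then follow the prior predictive
# H, here a Beta-Binomial distribution (all numbers illustrative).
rng = np.random.default_rng(0)
n = 20
theta = rng.beta(2.0, 2.0, size=200_000)
x = rng.binomial(n, theta)

# E[x] = n E[theta] = 10, but Var[x] is far larger than the Binomial(20, 0.5)
# variance of 5, because uncertainty about theta is marginalized, not ignored.
print(x.mean(), x.var())
```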
Furthermore, if the prior distribution $G(\theta \mid \alpha)$ is a conjugate prior, then the posterior predictive distribution will belong to the same family of distributions as the prior predictive distribution. This is easy to see. If the prior distribution is conjugate, then

$$p(\theta \mid \mathbf{X}) = p_G(\theta \mid \alpha'),$$

i.e. the posterior distribution also belongs to $G(\theta \mid \alpha)$ but simply with a different parameter $\alpha'$ instead of the original parameter $\alpha$. Then,

$$p(\tilde{x} \mid \mathbf{X}) = \int_\Theta p_F(\tilde{x} \mid \theta) \, p_G(\theta \mid \alpha') \, d\theta = p_H(\tilde{x} \mid \alpha')$$

Hence, the posterior predictive distribution follows the same distribution $H$ as the prior predictive distribution, but with the posterior values of the hyperparameters substituted for the prior ones.
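
This hyperparameter update can be checked numerically. The sketch below (Beta prior, Binomial likelihood, all numbers invented) compares Monte Carlo marginalization over the posterior with the closed-form Beta-Binomial $p_H(\tilde{x} \mid \alpha')$:

```python
import numpy as np
from math import lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(x, n, a, b):
    """pmf of the Beta-Binomial H: C(n, x) * B(x + a, n - x + b) / B(a, b)."""
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + log_beta(x + a, n - x + b) - log_beta(a, b))

# Hypothetical data: N = 15 Bernoulli trials with k = 4 successes under a
# conjugate Beta(a0, b0) prior; the posterior is Beta(a0 + k, b0 + N - k).
a0, b0, N, k = 2.0, 3.0, 15, 4
a1, b1 = a0 + k, b0 + (N - k)     # updated hyperparameters alpha'

# Posterior predictive for the number of successes in n = 10 new trials:
# Monte Carlo marginalization over theta ~ p(theta | X) versus the same
# Beta-Binomial family H with the posterior hyperparameters plugged in.
n = 10
rng = np.random.default_rng(1)
theta = rng.beta(a1, b1, size=300_000)
mc = np.mean(rng.binomial(n, theta) == 3)
print(beta_binomial_pmf(3, n, a1, b1), float(mc))
```

The two printed values agree up to Monte Carlo error, as the derivation above predicts: only the hyperparameters change, not the family $H$.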


## Related concepts (13)

Compound probability distribution

In probability and statistics, a compound probability distribution (also known as a mixture distribution or contagious distribution) is the probability distribution that results from assuming that a random variable is distributed according to some parametrized distribution, with (some of) the parameters of that distribution themselves being random variables. If the parameter is a scale parameter, the resulting mixture is also called a scale mixture.
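
A quick simulation of the compounding idea, with invented parameters: drawing a Poisson rate from a Gamma distribution and then compounding yields a negative binomial marginal.

```python
import numpy as np

# Compounding sketch: the Poisson rate lam is itself Gamma-distributed,
# so the marginal of x is a negative binomial (parameters illustrative).
rng = np.random.default_rng(42)
shape, scale = 3.0, 2.0                      # Gamma(3, scale=2): mean 6, var 12
lam = rng.gamma(shape, scale, size=200_000)
x = rng.poisson(lam)

# The mean matches E[lam] = 6, but the variance is inflated to
# E[lam] + Var[lam] = 18: the hallmark of a compound (mixture) distribution.
print(x.mean(), x.var())
```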

Dirichlet-multinomial distribution

In probability theory and statistics, the Dirichlet-multinomial distribution is a family of discrete multivariate probability distributions on a finite support of non-negative integers. It is also called the Dirichlet compound multinomial distribution (DCM) or multivariate Pólya distribution (after George Pólya). It is a compound probability distribution, where a probability vector p is drawn from a Dirichlet distribution with parameter vector α, and an observation drawn from a multinomial distribution with probability vector p and number of trials n.
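
The two-stage draw just described can be sketched directly (α and n are invented for illustration):

```python
import numpy as np

# Dirichlet-multinomial by compounding: draw p from Dirichlet(alpha),
# then a count vector from Multinomial(n, p).
rng = np.random.default_rng(7)
alpha = np.array([1.0, 2.0, 3.0])
n = 30
p = rng.dirichlet(alpha, size=20_000)
counts = np.array([rng.multinomial(n, q) for q in p])

# Marginal means are n * alpha / alpha.sum() = [5, 10, 15]; the counts are
# more dispersed than a single Multinomial(n, alpha / alpha.sum()) would be.
print(counts.mean(axis=0))
```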

## Related courses (15)

We discuss a set of topics that are important for the understanding of modern data science but that are typically not taught in an introductory ML course. In particular we discuss fundamental ideas an…

Statistics lies at the foundation of data science, providing a unifying theoretical and methodological backbone for the diverse tasks encountered in this emerging field. This course rigorously develops…

## Related lectures (140)

Covers exponential families, maximum entropy, and Maxwell-Boltzmann distribution properties.

Explores words, tokens, n-grams, and language models, focusing on probabilistic approaches for language identification and spelling error correction.

Explains one hot encoding and the multinomial distribution with a focus on indicator vectors and probability functions.

## Related publications (3)

Two-component mixture distributions with one component a point mass and the other a continuous density may be used as priors for Bayesian inference when sparse representation of an underlying signal i…

2009: Magnetoencephalography (MEG) is an imaging technique used to measure the magnetic field outside the human head produced by the electrical activity inside the brain. The MEG inverse problem, identifying…

Guillaume Philippe Ivan Joseph Dehaene

Expectation propagation (EP) is a widely successful algorithm for variational inference. EP is an iterative algorithm used to approximate complicated distributions, typically to find a Gaussian approx…