In Bayesian statistics, the posterior predictive distribution is the distribution of possible unobserved values conditional on the observed values.
Given a set of $N$ i.i.d. observations $\mathbf{X} = \{x_1, \dots, x_N\}$, a new value $\tilde{x}$ will be drawn from a distribution $p(\tilde{x} \mid \theta)$ that depends on a parameter $\theta \in \Theta$, where $\Theta$ is the parameter space.
It may seem tempting to plug in a single best estimate $\hat{\theta}$ for $\theta$, but this ignores uncertainty about $\theta$, and because a source of uncertainty is ignored, the predictive distribution will be too narrow. Put another way, predictions of extreme values of $\tilde{x}$ will be assigned lower probability than they would be if the uncertainty in the parameters, as given by their posterior distribution, were accounted for.
A posterior predictive distribution accounts for uncertainty about $\theta$. The posterior distribution of possible $\theta$ values depends on $\mathbf{X}$:

$$p(\theta \mid \mathbf{X}).$$
The posterior predictive distribution of $\tilde{x}$ given $\mathbf{X}$ is then calculated by marginalizing the distribution of $\tilde{x}$ given $\theta$ over the posterior distribution of $\theta$ given $\mathbf{X}$:

$$p(\tilde{x} \mid \mathbf{X}) = \int_{\Theta} p(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X}) \, d\theta.$$
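To make the marginalization concrete, here is a minimal sketch (not from the original text; it assumes NumPy, and the Beta-Bernoulli model, hyperparameters, and data are purely illustrative) that approximates the posterior predictive by Monte Carlo and checks it against the closed form.

```python
# Monte Carlo approximation of the posterior predictive p(x_new | X)
# for a Bernoulli likelihood with a conjugate Beta(a, b) prior.
import numpy as np

rng = np.random.default_rng(0)

a, b = 2.0, 2.0                   # prior hyperparameters, Beta(a, b)
X = np.array([1, 0, 1, 1, 0, 1])  # observed i.i.d. Bernoulli data (illustrative)
n, s = len(X), X.sum()            # number of trials and successes

# Conjugacy: the posterior over theta is Beta(a + s, b + n - s).
theta_samples = rng.beta(a + s, b + n - s, size=100_000)

# Marginalize p(x_new = 1 | theta) = theta over the posterior samples.
p_mc = theta_samples.mean()

# Closed form for comparison: E[theta | X] = (a + s) / (a + b + n).
p_exact = (a + s) / (a + b + n)

print(f"Monte Carlo: {p_mc:.4f}, exact: {p_exact:.4f}")
```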
Because it accounts for uncertainty about $\theta$, the posterior predictive distribution will in general be wider than a predictive distribution which plugs in a single best estimate for $\theta$.
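As a concrete illustration of this widening (not from the original article), consider the conjugate normal model with known variance:

$$x \mid \mu \sim \mathcal{N}(\mu, \sigma^2), \qquad \mu \sim \mathcal{N}(\mu_0, \tau_0^2), \qquad \sigma^2 \text{ known}.$$

The posterior is $\mu \mid \mathbf{X} \sim \mathcal{N}(\mu_n, \tau_n^2)$ with $\tau_n^{-2} = \tau_0^{-2} + N/\sigma^2$, and the posterior predictive is

$$\tilde{x} \mid \mathbf{X} \sim \mathcal{N}\!\left(\mu_n,\; \sigma^2 + \tau_n^2\right),$$

whereas the plug-in predictive $\mathcal{N}(\mu_n, \sigma^2)$ omits the $\tau_n^2$ term, which is exactly the remaining uncertainty about $\mu$.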
The prior predictive distribution, in a Bayesian context, is the distribution of a data point marginalized over the prior distribution of the parameter. That is, if $\tilde{x} \sim F(\tilde{x} \mid \theta)$ and $\theta \sim G(\theta \mid \alpha)$, then the prior predictive distribution is the corresponding distribution $H(\tilde{x} \mid \alpha)$, where

$$p_H(\tilde{x} \mid \alpha) = \int_{\Theta} p_F(\tilde{x} \mid \theta) \, p_G(\theta \mid \alpha) \, d\theta.$$
This is similar to the posterior predictive distribution except that the marginalization (or equivalently, expectation) is taken with respect to the prior distribution instead of the posterior distribution.
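The prior predictive can likewise be simulated by ancestral sampling: draw $\theta$ from the prior, then draw $\tilde{x}$ from the likelihood given that $\theta$. A minimal sketch under the same illustrative Beta-Bernoulli assumptions as above:

```python
# Prior predictive sampling for the Beta-Bernoulli model.
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 2.0                       # prior hyperparameters

theta = rng.beta(a, b, size=100_000)  # theta ~ G(theta | alpha)
x_new = rng.binomial(1, theta)        # x_new ~ F(x | theta)

# The marginal frequency of 1s approximates p_H(x = 1 | alpha) = a / (a + b).
print(x_new.mean(), a / (a + b))
```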
Furthermore, if the prior distribution $G(\theta \mid \alpha)$ is a conjugate prior, then the posterior predictive distribution will belong to the same family of distributions as the prior predictive distribution. This is easy to see: if the prior distribution is conjugate, then

$$p(\theta \mid \mathbf{X}, \alpha) = p_G(\theta \mid \alpha'),$$
i.e. the posterior distribution also belongs to $G$, but simply with a different parameter $\alpha'$ in place of the original parameter $\alpha$. Then,

$$p(\tilde{x} \mid \mathbf{X}, \alpha) = \int_{\Theta} p_F(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{X}, \alpha) \, d\theta = \int_{\Theta} p_F(\tilde{x} \mid \theta) \, p_G(\theta \mid \alpha') \, d\theta = p_H(\tilde{x} \mid \alpha').$$
Hence, the posterior predictive distribution follows the same distribution $H$ as the prior predictive distribution, but with the posterior values of the hyperparameters substituted for the prior ones.
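As a worked instance (an illustration, not part of the original text), take the Beta-Bernoulli model again: with $\tilde{x} \mid \theta \sim \mathrm{Bernoulli}(\theta)$ and $\theta \sim \mathrm{Beta}(\alpha, \beta)$, the prior predictive is

$$p_H(\tilde{x} = 1 \mid \alpha, \beta) = \frac{\alpha}{\alpha + \beta},$$

and after observing $s$ successes in $N$ trials, so that $(\alpha', \beta') = (\alpha + s,\, \beta + N - s)$, the posterior predictive is

$$p_H(\tilde{x} = 1 \mid \alpha', \beta') = \frac{\alpha + s}{\alpha + \beta + N}.$$

Both are members of the same family $H$ (here, Bernoulli in $\tilde{x}$), differing only in that the posterior hyperparameters replace the prior ones.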