Concept

# Posterior predictive distribution

Summary
In Bayesian statistics, the posterior predictive distribution is the distribution of possible unobserved values conditional on the observed values.

Given a set of $N$ i.i.d. observations $\mathbf{X} = \{x_1, \dots, x_N\}$, a new value $\tilde{x}$ will be drawn from a distribution that depends on a parameter $\theta \in \Theta$, where $\Theta$ is the parameter space:

$$p(\tilde{x} \mid \theta)$$

It may seem tempting to plug in a single best estimate $\hat{\theta}$ for $\theta$, but this ignores uncertainty about $\theta$, and because a source of uncertainty is ignored, the predictive distribution will be too narrow. Put another way, predictions of extreme values of $\tilde{x}$ will have a lower probability than if the uncertainty in the parameters, as given by their posterior distribution, is accounted for.

A posterior predictive distribution accounts for uncertainty about $\theta$. The posterior distribution of possible $\theta$ values depends on $\mathbf{X}$:

$$p(\theta \mid \mathbf{X})$$

The posterior predictive distribution of $\tilde{x}$ given $\mathbf{X}$ is calculated by marginalizing the distribution of $\tilde{x}$ given $\theta$ over the posterior distribution of $\theta$ given $\mathbf{X}$:

$$p(\tilde{x} \mid \mathbf{X}) = \int_{\Theta} p(\tilde{x} \mid \theta)\, p(\theta \mid \mathbf{X})\, d\theta$$

Because it accounts for uncertainty about $\theta$, the posterior predictive distribution will in general be wider than a predictive distribution which plugs in a single best estimate for $\theta$.

The prior predictive distribution, in a Bayesian context, is the distribution of a data point marginalized over its prior distribution $G$. That is, if $\tilde{x} \sim F(\tilde{x} \mid \theta)$ and $\theta \sim G(\theta \mid \alpha)$, then the prior predictive distribution is the corresponding distribution $H(\tilde{x} \mid \alpha)$, where

$$p_H(\tilde{x} \mid \alpha) = \int_{\Theta} p_F(\tilde{x} \mid \theta)\, p_G(\theta \mid \alpha)\, d\theta$$

This is similar to the posterior predictive distribution except that the marginalization (or equivalently, expectation) is taken with respect to the prior distribution instead of the posterior distribution.

Furthermore, if the prior distribution $G(\theta \mid \alpha)$ is a conjugate prior, then the posterior predictive distribution will belong to the same family of distributions as the prior predictive distribution. This is easy to see. If the prior distribution $G(\theta \mid \alpha)$ is conjugate, then

$$p(\theta \mid \mathbf{X}, \alpha) = p_G(\theta \mid \alpha'),$$

i.e. the posterior distribution also belongs to $G(\theta \mid \alpha)$, but simply with a different parameter $\alpha'$ instead of the original parameter $\alpha$. Then,

$$p(\tilde{x} \mid \mathbf{X}, \alpha) = \int_{\Theta} p_F(\tilde{x} \mid \theta)\, p(\theta \mid \mathbf{X}, \alpha)\, d\theta = \int_{\Theta} p_F(\tilde{x} \mid \theta)\, p_G(\theta \mid \alpha')\, d\theta = p_H(\tilde{x} \mid \alpha')$$

Hence, the posterior predictive distribution follows the same distribution $H$ as the prior predictive distribution, but with the posterior values of the hyperparameters substituted for the prior ones.
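
The conjugate case and the "too narrow" plug-in prediction can be illustrated with a small numerical sketch. The setup below is an assumption chosen for illustration and does not come from the text: the observations are Bernoulli trials, the prior on the success probability is a Beta distribution, and the new value is the number of successes in `m` future trials. Under these assumptions the predictive family is the beta-binomial distribution, and the plug-in alternative is a binomial distribution evaluated at the posterior mean. All numbers and function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, m, a, b):
    """P(k successes in m trials) with theta ~ Beta(a, b) integrated out."""
    return math.comb(m, k) * math.exp(log_beta(k + a, m - k + b) - log_beta(a, b))


def binomial_pmf(k, m, theta):
    """P(k successes in m trials) for a fixed success probability theta."""
    return math.comb(m, k) * theta**k * (1.0 - theta) ** (m - k)


def variance(pmf):
    """Variance of a distribution on 0..len(pmf)-1 given as a list of probabilities."""
    mean = sum(k * p for k, p in enumerate(pmf))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf))


# Hypothetical prior hyperparameters and data (assumptions for illustration).
alpha, beta = 2.0, 2.0      # prior Beta(alpha, beta)
n, successes = 10, 7        # observed Bernoulli trials
m = 20                      # number of future trials to predict

# Conjugate update: the posterior is again a Beta distribution.
alpha_post = alpha + successes
beta_post = beta + (n - successes)

# Plug-in estimate: here the posterior mean (one possible "best estimate").
theta_hat = alpha_post / (alpha_post + beta_post)

# Same predictive family, different hyperparameters.
prior_pred = [beta_binomial_pmf(k, m, alpha, beta) for k in range(m + 1)]
post_pred = [beta_binomial_pmf(k, m, alpha_post, beta_post) for k in range(m + 1)]
plug_in = [binomial_pmf(k, m, theta_hat) for k in range(m + 1)]

print("variance, plug-in binomial:     ", variance(plug_in))
print("variance, posterior predictive: ", variance(post_pred))
print("P(at most 5 successes), plug-in:", sum(plug_in[:6]))
print("P(at most 5 successes), post.:  ", sum(post_pred[:6]))
```

In this sketch the prior predictive and the posterior predictive are computed by the same function, `beta_binomial_pmf`, with only the hyperparameters changed. The posterior predictive has a larger variance and puts more probability on extreme counts than the plug-in binomial, even though both have the same mean.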