In probability theory and statistics, the Dirichlet-multinomial distribution is a family of discrete multivariate probability distributions on a finite support of non-negative integers. It is also called the Dirichlet compound multinomial distribution (DCM) or multivariate Pólya distribution (after George Pólya). It is a compound probability distribution, where a probability vector $\mathbf{p}$ is drawn from a Dirichlet distribution with parameter vector $\boldsymbol\alpha$, and an observation is drawn from a multinomial distribution with probability vector $\mathbf{p}$ and number of trials $n$. The Dirichlet parameter vector $\boldsymbol\alpha$ captures the prior belief about the situation and can be seen as a pseudocount: observations of each outcome that occur before the actual data is collected. The compounding corresponds to a Pólya urn scheme. It is frequently encountered in Bayesian statistics, machine learning, empirical Bayes methods and classical statistics as an overdispersed multinomial distribution.
It reduces to the categorical distribution as a special case when $n = 1$. It also approximates the multinomial distribution arbitrarily well for large $\boldsymbol\alpha$. The Dirichlet-multinomial is a multivariate extension of the beta-binomial distribution, as the multinomial and Dirichlet distributions are multivariate versions of the binomial and beta distributions, respectively.
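As a quick illustration of the two-stage compounding described above, the following minimal sketch (using NumPy, with illustrative values of $\boldsymbol\alpha$ and $n$ chosen just for this example) draws a probability vector from the Dirichlet and then counts from the multinomial:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

alpha = np.array([2.0, 3.0, 5.0])  # Dirichlet parameter vector (illustrative)
n = 10                             # number of multinomial trials

# Stage 1: draw a probability vector p from the Dirichlet prior.
p = rng.dirichlet(alpha)

# Stage 2: draw category counts x from the multinomial with that p.
x = rng.multinomial(n, p)

print(p)           # a random point on the probability simplex
print(x, x.sum())  # category counts; they always sum to n
```

Marginally over many such two-stage draws, $\mathbf{x}$ follows the Dirichlet-multinomial distribution derived below.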
The Dirichlet distribution is the conjugate prior of the multinomial distribution. This fact leads to an analytically tractable compound distribution.
For a random vector of category counts $\mathbf{x} = (x_1, \dots, x_K)$, distributed according to a multinomial distribution, the marginal distribution is obtained by integrating over $\mathbf{p}$, which can be thought of as a random vector following a Dirichlet distribution:

$$\Pr(\mathbf{x} \mid \boldsymbol\alpha) = \int_{\mathbf{p}} \Pr(\mathbf{x} \mid \mathbf{p}) \Pr(\mathbf{p} \mid \boldsymbol\alpha) \, d\mathbf{p},$$

which results in the following explicit formula:

$$\Pr(\mathbf{x} \mid \boldsymbol\alpha) = \frac{\Gamma(\alpha_0)\,\Gamma(n+1)}{\Gamma(n+\alpha_0)} \prod_{k=1}^{K} \frac{\Gamma(x_k + \alpha_k)}{\Gamma(\alpha_k)\,\Gamma(x_k + 1)},$$

where $\alpha_0$ is defined as the sum $\alpha_0 = \sum_{k=1}^{K} \alpha_k$. Another form for this same compound distribution, written more compactly in terms of the beta function, $B$, is as follows:

$$\Pr(\mathbf{x} \mid \boldsymbol\alpha) = \frac{n\,B(\alpha_0, n)}{\prod_{k:\,x_k > 0} x_k\,B(\alpha_k, x_k)}.$$
The latter form emphasizes the fact that zero-count categories can be ignored in the calculation, a useful fact when the number of categories is very large and the count vector is sparse (e.g. word counts in documents).
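As a minimal sketch of the explicit formula above, the log-pmf can be evaluated with SciPy's gammaln for numerical stability (the helper name dirichlet_multinomial_logpmf is ours, not a library routine; recent SciPy versions also ship a built-in scipy.stats.dirichlet_multinomial):

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_logpmf(x, alpha):
    """Log of Pr(x | alpha) via the Gamma-function formula above,
    with n = sum(x) and alpha_0 = sum(alpha)."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n = x.sum()
    a0 = alpha.sum()
    return (gammaln(a0) + gammaln(n + 1) - gammaln(n + a0)
            + np.sum(gammaln(x + alpha) - gammaln(alpha) - gammaln(x + 1)))

alpha = np.array([2.0, 3.0, 5.0])
x = np.array([1, 4, 5])  # n = 10
print(np.exp(dirichlet_multinomial_logpmf(x, alpha)))
```

Note that a category with $x_k = 0$ contributes $\ln\Gamma(\alpha_k) - \ln\Gamma(\alpha_k) - \ln\Gamma(1) = 0$ to the sum, which is exactly the zero-count property noted above.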
In probability theory and statistics, a categorical distribution (also called a generalized Bernoulli distribution or multinoulli distribution) is a discrete probability distribution that describes the possible results of a random variable that can take on one of K possible categories, with the probability of each category separately specified. There is no innate underlying ordering of these outcomes, but numerical labels are often attached for convenience in describing the distribution (e.g. 1 to K).
In probability and statistics, a compound probability distribution (also known as a mixture distribution or contagious distribution) is the probability distribution that results from assuming that a random variable is distributed according to some parametrized distribution, with (some of) the parameters of that distribution themselves being random variables. If the parameter is a scale parameter, the resulting mixture is also called a scale mixture.
In Bayesian statistics, the posterior predictive distribution is the distribution of possible unobserved values conditional on the observed values. Given a set of $N$ i.i.d. observations $\mathbf{X} = \{x_1, \dots, x_N\}$, a new value $\tilde{x}$ will be drawn from a distribution that depends on a parameter $\theta \in \Theta$, where $\Theta$ is the parameter space. It may seem tempting to plug in a single best estimate $\hat\theta$ for $\theta$, but this ignores uncertainty about $\theta$, and because a source of uncertainty is ignored, the predictive distribution will be too narrow.
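To make the "too narrow" point concrete in the present setting: with a Dirichlet prior and a multinomial likelihood, the exact posterior predictive is again a Dirichlet-multinomial, while plugging in the posterior-mean point estimate of $\mathbf{p}$ yields a plain multinomial. A minimal sketch under assumed toy counts (all variable names here are ours) compares the two predictive variances:

```python
import numpy as np

alpha_prior = np.array([1.0, 1.0, 1.0])  # assumed flat Dirichlet prior
observed = np.array([3, 5, 2])           # toy observed category counts
alpha_post = alpha_prior + observed      # Dirichlet posterior parameters

n_new = 10                               # size of the next sample
a0 = alpha_post.sum()
p_hat = alpha_post / a0                  # posterior mean of p (plug-in estimate)

# Plug-in predictive: multinomial with p fixed at p_hat.
var_plugin = n_new * p_hat * (1 - p_hat)

# Exact posterior predictive: Dirichlet-multinomial, wider by the
# overdispersion factor (n_new + a0) / (1 + a0).
var_dm = var_plugin * (n_new + a0) / (1 + a0)

print(var_plugin)
print(var_dm)  # strictly larger: uncertainty about p is propagated
```

The plug-in predictive understates the variance by exactly this overdispersion factor, which is why the Dirichlet-multinomial appears in classical statistics as an overdispersed multinomial distribution.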