Summary
In probability theory and statistics, the Dirichlet-multinomial distribution is a family of discrete multivariate probability distributions on a finite support of non-negative integers. It is also called the Dirichlet compound multinomial distribution (DCM) or multivariate Pólya distribution (after George Pólya). It is a compound probability distribution: a probability vector $\mathbf{p}$ is drawn from a Dirichlet distribution with parameter vector $\boldsymbol{\alpha}$, and an observation is then drawn from a multinomial distribution with probability vector $\mathbf{p}$ and number of trials $n$. The Dirichlet parameter vector captures the prior belief about the situation and can be seen as a pseudocount: observations of each outcome that occur before the actual data is collected. The compounding corresponds to a Pólya urn scheme.

The distribution is frequently encountered in Bayesian statistics, machine learning, empirical Bayes methods and classical statistics as an overdispersed multinomial distribution. It reduces to the categorical distribution as a special case when $n = 1$. It also approximates the multinomial distribution arbitrarily well for large $\boldsymbol{\alpha}$. The Dirichlet-multinomial is a multivariate extension of the beta-binomial distribution, as the multinomial and Dirichlet distributions are multivariate versions of the binomial and beta distributions, respectively.

The Dirichlet distribution is a conjugate distribution to the multinomial distribution. This fact leads to an analytically tractable compound distribution. For a random vector of category counts $\mathbf{x} = (x_1, \dots, x_K)$ over $K$ categories, distributed according to a multinomial distribution, the marginal distribution is obtained by integrating over $\mathbf{p}$, which can be thought of as a random vector following a Dirichlet distribution:

$$\Pr(\mathbf{x} \mid n, \boldsymbol{\alpha}) = \int_{\mathbf{p}} \mathrm{Mult}(\mathbf{x} \mid n, \mathbf{p}) \, \mathrm{Dir}(\mathbf{p} \mid \boldsymbol{\alpha}) \, \mathrm{d}\mathbf{p}$$

which results in the following explicit formula:

$$\Pr(\mathbf{x} \mid n, \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)\,\Gamma(n+1)}{\Gamma(n+\alpha_0)} \prod_{k=1}^{K} \frac{\Gamma(x_k+\alpha_k)}{\Gamma(\alpha_k)\,\Gamma(x_k+1)}$$

where $\alpha_0$ is defined as the sum $\alpha_0 = \sum_{k=1}^{K} \alpha_k$.
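As an illustration of the explicit formula and of the Pólya urn interpretation, the following is a minimal sketch in Python using only the standard library. The function names `dirmult_logpmf` and `polya_urn_sample` are hypothetical, chosen for this example, and the code assumes every parameter $\alpha_k$ is strictly positive. It works in log space with `math.lgamma` to avoid overflow in the gamma functions.

```python
from math import exp, lgamma
import random


def dirmult_logpmf(x, alpha):
    """Log-probability of the count vector x under a Dirichlet-multinomial
    with n = sum(x) trials and parameter vector alpha (all alpha_k > 0)."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    n = sum(x)
    alpha0 = sum(alpha)
    # log of Gamma(alpha0) * Gamma(n + 1) / Gamma(n + alpha0)
    logp = lgamma(alpha0) + lgamma(n + 1) - lgamma(n + alpha0)
    # log of prod_k Gamma(x_k + alpha_k) / (Gamma(alpha_k) * Gamma(x_k + 1))
    for xk, ak in zip(x, alpha):
        logp += lgamma(xk + ak) - lgamma(ak) - lgamma(xk + 1)
    return logp


def polya_urn_sample(n, alpha, rng=None):
    """Draw one count vector via the Polya urn scheme: each draw picks
    category k with weight alpha_k plus the number of earlier draws of k."""
    rng = rng or random.Random()
    counts = [0] * len(alpha)
    for _ in range(n):
        weights = [a + c for a, c in zip(alpha, counts)]
        k = rng.choices(range(len(alpha)), weights=weights)[0]
        counts[k] += 1
    return counts


alpha = [1.0, 2.0, 3.0]
# With n = 1 the probability of category k is alpha_k / alpha0 (categorical case).
print(exp(dirmult_logpmf([0, 1, 0], alpha)))
# One simulated count vector with n = 10 trials.
print(polya_urn_sample(10, alpha, random.Random(0)))
```

The urn sampler never draws $\mathbf{p}$ explicitly; it relies on the stated correspondence between the compound distribution and the Pólya urn scheme. Drawing $\mathbf{p}$ from a Dirichlet and then sampling a multinomial would be an equally valid route.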
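Another form for this same compound distribution, written more compactly in terms of the beta function, B, is as follows:

$$\Pr(\mathbf{x} \mid n, \boldsymbol{\alpha}) = \frac{n \, B(\alpha_0, n)}{\prod_{k \,:\, x_k > 0} x_k \, B(\alpha_k, x_k)}$$

where $\alpha_0$ is the sum of the $\alpha_k$. The latter form emphasizes the fact that zero-count categories can be ignored in the calculation, a useful fact when the number of categories is very large and the counts are sparse (e.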
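The beta-function form can be sketched in Python as follows, again using only the standard library. The function name `dirmult_logpmf_sparse` and the dictionary layout for the counts are assumptions made for this illustration. Only categories with a positive count are visited, so the loop costs time proportional to the number of non-zero counts; the sum of all parameters, `alpha0`, is assumed to be precomputed by the caller.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def dirmult_logpmf_sparse(nonzero_counts, alpha_of, alpha0):
    """Log-pmf computed from the beta-function form.

    nonzero_counts: dict {category: count} with only counts > 0 (hypothetical layout)
    alpha_of: dict or sequence giving alpha_k for category k
    alpha0: sum of all alpha_k, including categories with zero count
    """
    n = sum(nonzero_counts.values())
    if n == 0:
        return 0.0  # with no trials, the all-zero count vector has probability 1
    # Numerator: n * B(alpha0, n)
    logp = log(n) + log_beta(alpha0, n)
    # Denominator: product over non-zero categories of x_k * B(alpha_k, x_k)
    for k, xk in nonzero_counts.items():
        if xk <= 0:
            raise ValueError("nonzero_counts must contain only positive counts")
        logp -= log(xk) + log_beta(alpha_of[k], xk)
    return logp


alpha = [1.0, 2.0, 3.0]
# Counts x = (2, 0, 1): the zero-count category 1 is simply omitted.
print(exp(dirmult_logpmf_sparse({0: 2, 2: 1}, alpha, sum(alpha))))
```

For this small example the result can be checked by hand against the beta-function formula, which gives 3/56.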