**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.

Concept# Dirichlet-multinomial distribution

Summary

In probability theory and statistics, the Dirichlet-multinomial distribution is a family of discrete multivariate probability distributions on a finite support of non-negative integers. It is also called the Dirichlet compound multinomial distribution (DCM) or multivariate Pólya distribution (after George Pólya). It is a compound probability distribution, where a probability vector p is drawn from a Dirichlet distribution with parameter vector , and an observation drawn from a multinomial distribution with probability vector p and number of trials n. The Dirichlet parameter vector captures the prior belief about the situation and can be seen as a pseudocount: observations of each outcome that occur before the actual data is collected. The compounding corresponds to a Pólya urn scheme. It is frequently encountered in Bayesian statistics, machine learning, empirical Bayes methods and classical statistics as an overdispersed multinomial distribution.
It reduces to the categorical distribution as a special case when n = 1. It also approximates the multinomial distribution arbitrarily well for large α. The Dirichlet-multinomial is a multivariate extension of the beta-binomial distribution, as the multinomial and Dirichlet distributions are multivariate versions of the binomial distribution and beta distributions, respectively.
The Dirichlet distribution is a conjugate distribution to the multinomial distribution. This fact leads to an analytically tractable compound distribution.
For a random vector of category counts , distributed according to a multinomial distribution, the marginal distribution is obtained by integrating on the distribution for p which can be thought of as a random vector following a Dirichlet distribution:
which results in the following explicit formula:
where is defined as the sum . Another form for this same compound distribution, written more compactly in terms of the beta function, B, is as follows:
The latter form emphasizes the fact that zero count categories can be ignored in the calculation - a useful fact when the number of categories is very large and sparse (e.

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related publications (104)

Related concepts (16)

Related courses (13)

Related people (31)

Related units (3)

Related lectures (61)

Categorical distribution

In probability theory and statistics, a categorical distribution (also called a generalized Bernoulli distribution, multinoulli distribution) is a discrete probability distribution that describes the possible results of a random variable that can take on one of K possible categories, with the probability of each category separately specified. There is no innate underlying ordering of these outcomes, but numerical labels are often attached for convenience in describing the distribution, (e.g. 1 to K).

Compound probability distribution

In probability and statistics, a compound probability distribution (also known as a mixture distribution or contagious distribution) is the probability distribution that results from assuming that a random variable is distributed according to some parametrized distribution, with (some of) the parameters of that distribution themselves being random variables. If the parameter is a scale parameter, the resulting mixture is also called a scale mixture.

Posterior predictive distribution

In Bayesian statistics, the posterior predictive distribution is the distribution of possible unobserved values conditional on the observed values. Given a set of N i.i.d. observations , a new value will be drawn from a distribution that depends on a parameter , where is the parameter space. It may seem tempting to plug in a single best estimate for , but this ignores uncertainty about , and because a source of uncertainty is ignored, the predictive distribution will be too narrow.

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple

MATH-232: Probability and statistics

A basic course in probability and statistics

FIN-417: Quantitative risk management

This course is an introduction to quantitative risk management that covers standard statistical methods, multivariate risk factor models, non-linear dependence structures (copula models), as well as p

, , , , , , , , ,

As large, data-driven artificial intelligence models become ubiquitous, guaranteeing high data quality is imperative for constructing models. Crowdsourcing, community sensing, and data filtering have long been the standard approaches to guaranteeing or imp ...

Covers the review of probability concepts including Poisson distribution and moment generating functions.

Introduces types of variables, multinomial distribution, data characteristics, shapes of densities, correlation, and data visualization methods.

Explores asymmetric parametric models and the extremal coefficient in extreme value dependence.

Antoine Pierre François Jego, Titus Lupu

We construct a measure on the thick points of a Brownian loop soup in a bounded domain D$D$ of the plane with given intensity theta>0$\theta >0$, which is formally obtained by exponentiating the square root of its occupation field. The measure is construct ...

Catalina Paz Alvarez Inostroza

Keeping track of Internet latency is a classic measurement problem. Open measurement platforms like RIPE Atlas are a great solution, but they also face challenges: preventing network overload that may result from uncontrolled active measurements, and maint ...

2023