Dirichlet-multinomial distribution

Content sourced from Wikipedia, licensed under CC BY-SA 3.0.

Dirichlet-Multinomial distribution: a short, easy guide

What it is
- The Dirichlet-Multinomial (also called Dirichlet compound multinomial) is a multivariate discrete distribution for counts x = (x1, ..., xK) with total n trials.
- It arises when the category probabilities p = (p1, ..., pK) are themselves random and drawn from a Dirichlet distribution with parameters α = (α1, ..., αK). Given p, the counts follow a Multinomial(n, p). Integrating (mixing) over p gives the Dirichlet-Multinomial.
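The two-stage recipe above translates directly into a sampler. This is a minimal sketch assuming NumPy's `Generator.dirichlet` and `Generator.multinomial`; the function name `sample_dirichlet_multinomial` is hypothetical.

```python
import numpy as np

def sample_dirichlet_multinomial(n, alpha, rng):
    """Draw one count vector from DirMult(n, alpha) in two stages."""
    p = rng.dirichlet(alpha)       # stage 1: random probabilities p ~ Dirichlet(alpha)
    return rng.multinomial(n, p)   # stage 2: counts x | p ~ Multinomial(n, p)

rng = np.random.default_rng(0)
x = sample_dirichlet_multinomial(10, [1.0, 2.0, 3.0], rng)
print(x, x.sum())  # the entries are nonnegative and always sum to n = 10
```

Each call draws a fresh p, which is where the extra variability comes from; reusing a single p for every draw would give plain multinomial counts.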

Notation
- n: total number of trials, n ≥ 0
- α = (α1, ..., αK), with αk > 0
- α0 = ∑k αk
- x = (x1, ..., xK), where each xk is a nonnegative integer and ∑k xk = n

Probability mass function (PMF)
- The probability of observing counts x is:
P(x | n, α) = [Γ(α0) Γ(n+1) / Γ(n+α0)] × ∏k=1..K [ Γ(xk + αk) / ( Γ(αk) Γ(xk + 1) ) ]
- This closed form comes from multiplying the Multinomial(n, p) likelihood by the Dirichlet(α) density and integrating p out.
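The PMF can be evaluated term by term. The sketch below (function names are hypothetical) works in log space with Python's `math.lgamma`, because the Gamma functions overflow quickly for realistic n and α.

```python
from math import exp, lgamma

def dirmult_logpmf(x, alpha):
    """log P(x | n, alpha) for the Dirichlet-multinomial, with n = sum(x)."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length K")
    if any(a <= 0 for a in alpha):
        raise ValueError("every alpha_k must be positive")
    n, a0 = sum(x), sum(alpha)
    # Normalising factor: Gamma(a0) Gamma(n + 1) / Gamma(n + a0)
    out = lgamma(a0) + lgamma(n + 1) - lgamma(n + a0)
    # Product over categories: Gamma(x_k + a_k) / (Gamma(a_k) Gamma(x_k + 1))
    for xk, ak in zip(x, alpha):
        out += lgamma(xk + ak) - lgamma(ak) - lgamma(xk + 1)
    return out

def dirmult_pmf(x, alpha):
    return exp(dirmult_logpmf(x, alpha))

# With K = 2, alpha = (1, 1) and n = 2, all three outcomes are equally likely.
print(dirmult_pmf([1, 1], [1.0, 1.0]))  # approximately 1/3
```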

What the numbers mean
- Mean: E[Xi] = n αi / α0
- Each category i gets, on average, a share proportional to its αi.
- Variance: Var(Xi) = n (αi/α0) [1 − (αi/α0)] × [(n + α0) / (1 + α0)]
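- The factor (n + α0)/(1 + α0) is at least 1 (exactly 1 only when n = 1), so the variance is larger than that of a plain multinomial with p = α/α0. This excess spread is called overdispersion.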
- Covariance: Cov(Xi, Xj) = − n (αi/α0)(αj/α0) × [(n + α0) / (1 + α0)] for i ≠ j
- The covariance is negative: because the counts must sum to n, a larger count in one category forces smaller counts elsewhere.
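The mean, variance and covariance formulas can be checked exactly for small cases by summing over every count vector with total n. This is a sketch; the helper names (`support`, `closed_form_moments`, `enumerated_moments`) are hypothetical, and the PMF is repeated so the block runs on its own.

```python
from itertools import product
from math import exp, lgamma

def dirmult_pmf(x, alpha):
    n, a0 = sum(x), sum(alpha)
    logp = lgamma(a0) + lgamma(n + 1) - lgamma(n + a0)
    for xk, ak in zip(x, alpha):
        logp += lgamma(xk + ak) - lgamma(ak) - lgamma(xk + 1)
    return exp(logp)

def support(n, K):
    """All length-K tuples of nonnegative integers summing to n."""
    for head in product(range(n + 1), repeat=K - 1):
        if sum(head) <= n:
            yield head + (n - sum(head),)

def closed_form_moments(n, alpha):
    a0 = sum(alpha)
    p = [a / a0 for a in alpha]
    c = (n + a0) / (1 + a0)  # overdispersion factor
    mean = [n * pk for pk in p]
    # Diagonal: n p_i (1 - p_i) c; off-diagonal: -n p_i p_j c
    cov = [[n * p[i] * ((1 - p[i]) if i == j else -p[j]) * c
            for j in range(len(alpha))] for i in range(len(alpha))]
    return mean, cov

def enumerated_moments(n, alpha):
    K = len(alpha)
    mean = [0.0] * K
    second = [[0.0] * K for _ in range(K)]
    for x in support(n, K):
        w = dirmult_pmf(x, alpha)
        for i in range(K):
            mean[i] += w * x[i]
            for j in range(K):
                second[i][j] += w * x[i] * x[j]
    cov = [[second[i][j] - mean[i] * mean[j] for j in range(K)] for i in range(K)]
    return mean, cov

n, alpha = 5, [0.5, 1.5, 2.0]
m_cf, c_cf = closed_form_moments(n, alpha)
m_en, c_en = enumerated_moments(n, alpha)
print(max(abs(a - b) for a, b in zip(m_cf, m_en)))
print(max(abs(c_cf[i][j] - c_en[i][j]) for i in range(3) for j in range(3)))
```

Both printed gaps should be at the level of floating-point rounding.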

Intuition and interpretation
- The Dirichlet prior with α captures prior beliefs about the relative frequencies of the K categories before seeing data. The αk act as pseudocounts.
- Mixing a Dirichlet prior into a multinomial creates a compound distribution that accounts for extra variability (overdispersion) beyond the multinomial.
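- As α0 grows with the proportions αk/α0 held fixed (strong prior belief), the Dirichlet-Multinomial behaves more and more like a standard multinomial with p = α/α0. Increasing n with α fixed does not have this effect: the factor (n + α0)/(1 + α0) in the variance grows with n, so the overdispersion becomes more pronounced.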
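The pseudocount reading of α can be made concrete. Assuming the standard conjugate update, observing counts x turns a Dirichlet(α) prior into a Dirichlet(α + x) posterior, so the αk are simply added to the observed counts, and the expected counts in m further trials are m (αk + xk)/(α0 + n). The helper names below are hypothetical.

```python
def posterior_alpha(alpha, x):
    """Dirichlet posterior parameters after observing counts x: alpha_k + x_k."""
    return [a + xk for a, xk in zip(alpha, x)]

def predictive_mean(m, alpha, x):
    """Expected counts in m further trials: m (alpha_k + x_k) / (alpha_0 + n)."""
    post = posterior_alpha(alpha, x)
    total = sum(post)
    return [m * a / total for a in post]

alpha = [1.0, 1.0, 1.0]  # one pseudocount per category
x = [7, 2, 1]            # observed counts, n = 10
print(posterior_alpha(alpha, x))      # [8.0, 3.0, 2.0]
print(predictive_mean(10, alpha, x))  # pulled slightly toward equal shares
```

The counts in those m further trials follow a Dirichlet-Multinomial with parameters α + x, so the same family serves as both prior predictive and posterior predictive.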

Special cases and relations
- If K = 2, the Dirichlet-Multinomial reduces to the Beta-Binomial distribution.
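- If α0 → ∞ with the ratios αk/α0 held fixed (a very strong prior), the distribution approaches a Multinomial(n, p) with fixed probabilities pk = αk/α0.
- If n = 1, it coincides with the categorical distribution with probabilities αk/α0.
- It is the multivariate extension of the Beta-Binomial. In a hierarchical setup it is the marginal (compound) distribution of the counts; the conjugate pair is the Dirichlet prior and the multinomial likelihood, not the Dirichlet-Multinomial itself.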
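Both special cases can be checked numerically. This sketch (helper names hypothetical) compares the two-category PMF with the Beta-Binomial PMF C(n, k) B(k + a, n − k + b)/B(a, b), and compares the PMF for α = s·p with Multinomial(n, p) as the scale s grows. It assumes every pk > 0.

```python
from itertools import product
from math import comb, exp, lgamma, log

def dirmult_pmf(x, alpha):
    n, a0 = sum(x), sum(alpha)
    logp = lgamma(a0) + lgamma(n + 1) - lgamma(n + a0)
    for xk, ak in zip(x, alpha):
        logp += lgamma(xk + ak) - lgamma(ak) - lgamma(xk + 1)
    return exp(logp)

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    # C(n, k) B(k + a, n - k + b) / B(a, b)
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def multinomial_pmf(x, p):
    # Assumes every p_k > 0 so that log(p_k) is defined.
    n = sum(x)
    return exp(lgamma(n + 1) + sum(xk * log(pk) - lgamma(xk + 1) for xk, pk in zip(x, p)))

def support(n, K):
    for head in product(range(n + 1), repeat=K - 1):
        if sum(head) <= n:
            yield head + (n - sum(head),)

# K = 2: DirMult((k, n - k); (a, b)) equals BetaBinomial(k; n, a, b).
n, a, b = 6, 0.7, 2.5
gap_bb = max(abs(dirmult_pmf([k, n - k], [a, b]) - beta_binomial_pmf(k, n, a, b))
             for k in range(n + 1))

# alpha = s * p with growing s: the PMF approaches Multinomial(n, p).
p = [0.2, 0.3, 0.5]

def gap_to_multinomial(s, n=4):
    alpha = [s * pk for pk in p]
    return max(abs(dirmult_pmf(x, alpha) - multinomial_pmf(x, p))
               for x in support(n, len(p)))

print(gap_bb)                                               # rounding-level
print([gap_to_multinomial(s) for s in (1.0, 100.0, 1e6)])  # shrinking gaps
```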

Connections and use
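- Because the Dirichlet distribution is the conjugate prior for the multinomial, the integral over p has a closed form, and that closed form is the Dirichlet-Multinomial PMF.
- It is closely related to Pólya's urn model: start an urn with weight αk on colour k, repeatedly draw a colour with probability proportional to its current weight and then add one to that colour's weight. The colour counts after n draws follow a Dirichlet-Multinomial(n, α), which makes it a convenient model for overdispersed multinomial data.
- Applications include Bayesian statistics, machine learning, empirical Bayes, and classical statistics, especially in areas like document classification and clustering, genetics, and economics.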
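The urn description can be simulated directly and compared with the closed-form PMF. This is a sketch using Python's `random.Random.choices`; the function names, the seed and the choice n = 3, α = (0.5, 1.5) are illustrative assumptions.

```python
import random
from collections import Counter
from math import exp, lgamma

def polya_urn_counts(n, alpha, rng):
    """Draw n times from a Polya urn with initial (possibly fractional) weights alpha.
    After each draw the drawn colour's weight grows by 1."""
    weights = list(alpha)
    counts = [0] * len(alpha)
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        counts[k] += 1
        weights[k] += 1.0
    return counts

def dirmult_pmf(x, alpha):
    n, a0 = sum(x), sum(alpha)
    logp = lgamma(a0) + lgamma(n + 1) - lgamma(n + a0)
    for xk, ak in zip(x, alpha):
        logp += lgamma(xk + ak) - lgamma(ak) - lgamma(xk + 1)
    return exp(logp)

# Compare urn frequencies with the closed-form PMF.
rng = random.Random(42)
n, alpha, trials = 3, [0.5, 1.5], 40_000
freq = Counter(tuple(polya_urn_counts(n, alpha, rng)) for _ in range(trials))
for k in range(n + 1):
    x = (k, n - k)
    print(x, round(freq[x] / trials, 3), round(dirmult_pmf(x, alpha), 3))
```

The reinforcement step (adding weight to whatever was just drawn) is what makes counts cluster, which is the same overdispersion the variance formula describes.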

Notes
- The Dirichlet-Multinomial is also called the Dirichlet compound multinomial (DCM) and is useful when some categories may be sparse or when category counts are bursty (once a category occurs, it tends to recur, as with repeated words in a document).
- It reduces to the plain multinomial distribution in the limit of a strong prior (α0 → ∞ with αk/α0 fixed), and to the Beta-Binomial in the two-category case.

This page was last edited on 2 February 2026, at 17:07 (CET).