
Inequalities in information theory

Content sourced from Wikipedia, licensed under CC BY-SA 3.0.

Inequalities in information theory (short and simple)

- Why they matter: Inequalities describe how the uncertainties (entropies) of random variables relate to each other. They help us understand what combinations of entropies are possible.

- The setup: Take random variables X1, X2, ..., Xn with finite or countable support. For every subset I of {1, ..., n}, we can compute a joint entropy H(Xi for i in I). There are 2^n such subsets (the empty subset contributes entropy 0).
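
A minimal sketch of that computation (illustrative only; the tiny joint distribution and helper names are assumptions, not from the article):

```python
from itertools import combinations
from math import log2

# Joint pmf of (X1, X2, X3) as a dict: outcome tuple -> probability.
# Example: X1, X2 uniform and independent, X3 = X1 XOR X2.
joint_pmf = {
    (0, 0, 0): 0.25,
    (0, 1, 1): 0.25,
    (1, 0, 1): 0.25,
    (1, 1, 0): 0.25,
}

def joint_entropy(pmf, subset):
    """H(Xi for i in subset), in bits, via the marginal over those coordinates."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in subset)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

n = 3
# One joint entropy per subset of {0, ..., n-1}: 2^n values in total.
for size in range(n + 1):
    for subset in combinations(range(n), size):
        print(subset, round(joint_entropy(joint_pmf, subset), 3))
```

Collecting these 2^n values in a fixed order gives exactly the "entropic vector" described a couple of items below.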

- A core idea: Many useful inequalities come from the fact that conditional mutual information is always nonnegative: I(A; B | C) ≥ 0 for any choices of A, B, C built from the Xi. Taking nonnegative linear combinations of these basic facts gives the wide class of so‑called Shannon-type inequalities.
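
A minimal sketch of that core fact (assumed helper names and random test distributions, purely illustrative): it computes I(A; B | C) from joint entropies via I(A; B | C) = H(A, C) + H(B, C) − H(A, B, C) − H(C) and checks that it never dips below zero.

```python
import random
from math import log2

def entropy(pmf):
    """Shannon entropy (bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(pmf, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def conditional_mutual_information(pmf, a, b, c):
    """I(X_a; X_b | X_c) in bits, expressed through joint entropies."""
    return (entropy(marginal(pmf, a + c)) + entropy(marginal(pmf, b + c))
            - entropy(marginal(pmf, a + b + c)) - entropy(marginal(pmf, c)))

rng = random.Random(0)
for _ in range(1000):
    # Random joint pmf over three binary variables.
    weights = [rng.random() for _ in range(8)]
    total = sum(weights)
    pmf = {(i, j, k): weights[4 * i + 2 * j + k] / total
           for i in (0, 1) for j in (0, 1) for k in (0, 1)}
    cmi = conditional_mutual_information(pmf, (0,), (1,), (2,))
    assert cmi >= -1e-12  # nonnegative, up to floating-point error
```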

- Entropic vectors: Define h_I = H(Xi for i in I) for every subset I. A vector h is called entropic if it comes from some joint distribution of X1, ..., Xn. The set of all entropic vectors, denoted Γ*_n, is not closed and not convex for n ≥ 3. Its closure, however, is convex and can be described by a (potentially infinite) set of linear entropic inequalities.

- Shannon vs non-Shannon: The inequalities that come from nonnegativity of conditional mutual information (Shannon-type) carve out a region that contains every entropic vector, but for n ≥ 4 this outer bound is not tight. There are additional non-Shannon-type inequalities (the first was found by Zhang and Yeung; one standard form is quoted just below). In fact, no finite list of linear inequalities captures all entropic inequalities for n ≥ 4, so the closure of the entropic region is not a polyhedral cone.
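
One commonly cited form of the Zhang–Yeung inequality, for any four jointly distributed random variables A, B, C, D (quoted here as it usually appears in the literature; it cannot be derived from Shannon-type inequalities alone):

```latex
% Zhang–Yeung (1998) non-Shannon-type inequality:
\[
  2\,I(C;D) \;\le\; I(A;B) + I(A;C,D) + 3\,I(C;D \mid A) + I(C;D \mid B)
\]
```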

- A unifying view through KL divergence: Many key inequalities are lower bounds for the Kullback–Leibler divergence D_KL(P || Q). For example, some Shannon-type inequalities can be seen as consequences of the fact that certain KL divergences are nonnegative (Gibbs’ inequality). Upper bounds for D_KL are harder because the divergence can become very large when events are rare under Q.
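
A standard instance of this viewpoint: the nonnegativity of mutual information is just Gibbs' inequality applied to the joint distribution versus the product of its marginals.

```latex
% Mutual information as a KL divergence (hence nonnegative by Gibbs' inequality):
\[
  I(X;Y)
  \;=\; \sum_{x,y} P_{XY}(x,y)\,\log\frac{P_{XY}(x,y)}{P_X(x)\,P_Y(y)}
  \;=\; D_{\mathrm{KL}}\!\left(P_{XY} \,\middle\|\, P_X \otimes P_Y\right)
  \;\ge\; 0
\]
```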

- Basic bounds you should know:
- Nonnegativity: D_KL(P || Q) ≥ 0.
- Kullback’s inequality lower-bounds D_KL(P || Q) by the large-deviation rate function of Q (the Legendre transform of its cumulant-generating function) evaluated at the mean of P, for real-valued distributions with finite means.
- Pinsker’s inequality connects D_KL(P || Q) to the total variation distance TV(P, Q) = ½ Σ |P − Q|: measured in nats, D_KL(P || Q) ≥ 2 [TV(P, Q)]^2 (a quick numerical check follows this list).
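
The promised check, as a minimal sketch (random illustrative distributions; the divergence is in nats and TV is taken as half the L1 distance):

```python
import random
from math import log

def kl_divergence(p, q):
    """D_KL(P || Q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    """Total variation distance: half the L1 distance between the pmfs."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_distribution(size, rng):
    weights = [rng.random() + 1e-9 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
for _ in range(1000):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    # Pinsker's inequality, up to floating-point error.
    assert kl_divergence(p, q) >= 2 * total_variation(p, q) ** 2 - 1e-12
```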

- An uncertainty principle twist: Hirschman showed that for suitable f with unit L² norm, the sum of the differential entropies of |f|² and of |g|², where g is the Fourier transform of f, is nonnegative; the sharper bound log(e/2), attained exactly by Gaussians, was proved later. This entropic bound implies, and strengthens, a form of the Heisenberg uncertainty principle.
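
One standard way to write this entropic uncertainty principle (stated here with the Fourier transform normalized as g(y) = ∫ f(x) e^(−2πixy) dx and ‖f‖₂ = 1; conventions vary, and the constant depends on the normalization):

```latex
% Hirschman's entropic uncertainty principle; the sharp constant log(e/2),
% attained by Gaussians, is due to later work (Beckner; Bialynicki-Birula
% and Mycielski). Here g is the Fourier transform of f and \|f\|_2 = 1.
\[
  -\int |f(x)|^2 \log |f(x)|^2 \,dx
  \;-\; \int |g(y)|^2 \log |g(y)|^2 \,dy
  \;\ge\; \log\frac{e}{2}
\]
```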

- A discrete example link: For discrete X and Y, where Y′ is a deterministic function of Y, certain relationships between conditional expectations and conditional mutual information follow from Pinsker’s inequality (up to the constant-factor adjustment between bits and nats).

- Tools and extensions: There are machine-based proof checkers that test whether inequalities hold and can sometimes produce proofs or counterexamples. In quantum information, similar contraction-map methods are used to check inequalities.

- Takeaway: Inequalities in information theory organize how entropies can behave. Some come from simple nonnegativity ideas, while others require deeper results like non-Shannon inequalities or KL divergence theory. The landscape is rich and, in general, not capturable by a finite list of linear inequalities.

