Overview
Mutual information is a fundamental concept from information theory that quantifies how much information one random variable provides about another. Informally, it measures how much the uncertainty about one quantity is reduced when the value of the other is known. A simple illustration: knowing the month of the year changes the probabilities of possible daily temperatures but does not determine the exact temperature. Mutual information captures the size of this shift in likelihood.
Formal meaning and formulae
For discrete variables X and Y, mutual information is defined as I(X;Y) = sum_{x,y} p(x,y) log[p(x,y)/(p(x)p(y))], where terms with p(x,y) = 0 contribute zero. This equals H(X) - H(X|Y) and also H(Y) - H(Y|X), where H denotes entropy and H(·|·) conditional entropy. The measure is symmetric (I(X;Y) = I(Y;X)) and always nonnegative. Equivalently, it is the Kullback–Leibler divergence D_KL(p(x,y) || p(x)p(y)) between the joint distribution and the product of the marginals.
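The definition can be checked numerically. The sketch below is a minimal illustration, not a reference implementation: it computes I(X;Y) directly from the sum and compares it with H(X) - H(X|Y). The function names and the 2x2 joint table are assumptions made up for this example.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy of a probability vector; zero-probability terms are skipped."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

def mutual_information(joint, base=2.0):
    """I(X;Y) from a joint table where joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]          # marginal p(x)
    py = [sum(col) for col in zip(*joint)]    # marginal p(y)
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:                     # 0 * log(0) is taken as 0
                total += pxy * math.log(pxy / (px[i] * py[j]), base)
    return total

# Hypothetical joint distribution of two binary variables (rows: x, columns: y).
joint = [[0.4, 0.1],
         [0.1, 0.4]]

px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
h_xy = entropy([p for row in joint for p in row])  # joint entropy H(X,Y)
h_x_given_y = h_xy - entropy(py)                   # chain rule: H(X|Y) = H(X,Y) - H(Y)

mi = mutual_information(joint)                     # direct sum
mi_from_entropies = entropy(px) - h_x_given_y      # H(X) - H(X|Y)
print(mi, mi_from_entropies)
```

For this table both routes give about 0.278 bits. Passing base=math.e would report the same quantity in nats.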
Key properties
- Nonnegativity: I(X;Y) >= 0, with equality if and only if X and Y are statistically independent.
- Symmetry: I(X;Y) = I(Y;X).
- Relation to entropy: I(X;Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y), the reduction in entropy of one variable after observing the other.
- Units: commonly measured in bits (log base 2) or nats (natural log).
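The properties listed above can be spot-checked on small tables. The following is a sketch under stated assumptions: the helper `mi`, the helper `transpose`, and both joint tables are hypothetical and chosen only for illustration. A numerical check on two tables is of course not a proof.

```python
import math

def mi(joint, base=2.0):
    """I(X;Y) for a joint table joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log(p / (px[i] * py[j]), base)
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0.0
    )

def transpose(joint):
    """Swap the roles of X and Y."""
    return [list(col) for col in zip(*joint)]

# Product of marginals p(x) = [0.3, 0.7] and p(y) = [0.5, 0.5]: independent.
independent = [[0.15, 0.15], [0.35, 0.35]]
# A table that does not factorize: dependent.
dependent = [[0.5, 0.0], [0.1, 0.4]]

mi_independent = mi(independent)             # nonnegativity: zero up to rounding
mi_dependent = mi(dependent)                 # strictly positive
mi_swapped = mi(transpose(dependent))        # symmetry: same value as mi_dependent
mi_bits = mi(dependent, base=2.0)            # units: bits
mi_nats = mi(dependent, base=math.e)         # units: nats = bits * ln(2)
print(mi_independent, mi_dependent, mi_swapped, mi_bits, mi_nats)
```

Because of floating-point rounding, the independent case may print a value that is tiny rather than exactly zero.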
History and development
Mutual information was introduced in the mid-20th century as part of Claude Shannon's development of information theory. Since then it has been generalized to continuous variables, multivariate interactions (multivariate mutual information), and conditional forms such as I(X;Y|Z), which quantifies the information X and Y share once Z is known.
Uses and examples
Mutual information is widely used to detect and quantify statistical dependence, including nonlinear relationships that correlation can miss. Practical applications include feature selection in machine learning, measuring neural coding in neuroscience, assessing channel capacity in communications, and analyzing dependencies in genomics. For a concrete illustration, consider temperature and month: observing the month gives probabilistic guidance about temperature values, and this probabilistic gain is what I(X;Y) measures.
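A small toy case shows a nonlinear relationship that correlation misses. In the sketch below, X is assumed to be uniform on {-1, 0, 1} and Y = X^2, so Y is fully determined by X. The function names and the distribution are illustrative assumptions, not taken from any particular library.

```python
import math
from collections import Counter

def pearson(pairs):
    """Pearson correlation of equally likely (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs) / n)
    return cov / (sx * sy)

def mutual_information_bits(pairs):
    """I(X;Y) in bits, treating each listed pair as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    cx = Counter(x for x, _ in pairs)
    cy = Counter(y for _, y in pairs)
    # p(x,y) / (p(x) p(y)) = (c/n) / ((cx/n)(cy/n)) = c*n / (cx*cy)
    return sum(
        (c / n) * math.log2(c * n / (cx[x] * cy[y]))
        for (x, y), c in joint.items()
    )

# X uniform on {-1, 0, 1}; Y = X^2 depends on X, but not linearly.
pairs = [(x, x * x) for x in (-1, 0, 1)]

r = pearson(pairs)
mi = mutual_information_bits(pairs)
print(r, mi)
```

Here the correlation is zero while the mutual information is about 0.918 bits, which equals H(Y) because Y is a deterministic function of X.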
Related concepts and distinctions
Unlike Pearson correlation, which captures only linear association, mutual information is sensitive to any form of statistical dependence. It differs from entropy (which quantifies the uncertainty of a single variable) and from conditional mutual information (which accounts for a conditioning variable). Estimating mutual information from samples requires care: histogram methods, kernel estimators, and k-nearest-neighbor estimators are common, and each has its own bias-variance trade-off.
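The simplest of these estimators, the histogram (plug-in) method, can be sketched as follows. This is a minimal illustration under assumed choices: equal-width bins, 8 bins per axis, 2000 synthetic Gaussian samples, and hypothetical function names. It does not stand in for a tuned estimator.

```python
import math
import random
from collections import Counter

def bin_index(value, lo, hi, bins):
    """Index of the equal-width bin containing value on [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # the maximum value falls in the last bin

def histogram_mi(xs, ys, bins=8):
    """Plug-in estimate of I(X;Y) in bits from paired samples."""
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    bx = [bin_index(x, x_lo, x_hi, bins) for x in xs]
    by = [bin_index(y, y_lo, y_hi, bins) for y in ys]
    joint = Counter(zip(bx, by))
    cx = Counter(bx)
    cy = Counter(by)
    # Empirical p(x,y) / (p(x) p(y)) simplifies to c*n / (cx*cy).
    return sum(
        (c / n) * math.log2(c * n / (cx[i] * cy[j]))
        for (i, j), c in joint.items()
    )

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys_dependent = [x + 0.5 * rng.gauss(0.0, 1.0) for x in xs]  # Y depends on X
ys_independent = [rng.gauss(0.0, 1.0) for _ in range(n)]     # Y unrelated to X

mi_dependent = histogram_mi(xs, ys_dependent)
mi_independent = histogram_mi(xs, ys_independent)
print(mi_dependent, mi_independent)
```

The estimate for the independent pair is typically small but positive rather than zero, which illustrates the upward bias of the plug-in estimator. Using more bins lets the estimator resolve finer structure but increases this bias and the variance for a fixed sample size.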
Mutual information remains a versatile and interpretable measure of shared information, bridging theoretical insight and practical tools across disciplines.


