Concept

# Pointwise mutual information

Summary
In statistics, probability theory and information theory, pointwise mutual information (PMI), or point mutual information, is a measure of association. It compares the probability of two events occurring together to what this probability would be if the events were independent. PMI (especially in its positive pointwise mutual information variant) has been described as "one of the most important concepts in NLP", where it "draws on the intuition that the best way to weigh the association between two words is to ask how much more the two words co-occur in [a] corpus than we would have a priori expected them to appear by chance." The concept was introduced in 1961 by Robert Fano under the name of "mutual information", but today that term is instead used for a related measure of dependence between random variables: The mutual information (MI) of two discrete random variables refers to the average PMI of all possible events. The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence given their joint distribution and their individual distributions, assuming independence. Mathematically: (with the latter two expressions being equal to the first by Bayes' theorem). The mutual information (MI) of the random variables X and Y is the expected value of the PMI (over all possible outcomes). The measure is symmetric (). It can take positive or negative values, but is zero if X and Y are independent. Note that even though PMI may be negative or positive, its expected outcome over all joint events (MI) is non-negative. PMI maximizes when X and Y are perfectly associated (i.e. or ), yielding the following bounds: Finally, will increase if is fixed but decreases. Here is an example to illustrate: Using this table we can marginalize to get the following additional table for the individual distributions: With this example, we can compute four values for . Using base-2 logarithms: (For reference, the mutual information would then be 0.