In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational distance.
Consider a measurable space $(\Omega, \mathcal{F})$ and probability measures $P$ and $Q$ defined on $(\Omega, \mathcal{F})$.
The total variation distance between $P$ and $Q$ is defined as:

$$\delta(P, Q) = \sup_{A \in \mathcal{F}} \left| P(A) - Q(A) \right|.$$
Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.
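As a concrete illustration of this definition (not part of the original article), the following Python sketch uses two made-up distributions on a four-point sample space and brute-forces the supremum over all events:

```python
# Illustrative sketch (not from the source): brute-force the supremum in the
# definition of the total variation distance on a small finite sample space.
from itertools import chain, combinations

# Hypothetical example distributions P and Q on the sample space {0, 1, 2, 3}.
P = [0.1, 0.4, 0.3, 0.2]
Q = [0.25, 0.25, 0.25, 0.25]

omega = range(len(P))
# Enumerate every event A (every subset of the sample space) and take the
# largest discrepancy |P(A) - Q(A)|: this is exactly delta(P, Q).
events = chain.from_iterable(combinations(omega, r) for r in range(len(P) + 1))
tv = max(abs(sum(P[i] for i in A) - sum(Q[i] for i in A)) for A in events)
print(tv)  # 0.2 for these particular distributions (up to float rounding)
```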
The total variation distance is related to the Kullback–Leibler divergence by Pinsker’s inequality:

$$\delta(P, Q) \le \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P \parallel Q)}.$$
One also has the following inequality, due to Bretagnolle and Huber (see also Tsybakov), which has the advantage of providing a non-vacuous bound even when $D_{\mathrm{KL}}(P \parallel Q) > 2$:

$$\delta(P, Q) \le \sqrt{1 - e^{-D_{\mathrm{KL}}(P \parallel Q)}}.$$
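The two bounds can be compared numerically. The sketch below (not from the source) uses hypothetical Bernoulli parameters chosen so that the KL divergence exceeds 2 nats, in which case Pinsker’s bound is vacuous while the Bretagnolle–Huber bound is still informative:

```python
# Illustrative sketch (not from the source): compare the exact total variation
# distance of two Bernoulli distributions with the Pinsker and
# Bretagnolle-Huber upper bounds computed from the KL divergence (in nats).
import math

def bernoulli_tv(p, q):
    # For Bernoulli(p) versus Bernoulli(q), the TV distance is |p - q|.
    return abs(p - q)

def bernoulli_kl(p, q):
    # D_KL(Bernoulli(p) || Bernoulli(q)), assuming 0 < p, q < 1.
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

p, q = 0.99, 0.05          # hypothetical parameters giving KL > 2 nats
kl = bernoulli_kl(p, q)
print("TV               :", bernoulli_tv(p, q))              # 0.94
print("Pinsker bound    :", math.sqrt(kl / 2))               # > 1, vacuous here
print("Bretagnolle-Huber:", math.sqrt(1 - math.exp(-kl)))    # < 1, informative
```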
When the sample space $\Omega$ is countable, the total variation distance is related to the $L^1$ norm by the identity:

$$\delta(P, Q) = \frac{1}{2} \, \| P - Q \|_1 = \frac{1}{2} \sum_{\omega \in \Omega} \left| P(\{\omega\}) - Q(\{\omega\}) \right|.$$
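For a finite support this identity gives a direct way to compute the distance; a minimal sketch (not from the source), reusing the hypothetical example distributions from above:

```python
# Illustrative sketch (not from the source): on a countable (here finite)
# sample space the total variation distance is half the L1 distance between
# the probability mass functions.
P = [0.1, 0.4, 0.3, 0.2]   # same hypothetical pmfs as above
Q = [0.25, 0.25, 0.25, 0.25]

tv = 0.5 * sum(abs(p - q) for p, q in zip(P, Q))
print(tv)  # 0.2, matching the brute-force supremum over events
```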
The total variation distance is related to the Hellinger distance $H(P, Q)$ as follows:

$$H^2(P, Q) \le \delta(P, Q) \le \sqrt{2} \, H(P, Q).$$
These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
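A quick numerical check of this sandwich (not from the source), again on the hypothetical example distributions and using the convention $H^2(P, Q) = \tfrac{1}{2} \sum_{\omega} \big(\sqrt{P(\omega)} - \sqrt{Q(\omega)}\big)^2$:

```python
# Illustrative sketch (not from the source): numerically check the sandwich
# H^2(P, Q) <= delta(P, Q) <= sqrt(2) * H(P, Q), with the squared Hellinger
# distance taken as H^2(P, Q) = (1/2) * sum_x (sqrt(P(x)) - sqrt(Q(x)))^2.
import math

P = [0.1, 0.4, 0.3, 0.2]   # same hypothetical pmfs as above
Q = [0.25, 0.25, 0.25, 0.25]

tv = 0.5 * sum(abs(p - q) for p, q in zip(P, Q))
h2 = 0.5 * sum((math.sqrt(p) - math.sqrt(q)) ** 2 for p, q in zip(P, Q))
print(h2, tv, math.sqrt(2.0 * h2))   # lower bound <= TV <= upper bound
```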
The total variation distance (or half the $L^1$ norm) arises as the optimal transportation cost, when the cost function is $c(x, y) = \mathbf{1}_{\{x \neq y\}}$, that is,

$$\frac{1}{2} \| P - Q \|_1 = \delta(P, Q) = \inf_{\pi} \Pr_{(X, Y) \sim \pi}\left[ X \neq Y \right] = \inf_{\pi} \mathbb{E}_{\pi}\left[ \mathbf{1}_{\{X \neq Y\}} \right],$$

where the expectation is taken with respect to the probability measure $\pi$ on the space where $(X, Y)$ lives, and the infimum is taken over all such probability measures $\pi$ with marginals $P$ and $Q$, respectively.
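The infimum is attained by the so-called maximal coupling. The sketch below (not from the source, using the same hypothetical discrete example) keeps the overlapping mass $\sum_{\omega} \min(P(\omega), Q(\omega))$ on the diagonal, so that the coupled pair disagrees with probability exactly $\delta(P, Q)$:

```python
# Illustrative sketch (not from the source): the "maximal coupling" of two
# finite-support distributions keeps the overlapping mass min(P(x), Q(x)) on
# the diagonal, so the coupled pair (X, Y) disagrees with probability
# exactly delta(P, Q), attaining the optimal transport cost for 1_{x != y}.
P = [0.1, 0.4, 0.3, 0.2]   # same hypothetical pmfs as above
Q = [0.25, 0.25, 0.25, 0.25]

overlap = [min(p, q) for p, q in zip(P, Q)]   # mass placed on X == Y
pr_disagree = 1.0 - sum(overlap)              # leftover mass forced to move
# Completing the coupling: pair the excess of P with the excess of Q in any
# way whose total mass equals pr_disagree; the marginals are then P and Q.
excess_P = [p - o for p, o in zip(P, overlap)]
excess_Q = [q - o for q, o in zip(Q, overlap)]
print("Pr[X != Y] under the maximal coupling:", pr_disagree)  # 0.2 = delta(P, Q)
print(sum(excess_P), sum(excess_Q))  # both equal pr_disagree
```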
In probability theory, an $f$-divergence is a function $D_f(P \parallel Q)$ that measures the difference between two probability distributions $P$ and $Q$. Many common divergences, such as the KL-divergence, the Hellinger distance, and the total variation distance, are special cases of $f$-divergence. These divergences were introduced by Alfréd Rényi in the same paper where he introduced the well-known Rényi entropy. He proved that these divergences decrease in Markov processes.
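To make the definition concrete, here is a minimal sketch (not from the source) of the discrete case $D_f(P \parallel Q) = \sum_x Q(x)\, f\!\big(P(x)/Q(x)\big)$, showing how particular convex generators $f$ recover the KL divergence and the total variation distance; the distributions are the same hypothetical examples as above:

```python
# Illustrative sketch (not from the source): the f-divergence of two discrete
# distributions, D_f(P || Q) = sum_x Q(x) * f(P(x) / Q(x)), for a convex f
# with f(1) = 0; specific choices of f recover KL and total variation.
import math

def f_divergence(P, Q, f):
    # Assumes Q(x) > 0 wherever P(x) > 0 (absolute continuity).
    return sum(q * f(p / q) for p, q in zip(P, Q) if q > 0)

P = [0.1, 0.4, 0.3, 0.2]   # same hypothetical pmfs as above
Q = [0.25, 0.25, 0.25, 0.25]

kl = f_divergence(P, Q, lambda t: t * math.log(t))     # f(t) = t log t
tv = f_divergence(P, Q, lambda t: 0.5 * abs(t - 1))    # f(t) = |t - 1| / 2
print(kl, tv)   # tv is 0.2, matching the direct computation above
```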
We consider a setup in which confidential i.i.d. samples X1, ..., Xn from an unknown finite-support distribution p are passed through n copies of a discrete privatization channel (a.k.a. mechanism) producing outputs Y1, ..., Yn. The channel law gua ... (2021)

We consider the problem of parameter estimation in a Bayesian setting and propose a general lower-bound that includes part of the family of f-Divergences. The results are then applied to specific settings of interest and compared to other notable results i ... (2022)

We formulate gradient-based Markov chain Monte Carlo (MCMC) sampling as optimization on the space of probability measures, with Kullback-Leibler (KL) divergence as the objective functional. We show that an under-damped form of the Langevin algorithm perfor ...