In information geometry, a divergence is a kind of statistical distance: a binary function which establishes the separation from one probability distribution to another on a statistical manifold.
The simplest divergence is squared Euclidean distance (SED), and divergences can be viewed as generalizations of SED. The other most important divergence is relative entropy (also called Kullback–Leibler divergence), which is central to information theory. There are numerous other specific divergences and classes of divergences, notably f-divergences and Bregman divergences.
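As a concrete illustration of the two divergences named above, the following minimal sketch computes SED and relative entropy for discrete distributions given as lists of probabilities. The function names `squared_euclidean` and `kl_divergence` and the sample distributions are illustrative assumptions, not part of the original text.

```python
import math

def squared_euclidean(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """Relative entropy D_KL(p || q) in nats for discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print("SED(p, q)      =", squared_euclidean(p, q))
print("D_KL(p || q)   =", kl_divergence(p, q))
print("D_KL(p || p)   =", kl_divergence(p, p))  # zero when both arguments coincide
```

Both functions return zero when the two arguments coincide and a positive value otherwise, which is the behaviour the formal definition requires.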
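Given a differentiable manifold $M$ of dimension $n$, a divergence on $M$ is a $C^2$-function $D : M \times M \to [0, \infty)$ satisfying:
1. $D(p, q) \geq 0$ for all $p, q \in M$ (non-negativity),
2. $D(p, q) = 0$ if and only if $p = q$ (positivity),
3. At every point $p \in M$, $D(p, p + dp)$ is a positive-definite quadratic form for infinitesimal displacements $dp$ from $p$.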
In applications to statistics, the manifold is typically the space of parameters of a parametric family of probability distributions.
Condition 3 means that $D$ defines an inner product on the tangent space $T_pM$ for every $p \in M$. Since $D$ is $C^2$ on $M$, this defines a Riemannian metric $g$ on $M$.
Locally at $p \in M$, we may construct a local coordinate chart with coordinates $x$; the divergence is then $D(x(p), x(p) + dx) = \tfrac{1}{2}\, dx^{T} g_p(x)\, dx + O(|dx|^3)$, where $g_p(x)$ is a matrix of size $n \times n$. It is the Riemannian metric at point $p$ expressed in coordinates $x$.
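The local quadratic behaviour can be checked numerically in a one-dimensional case. The sketch below is an illustration under stated assumptions: it takes the Bernoulli family with parameter `theta` as the manifold and relative entropy as the divergence, for which the induced metric is the Fisher information $1 / (\theta (1 - \theta))$. The function names and the chosen values of `theta` and `d_theta` are hypothetical.

```python
import math

def kl_bernoulli(a, b):
    """D_KL(Bernoulli(a) || Bernoulli(b)) in nats, for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def fisher_bernoulli(theta):
    """Fisher information of Bernoulli(theta); plays the role of the 1x1 metric."""
    return 1.0 / (theta * (1.0 - theta))

theta = 0.3
for d_theta in (1e-1, 1e-2, 1e-3):
    exact = kl_bernoulli(theta, theta + d_theta)
    # Leading term of the expansion: (1/2) * g * d_theta^2.
    quadratic = 0.5 * fisher_bernoulli(theta) * d_theta ** 2
    print(f"d_theta={d_theta:g}  exact={exact:.6e}  "
          f"quadratic={quadratic:.6e}  ratio={exact / quadratic:.4f}")
```

As the displacement shrinks, the ratio of the exact divergence to the quadratic term approaches 1, consistent with the cubic remainder in the expansion.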
Dimensional analysis of condition 3 shows that divergence has the dimension of squared distance.
The dual divergence $D^*$ is defined as $D^*(p, q) = D(q, p)$.
When we wish to contrast $D$ against $D^*$, we refer to $D$ as primal divergence.
Given any divergence $D$, its symmetrized version is obtained by averaging it with its dual divergence: $D_S(p, q) = \tfrac{1}{2}\big(D(p, q) + D(q, p)\big)$.
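The dual and symmetrized constructions amount to swapping and averaging arguments, which can be sketched as higher-order functions. The names `dual` and `symmetrize` are hypothetical helpers introduced for illustration, and relative entropy on discrete distributions is used only as an example divergence.

```python
import math

def kl_divergence(p, q):
    """Relative entropy D_KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dual(divergence):
    """Return the dual divergence: the same function with its arguments swapped."""
    return lambda p, q: divergence(q, p)

def symmetrize(divergence):
    """Return the average of a divergence and its dual."""
    return lambda p, q: 0.5 * (divergence(p, q) + divergence(q, p))

p = [0.9, 0.1]
q = [0.5, 0.5]

kl_dual = dual(kl_divergence)
kl_sym = symmetrize(kl_divergence)

print("primal     :", kl_divergence(p, q))
print("dual       :", kl_dual(p, q))
print("symmetrized:", kl_sym(p, q), kl_sym(q, p))
```

The symmetrized version returns the same value for either argument order, while the primal and dual values generally differ.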
Divergences differ from metrics in two ways. First, they are not required to be symmetric, and the asymmetry is important in applications; accordingly, one often refers asymmetrically to the divergence "of q from p" or "from p to q", rather than "between p and q". Second, divergences generalize squared distance, not linear distance, and thus do not in general satisfy the triangle inequality, although some divergences (such as the Bregman divergence) do satisfy generalizations of the Pythagorean theorem.
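Both differences from metrics can be seen on small examples. The sketch below is illustrative only: it uses relative entropy to show asymmetry, and squared Euclidean distance (itself a Bregman divergence) to show a violation of the triangle inequality alongside the classical Pythagorean identity. All names and sample points are assumptions chosen for the example.

```python
import math

def squared_euclidean(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """Relative entropy D_KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# 1. Asymmetry: swapping the arguments of relative entropy changes the value.
p, q = [0.9, 0.1], [0.5, 0.5]
print("D_KL(p || q) =", kl_divergence(p, q))
print("D_KL(q || p) =", kl_divergence(q, p))

# 2. Triangle inequality fails for SED on three collinear points 0, 1, 2.
a, b, c = [0.0], [1.0], [2.0]
direct = squared_euclidean(a, c)
detour = squared_euclidean(a, b) + squared_euclidean(b, c)
print("direct > detour:", direct > detour)

# 3. Pythagorean identity for SED with a right angle at the middle point.
u, v, w = [3.0, 0.0], [0.0, 0.0], [0.0, 4.0]
hypotenuse = squared_euclidean(u, w)
legs = squared_euclidean(u, v) + squared_euclidean(v, w)
print("hypotenuse == legs:", hypotenuse == legs)
```

In the collinear case the direct divergence exceeds the sum along the detour, which a metric would forbid; in the right-angle case the two quantities agree exactly.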
In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted $D_{\text{KL}}(P \parallel Q)$, is a type of statistical distance: a measure of how one probability distribution P is different from a second, reference probability distribution Q. A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model when the actual distribution is P.
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909. It is sometimes called the Jeffreys distance. To define the Hellinger distance in terms of measure theory, let $P$ and $Q$ denote two probability measures on a measure space $\mathcal{X}$ that are absolutely continuous with respect to an auxiliary measure $\lambda$.
In probability theory, an $f$-divergence is a function that measures the difference between two probability distributions $P$ and $Q$. Many common divergences, such as KL-divergence, Hellinger distance, and total variation distance, are special cases of $f$-divergence. These divergences were introduced by Alfréd Rényi in the same paper where he introduced the well-known Rényi entropy. He proved that these divergences decrease in Markov processes.
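To make the "special cases" statement concrete, the sketch below uses the standard discrete form $D_f(P \parallel Q) = \sum_x Q(x)\, f\big(P(x)/Q(x)\big)$ with a convex generator $f$ satisfying $f(1) = 0$. This formula is not stated in the text above and is supplied here as an assumption, as are the helper names and sample distributions. The generator $t \log t$ gives relative entropy and $\tfrac{1}{2}|t - 1|$ gives total variation distance.

```python
import math

def f_divergence(f, p, q):
    """D_f(p || q) = sum_x q(x) * f(p(x) / q(x)); assumes every q(x) > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(t):
    """Generator t * log(t) for relative entropy, with the convention 0 * log(0) = 0."""
    return t * math.log(t) if t > 0 else 0.0

def tv_generator(t):
    """Generator |t - 1| / 2 for total variation distance."""
    return 0.5 * abs(t - 1.0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Direct formulas, used only to compare against the generic routine.
kl_direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
tv_direct = 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

print("KL via f-divergence:", f_divergence(kl_generator, p, q), "direct:", kl_direct)
print("TV via f-divergence:", f_divergence(tv_generator, p, q), "direct:", tv_direct)
```

Swapping in a different generator changes the divergence without touching the summation routine, which is the sense in which the family unifies these examples.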