In information theory, joint entropy is a measure of the uncertainty associated with a set of variables.
The joint Shannon entropy (in bits) of two discrete random variables and with images and is defined as
where and are particular values of and , respectively, is the joint probability of these values occurring together, and is defined to be 0 if .
For more than two random variables this expands to
where are particular values of , respectively, is the probability of these values occurring together, and is defined to be 0 if .
The joint entropy of a set of random variables is a nonnegative number.
The joint entropy of a set of variables is greater than or equal to the maximum of all of the individual entropies of the variables in the set.
The joint entropy of a set of variables is less than or equal to the sum of the individual entropies of the variables in the set. This is an example of subadditivity. This inequality is an equality if and only if and are statistically independent.
Joint entropy is used in the definition of conditional entropy
and It is also used in the definition of mutual information
In quantum information theory, the joint entropy is generalized into the joint quantum entropy.
The above definition is for discrete random variables and just as valid in the case of continuous random variables. The continuous version of discrete joint entropy is called joint differential (or continuous) entropy. Let and be a continuous random variables with a joint probability density function . The differential joint entropy is defined as
For more than two continuous random variables the definition is generalized to:
The integral is taken over the support of . It is possible that the integral does not exist in which case we say that the differential entropy is not defined.
As in the discrete case the joint differential entropy of a set of random variables is smaller or equal than the sum of the entropies of the individual random variables:
The following chain rule holds for two random variables:
In the case of m
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
We discuss a set of topics that are important for the understanding of modern data science but that are typically not taught in an introductory ML course. In particular we discuss fundamental ideas an
Text, sound, and images are examples of information sources stored in our computers and/or communicated over the Internet. How do we measure, compress, and protect the informatin they contain?
In information theory, the cross-entropy between two probability distributions and over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution , rather than the true distribution . The cross-entropy of the distribution relative to a distribution over a given set is defined as follows: where is the expected value operator with respect to the distribution .
The mathematical theory of information is based on probability theory and statistics, and measures information with several quantities of information. The choice of logarithmic base in the following formulae determines the unit of information entropy that is used. The most common unit of information is the bit, or more correctly the shannon, based on the binary logarithm.
In information theory, the conditional entropy quantifies the amount of information needed to describe the outcome of a random variable given that the value of another random variable is known. Here, information is measured in shannons, nats, or hartleys. The entropy of conditioned on is written as . The conditional entropy of given is defined as where and denote the support sets of and . Note: Here, the convention is that the expression should be treated as being equal to zero. This is because .
Train wheel flats are formed when wheels slip on rails. Crucial for passenger comfort and the safe operation of train systems, early detection and quantification of wheel-flat severity without interrupting railway operations is a desirable and challenging ...
The conditional mean is a fundamental and important quantity whose applications include the theories of estimation and rate-distortion. It is also notoriously difficult to work with. This paper establishes novel bounds on the differential entropy of the co ...
This paper considers an additive Gaussian noise channel with arbitrarily distributed finite variance input signals. It studies the differential entropy of the minimum mean-square error (MMSE) estimator and provides a new lower bound which connects the diff ...