In information theory, the typical set is a set of sequences whose probability is close to two raised to the negative power of the entropy of their source distribution. That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP) which is a kind of law of large numbers. The notion of typicality is only concerned with the probability of a sequence and not the actual sequence itself.
This has great use in compression theory as it provides a theoretical means for compressing data, allowing us to represent any sequence Xn using nH(X) bits on average, and, hence, justifying the use of entropy as a measure of information from a source.
The AEP can also be proven for a large class of stationary ergodic processes, allowing typical set to be defined in more general cases.
If a sequence x1, ..., xn is drawn from an i.i.d. distribution X defined over a finite alphabet , then the typical set, Aε(n)(n) is defined as those sequences which satisfy:
where
is the information entropy of X. The probability above need only be within a factor of 2n ε. Taking the logarithm on all sides and dividing by -n, this definition can be equivalently stated as
For i.i.d sequence, since
we further have
By the law of large numbers, for sufficiently large n
An essential characteristic of the typical set is that, if one draws a large number n of independent random samples from the distribution X, the resulting sequence (x1, x2, ..., xn) is very likely to be a member of the typical set, even though the typical set comprises only a small fraction of all the possible sequences. Formally, given any , one can choose n such that:
The probability of a sequence from X(n) being drawn from Aε(n) is greater than 1 − ε, i.e.
If the distribution over is not uniform, then the fraction of sequences that are typical is
as n becomes very large, since where is the cardinality of .
For a general stochastic process {X(t)} with AEP, the (weakly) typical set can be defined similarly with p(x1, x2, ..., xn) replaced by p(x0τ) (i.