In probability theory, Hoeffding's inequality provides an upper bound on the probability that the sum of bounded independent random variables deviates from its expected value by more than a certain amount. Hoeffding's inequality was proven by Wassily Hoeffding in 1963.
Hoeffding's inequality is a special case of the Azuma–Hoeffding inequality and McDiarmid's inequality. It is similar to the Chernoff bound, but tends to be less sharp, in particular when the variance of the random variables is small. It is similar to, but incomparable with, one of Bernstein's inequalities.
Let X1, ..., Xn be independent random variables such that almost surely. Consider the sum of these random variables,
Then Hoeffding's theorem states that, for all t > 0,
Here E[Sn] is the expected value of Sn.
Note that the inequalities also hold when the Xi have been obtained using sampling without replacement; in this case the random variables are not independent anymore. A proof of this statement can be found in Hoeffding's paper. For slightly better bounds in the case of sampling without replacement, see for instance the paper by .
Suppose and for all i. This can occur when Xi are independent Bernoulli random variables, though they need not be identically distributed. Then we get the inequality
for all . This is a version of the additive Chernoff bound which is more general, since it allows for random variables that take values between zero and one, but also weaker, since the Chernoff bound gives a better tail bound when the random variables have small variance.
The proof of Hoeffding's inequality can be generalized to any sub-Gaussian distribution. Recall that a random variable X is called sub-Gaussian, if
for some c>0. For any bounded variable X, for for some sufficiently large T. Then for all so taking yields
for . So every bounded variable is sub-Gaussian.
For a random variable X, the following norm is finite if and only if X is sub-Gaussian:
Then let X1, ..., Xn be zero-mean independent sub-Gaussian random variables, the general version of the Hoeffding's inequality states that:
where c > 0 is an absolute constant.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
In probability theory, Bennett's inequality provides an upper bound on the probability that the sum of independent random variables deviates from its expected value by more than any specified amount. Bennett's inequality was proved by George Bennett of the University of New South Wales in 1962. Let X1, ... Xn be independent random variables with finite variance. Further assume Xi ≤ a almost surely for all i, and define and Then for any t ≥ 0, where h(u) = (1 + u)log(1 + u) – u and log denotes the natural logarithm.
In probability theory, concentration inequalities provide bounds on how a random variable deviates from some value (typically, its expected value). The law of large numbers of classical probability theory states that sums of independent random variables are, under very mild conditions, close to their expectation with a large probability. Such sums are the most basic examples of random variables concentrated around their mean. Recent results show that such behavior is shared by other functions of independent random variables.
The lack of detection to date of electromagnetic technosignatures implies either that we have been unable to detect them due to incomplete sampling of the search space or that we cannot detect them because the Earth has been located during the entire histo ...
We derive confidence intervals (CIs) and confidence sequences (CSs) for the classical problem of estimating a bounded mean. Our approach generalizes and improves on the celebrated Chernoff method, yielding the best closed-form "empirical-Bernstein" CSs and ...
In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M < N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimiz ...