Chernoff Bound


Prerequisites

To fully understand this post, it is recommended that you have knowledge of the following:

  • Markov's inequality

Proof

Chernoff’s inequality comes in two forms: a lower-tail version and an upper-tail version. In this post, we walk through the proof of each.

Lower-Tail Chernoff Bound

Let $X$ be the sum of $N$ independent random variables $X_i$, where each $X_i$ follows a Bernoulli distribution and takes the value 1 with probability $p_i$:

\[X = \sum_{i=1}^{N}X_i\]

For any $\delta\in (0, 1)$, the following inequality holds:

\[P(X \lt (1-\delta)E[X]) \lt e^{-E[X]\cdot \delta^2/2}\]

Here, $e$ represents Euler’s number.
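
Before the proof, here is a small numerical sanity check of the statement above (a minimal sketch, not part of the proof; the number of variables, the parameters $p_i$, $\delta$, and the number of trials are all arbitrary illustrative choices):

```python
import numpy as np

# Minimal Monte Carlo sketch of the lower-tail bound; all parameters are illustrative.
rng = np.random.default_rng(0)
N = 200
p = rng.uniform(0.2, 0.8, size=N)                # Bernoulli parameters p_i
mu = p.sum()                                     # E[X] = sum_i p_i
delta = 0.15

trials = 50_000
X = (rng.random((trials, N)) < p).sum(axis=1)    # each row of uniforms gives one realization of X
empirical = np.mean(X < (1 - delta) * mu)        # estimate of P(X < (1 - delta) E[X])
bound = np.exp(-mu * delta**2 / 2)               # Chernoff bound e^{-E[X] delta^2 / 2}

print(f"empirical: {empirical:.4f}  bound: {bound:.4f}")   # the estimate should not exceed the bound
```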

(Proof)

First, let’s prove the following inequality:

\[P(X \lt (1-\delta) E[X]) \lt \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{E[X]}\]

To prove this, let’s introduce an arbitrary parameter $t>0$. We will use this $t$ to transform the equation for $X$ into an equation for $e^{-tX}$. This method can be thought of as using the principle of moment generating functions to move the problem from the original domain of $X$, where it is difficult to solve, to the parameter domain of $t$, where it is relatively easy to solve.

To prove Equation (3), let’s rewrite Markov’s inequality in terms of $e^{-tX}$ instead of $X$. The original Markov’s inequality, for a non-negative random variable $X$ and any $\alpha>0$, is as follows:

\[P(X\geq \alpha) \leq \frac{E[X]}{\alpha}\]

Since $t>0$, the event $X\lt\alpha$ is identical to the event $e^{-tX}\gt e^{-t\alpha}$. Applying Markov’s inequality to the non-negative random variable $e^{-tX}$, we get:

\[P(X\lt\alpha) \leq \frac{E[e^{-tX}]}{e^{-t\alpha}}\]
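
This exponentiated form of Markov’s inequality can be checked numerically; the sketch below uses arbitrarily chosen values for $N$, $p_i$, $t$, and $\alpha$:

```python
import numpy as np

# Illustrative check of P(X < alpha) <= E[e^{-tX}] / e^{-t*alpha}; all parameters are arbitrary.
rng = np.random.default_rng(1)
N, trials = 50, 100_000
p = np.full(N, 0.5)                                  # Bernoulli parameters p_i
t, alpha = 0.4, 20.0                                 # any t > 0 gives a valid bound

X = (rng.random((trials, N)) < p).sum(axis=1)
lhs = np.mean(X < alpha)                             # estimate of P(X < alpha)
rhs = np.mean(np.exp(-t * X)) / np.exp(-t * alpha)   # E[e^{-tX}] / e^{-t*alpha}

print(f"{lhs:.4f} <= {rhs:.4f}")                     # the left side should not exceed the right side
```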

Applying equation (5) to the left-hand side of equation (2), with $\alpha = (1-\delta)E[X]$, gives

\[\Rightarrow P(X\lt (1-\delta)E[X]) \leq \frac{E[e^{-tX}]}{e^{-t(1-\delta)E[X]}}\]

Moreover, the random variables $X_i$ that make up $X$ are independent. Looking at the numerator on the right-hand side of the above equation, we have

\[E[e^{-tX}] = E[e^{-t\cdot\sum_{i}X_i}] = E[e^{-t(X_1+X_2+\cdots+X_N)}]\notag\] \[=E[e^{-tX_1}\cdot e^{-tX_2}\cdot e^{-tX_3}\cdot \cdots \cdot e^{-t X_N}]\]

Since the expected value of a product of independent random variables is the product of their expected values, this can be rewritten as:

\[\Rightarrow E[e^{-tX_1}\cdot e^{-tX_2}\cdots e^{-tX_N}]=\prod_{i=1}^{N}E[e^{-tX_i}]\]

Taking a closer look at $E[e^{-tX_i}]$, it is the expected value of the transformed quantity $e^{-tX_i}$, where $X_i$ follows a Bernoulli distribution. Since $X_i$ takes the value 0 with probability $(1-p_i)$ and the value 1 with probability $p_i$, the expected value of $X_i$ is

\[E[X_i] = (1-p_i)\cdot 0 + p_i \cdot 1 = p_i\]

Similarly, the expected value of $e^{-tX_i}$ is

\[E[e^{-tX_i}]=(1-p_i)e^{-t\cdot 0}+p_i e^{-t\cdot 1}\notag\] \[=1-p_i + p_i e^{-t}=1+p_i(e^{-t}-1)\notag\] \[= 1+E[X_i](e^{-t}-1)\]

The last expression is exactly the first two terms of the Taylor series of $\exp(E[X_i]\cdot(e^{-t}-1))$. Since $1+x \lt e^x$ for every real $x\neq 0$, we can conclude that:

\[E[e^{-tX_i}]=1+E[X_i](e^{-t}-1) \lt e^{E[X_i](e^{-t}-1)}\]
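
This step can be sanity-checked with a few illustrative values of $p_i$ and $t$ (recall that $E[X_i]=p_i$); the values below are arbitrary:

```python
import numpy as np

# Check 1 + E[X_i](e^{-t} - 1) <= exp(E[X_i](e^{-t} - 1)) for a few illustrative p_i and t.
for p_i in (0.1, 0.5, 0.9):
    for t in (0.1, 1.0, 3.0):
        x = p_i * (np.exp(-t) - 1.0)     # E[X_i](e^{-t} - 1), with E[X_i] = p_i
        lhs = 1.0 + x                    # E[e^{-t X_i}] for a Bernoulli(p_i) variable
        rhs = np.exp(x)
        assert lhs <= rhs                # 1 + x <= e^x holds for every real x
        print(f"p_i={p_i}, t={t}: {lhs:.4f} <= {rhs:.4f}")
```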

Substituting equation (11) into equation (8), we have

\[\prod_{i=1}^{N}E[e^{-tX_i}]\lt\prod_{i=1}^{N}e^{E[X_i](e^{-t}-1)}\]

Thus, it follows that the right-hand side of the above equation can be rewritten as:

\[\prod_{i=1}^{N}\exp(E[X_i](e^{-t}-1))=\exp\left(\sum_{i=1}^{N}E[X_i]\cdot (e^{-t}-1)\right)\notag\] \[=\exp\left(E[X](e^{-t}-1)\right)\]

Hence, substituting the result of equation (13) into equation (6) yields the following equation.

\[P(X<(1-\delta)E[X]) \leq \frac{\exp\left(E[X](e^{-t}-1)\right)}{\exp\left(-t(1-\delta)E[X]\right)}\]

This inequality holds for any $t>0$. Now, let’s find the value $t=t^\ast$ that minimizes the right-hand side of equation (14), which gives the tightest bound. This can be done by differentiating the right-hand side of equation (14) and finding the $t^\ast$ at which the derivative is zero.

By applying the exponential law to the right-hand side of the equation (14), it can be written in one line as follows:

\[\exp(E[X](e^{-t}-1)+t(1-\delta)E[X])\]

Simplifying this a little more and naming it $f(t)$, we have:

\[f(t) = \exp\left(E[X]e^{-t}-E[X]+tE[X]-t\delta E[X]\right)\notag\] \[=\exp\left(E[X](e^{-t}+t-t\delta -1)\right)\]

Now, when $f(t)$ is differentiated with respect to $t$:

\[f'(t) = \exp\left(E[X](e^{-t}+t-t\delta -1)\right)(E[X])(-e^{-t}+1-\delta)\]

The exponential factor at the front of equation (17) is always positive, and $E[X]$ is also positive. Therefore, $t=t^\ast$ can be found by setting only the value inside the rightmost parentheses to zero.

Thus,

\[-e^{-t}+1-\delta = 0\]

The value $t=t^\ast$ satisfying this is:

\[t=t^\ast = \ln\left(\frac{1}{1-\delta}\right)\]
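
This value of $t^\ast$ can be double-checked symbolically. The sketch below uses SymPy; the symbol `mu` standing for $E[X]$ is an assumption introduced only for this check. It verifies that the derivative of the exponent of the right-hand side of equation (14) vanishes at $t^\ast=\ln(1/(1-\delta))$ for a few sample values of $\delta$:

```python
import sympy as sp

# Symbolic check that t* = ln(1/(1 - delta)) is a critical point of the exponent in (14).
t, delta, mu = sp.symbols('t delta mu', positive=True)
exponent = mu * (sp.exp(-t) + t - t * delta - 1)   # exponent of the right-hand side of (14)
derivative = sp.diff(exponent, t)                  # equals mu*(1 - delta - exp(-t))

for d in (sp.Rational(1, 10), sp.Rational(1, 2), sp.Rational(9, 10)):
    t_star = sp.log(1 / (1 - d))
    print(d, sp.simplify(derivative.subs({delta: d, t: t_star, mu: 1})))   # prints 0 each time
```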

Substituting equation (19) into equation (14) will give us equation (3). Performing the substitution, we get:

\[\Rightarrow P(X<(1-\delta) E[X]) \leq \frac{\exp\left(E[X]\left(e^{-\ln\left(1/(1-\delta)\right)}-1\right)\right)}{\exp\left(-\ln(1/(1-\delta))(1-\delta)E[X]\right)}\]

Looking only at the right-hand side of the above equation,

\[(RHS)\Rightarrow \frac{\exp(E[X](1-\delta -1))}{\exp(-(1-\delta)E[X]\ln\left(1/(1-\delta)\right))}\] \[=\frac{\exp(-\delta E[X])}{\exp\left(\ln\left((1-\delta)^{(1-\delta)E[X]}\right)\right)}=\frac{\exp(-\delta E[X])}{(1-\delta)^{(1-\delta)E[X]}}\] \[=\left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{E[X]}\]

Meanwhile, for $|x|\lt 1$,

\[\ln(1-x)=-x-\frac{x^2}{2}-\frac{x^3}{3}-\cdots = -\sum_{n=1}^{\infty}\frac{x^n}{n}\]

Thus,

\[(1-\delta)\ln(1-\delta) = - (1-\delta)\delta - (1-\delta)\frac{\delta^2}{2} - (1-\delta)\frac{\delta^3}{3}-\cdots\notag\] \[=-\delta+\left(1-\frac{1}{2}\right)\delta^2+\left(\frac{1}{2}-\frac{1}{3}\right)\delta^3+\cdots\notag\] \[=-\delta+\frac{\delta^2}{2}+\frac{\delta^3}{6}+\cdots\]

Since every term after $-\delta+\delta^2/2$ in this expansion is positive, for $\delta\in(0,1)$ we have

\[(1-\delta)\ln(1-\delta) \gt -\delta +\frac{\delta^2}{2}\]

Exponentiating both sides of this inequality, we obtain

\[(1-\delta)^{(1-\delta)}\gt\exp\left(-\delta + \frac{\delta^2}{2}\right)\]

Substituting Equation (27) into Equation (23), we get:

\[\left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{E[X]}\lt \left(\frac{e^{-\delta}}{e^{(-\delta+\delta^2/2)}}\right)^{E[X]}\] \[\Rightarrow \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{E[X]}\lt \left(e^{-\delta^2/2}\right)^{E[X]}\]

Therefore, if we substitute this result into Equation (20) and Equation (23), we get:

\[\Rightarrow P(X\lt (1-\delta)E[X])\lt \exp(-E[X]\delta^2/2)\]

(Proof completed)
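
As a quick numerical cross-check of the last step (a sketch, not part of the proof), the base $e^{-\delta}/(1-\delta)^{(1-\delta)}$ of the bound in equation (23) can be compared with the base $e^{-\delta^2/2}$ of the bound in equation (2) over a grid of $\delta$ values:

```python
import numpy as np

# Compare the two bases over a grid of delta values in (0, 1); the grid itself is arbitrary.
d = np.linspace(0.01, 0.99, 99)
intermediate = np.exp(-d) / (1 - d) ** (1 - d)   # base of the bound in equation (23)
simple = np.exp(-d**2 / 2)                       # base of the bound in equation (2)

print(np.all(intermediate <= simple))            # True: the simpler bound is indeed weaker
```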

Upper-Tail Chernoff Bound

The proof for the upper tail is similar to the one for the lower tail, so we move through it more quickly; for the steps that are skipped, refer to the lower-tail proof. For a random variable $X$ defined as in Equation (1), if we select any $\delta \in (0, 1)$ (see note 1 in the References), the following holds:

\[P(X\gt(1+\delta)E[X]) \lt e^{-E[X]\cdot \delta^2/3}\]

Here, $e$ denotes Euler’s number.
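
As with the lower tail, a minimal Monte Carlo sketch (with arbitrary illustrative parameters) can be used to see this bound in action before going through the proof:

```python
import numpy as np

# Illustrative Monte Carlo check of the upper-tail bound; all parameters are arbitrary.
rng = np.random.default_rng(2)
N, p, delta, trials = 100, 0.5, 0.2, 50_000
mu = N * p                                       # E[X] when every p_i equals p

X = rng.binomial(1, p, size=(trials, N)).sum(axis=1)
empirical = np.mean(X > (1 + delta) * mu)        # estimate of P(X > (1 + delta) E[X])
bound = np.exp(-mu * delta**2 / 3)               # Chernoff bound e^{-E[X] delta^2 / 3}

print(f"empirical: {empirical:.4f}  bound: {bound:.4f}")   # the estimate should not exceed the bound
```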

(Proof)

As in the proof for the lower-tail part, the first step is to prove the following:

\[P(X\gt(1+\delta)E[X]) \lt \left(\frac{e^\delta}{(1+\delta)^{(1+\delta)}}\right)^{E[X]}\]

Introduce a parameter $t>0$ and change the function from $X$ to $e^{tX}$. Since the event $X\gt(1+\delta)E[X]$ is identical to the event $e^{tX}\gt e^{t(1+\delta)E[X]}$, applying Markov’s inequality as in the lower-tail proof gives:

\[P(X\gt(1+\delta)E[X]) \leq \frac{E[e^{tX}]}{e^{t(1+\delta)E[X]}}\]

Expanding the numerator of the right-hand side of Equation (33) using the definition of $X$ in Equation (1) and the independence of the $X_i$, we obtain:

\[E[e^{tX}]=E[e^{t\sum_i X_i}]=E\left[\prod_{i=1}^{N}e^{tX_i}\right]=\prod_{i=1}^{N}E[e^{tX_i}]\]

On the other hand, since $X_i$ is a Bernoulli trial, equation (9) holds, and the expected value of $e^{tX_i}$ is as follows.

\[E[e^{tX_i}]=(1-p_i)e^{t\cdot 0}+p_i e^{t\cdot 1}=1-p_i+p_ie^t\notag\] \[=1+p_i(e^t-1)=1+E[X_i](e^t-1)\]

Also, the last expression in equation (35) coincides with the first two terms of the Taylor series of $\exp(E[X_i]\cdot(e^t-1))$, that is,

\[\exp(x) = 1+\frac{x}{1!}+\frac{x^2}{2!}+\cdots\]

Therefore, $\exp(E[X_i]\cdot(e^t-1))$ is

\[\exp(E[X_i]\cdot(e^t-1))=1+E[X_i](e^t-1)+\frac{1}{2!}(E[X_i](e^t-1))^2+\cdots \notag\] \[\gt 1+E[X_i](e^t-1)\]

Thus, we can arrange the final result in equation (34) as follows.

\[\prod_{i=1}^{N}E[e^{tX_i}]=\prod_{i=1}^{N}(1+E[X_i](e^t-1))<\prod_{i=1}^{N}\exp(E[X_i](e^t-1))\]

Meanwhile,

\[\prod_{i=1}^{N}\exp(E[X_i](e^t-1))=\exp\left(E\left[\sum_{i=1}^{N}X_i\right](e^t-1)\right)=\exp(E[X](e^t-1))\]

If we substitute the above result into equation (33) from above, we get

\[P(X\gt(1+\delta)E[X])\leq \frac{\exp(E[X](e^t-1))}{e^{t(1+\delta)E[X]}}\]

As in the lower-tail case, we differentiate the right-hand side of equation (40) with respect to $t$ and find the value $t=t^\ast$ at which the derivative is zero, which gives the tightest bound:

\[t^\ast=\ln(1+\delta)\]
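
A quick grid search (a sketch with arbitrary sample values for $E[X]$ and $\delta$, written as `mu` and `delta` below) confirms that the right-hand side of equation (40) is smallest near $t=\ln(1+\delta)$:

```python
import numpy as np

# Grid check that the right-hand side of (40) is minimized near t = ln(1 + delta).
mu, delta = 20.0, 0.4                            # illustrative values for E[X] and delta
t = np.linspace(0.01, 2.0, 2000)
rhs = np.exp(mu * (np.exp(t) - 1)) / np.exp(t * (1 + delta) * mu)

print(t[np.argmin(rhs)], np.log(1 + delta))      # the two values should be close
```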

Substituting equation (41) into equation (40) gives us equation (32).

\[P(X\gt(1+\delta)E[X]) \leq \frac{\exp(E[X](e^{\ln(1+\delta)}-1))}{\exp((1+\delta)E[X]\ln(1+\delta))}\] \[\Rightarrow P(X\gt(1+\delta)E[X]) \leq \frac{\exp(E[X]\delta)}{(1+\delta)^{(1+\delta)E[X]}}\] \[\Rightarrow P(X\gt(1+\delta)E[X]) \leq \left(\frac{e^\delta}{(1+\delta)^{(1+\delta)}}\right)^{E[X]}\]

To prove equation (31) from equation (44), we just need to verify that the following inequality is true:

\[\frac{e^\delta}{(1+\delta)^{(1+\delta)}}<e^{-\delta^2/3}\]

Taking the logarithm of both sides of equation (45) and moving every term to the left-hand side, it suffices to show that

\[f(\delta) = \delta - (1+\delta)\ln (1+\delta) + \frac{\delta^2}{3}< 0\]

The first and second derivatives of $f(\delta)$ are

\[f'(\delta) = 1-\frac{1+\delta}{1+\delta}-\ln(1+\delta)+\frac{2}{3}\delta = -\ln(1+\delta)+\frac{2}{3}\delta\] \[f''(\delta) = - \frac{1}{1+\delta}+\frac{2}{3}\]

From the second derivative, we can see that

\[\begin{cases}f''(\delta)\lt 0\text{ for } 0\leq \delta \lt 1/2 \\ f''(\delta) > 0 \text{ for } \delta >1/2\end{cases}\]

In other words, $f'(\delta)$ first decreases and then increases on the interval $(0,1)$. Also, from the expression for the first derivative, $f'(0)=0$ and $f'(1)=-\ln 2 + 2/3 \lt 0$. Since $f'$ starts at 0, decreases up to $\delta=1/2$, and afterwards increases only to a value that is still negative at $\delta=1$, we conclude that $f'(\delta)\lt 0$ on $(0,1)$. Finally, since $f(0)=0$ and $f$ is decreasing on this interval, $f(\delta)$ is negative throughout $(0,1)$.

Therefore, we can see that equation (46) is true and so is equation (31).

(End of proof)
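
For completeness, the sign claim about $f(\delta)$ can also be checked directly on a numerical grid (an illustrative sweep, not a replacement for the argument above):

```python
import numpy as np

# Direct numerical sweep of f(delta) = delta - (1 + delta) ln(1 + delta) + delta^2 / 3 on (0, 1).
d = np.linspace(1e-6, 1 - 1e-6, 1000)
f = d - (1 + d) * np.log(1 + d) + d**2 / 3

print(np.all(f < 0), f.max())                    # True, and the maximum stays below zero
```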

Reference

  • Outlier Analysis (2nd ed.), Charu C. Aggarwal, Springer
  • Probability and Computing (2nd ed.), Michael Mitzenmacher and Eli Upfal, Cambridge University Press
  1. The book ‘Outlier Analysis’ (Charu C. Aggarwal) states the Chernoff bound for the range $(0, 2e-1)$, but I do not yet know how to prove that version. Hence, this post presents the Chernoff bound for $\delta\in(0, 1)$, as given in other textbooks.