Large Deviations – Cramer’s theorem

Last summer I attended a conference on “Dynamics and Numbers” at the Max Planck Institute in Bonn. Vincent Delecroix gave a nice talk on large deviations for the Teichmüller flow, and I decided to learn a bit about this concept. I wrote the notes below at that time.

In this post, I will prove the simplest result in large deviations – Cramer’s theorem. Here is a motivating example. Consider a random coin toss, where at each step you either win or lose {1} dollar. After {N} steps, consider the probability of having more than {aN} dollars, where {a>0} is fixed. Of course, it goes to zero as {N{\rightarrow} \infty}, but how fast? Playing around with Stirling’s formula reveals that the probability decays exponentially fast.
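To make this concrete, here is the back-of-the-envelope computation (only the exponential order is kept, and integer-part issues are ignored). For {0<a<1}, having more than {aN} dollars after {N} tosses means winning at least {\frac{1+a}2 N} of them, so the probability is a sum of binomial terms dominated by the largest one, and Stirling’s formula gives

\displaystyle  \frac 1N \log {\mathbb P}(\text{more than } aN \text{ dollars after } N \text{ steps}) {\rightarrow} -\left[\frac{1+a}2\log(1+a) + \frac{1-a}2\log(1-a)\right]

which is strictly negative for {a>0}. We will recover exactly this rate from Cramer’s theorem in Section 2.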

The theory of large deviations is concerned with these kinds of exponentially unlikely events. We now move on to a more precise setup.

Let {X} be an {{\mathbb R}}-valued random variable, with expectation and variance

\displaystyle  \begin{array}{rcl}  {\mathbb E}[X]&=\mu\\ {\mathbb E}[(X-\mu)^2] &= \sigma^2 \end{array}

Consider now a sequence of iid variables {X_i} with this law and let {S_n= X_1+\cdots+X_n}. The Strong Law of Large Numbers and the Central Limit theorem say, respectively,

\displaystyle  \begin{array}{rcl}  \frac{S_n}n {\rightarrow} \mu & \hskip 1cm {\mathbb P}\text{-a.s.}\\ \frac{S_n-\mu n}{\sqrt{n}\sigma} {\rightarrow} {\mathcal N} & \hskip 1cm\text{ in law.} \end{array}

Here {{\mathcal N}} denotes the standard Gaussian distribution.

The theory of large deviations is concerned with estimating overshoots {{\mathbb P}(S_n\geq an)} for some {a>\mu} (or, by symmetry, undershoots). The general principle is that the probability is exponentially small, and

\displaystyle  \frac 1n \log {\mathbb P}(S_n\geq an) {\rightarrow} -I(a)

where {I(a)} is a function which reflects many properties of the distribution of {X}.

— 1. Some preliminaries —

Definition 1 The moment generating function of {X} is

\displaystyle  \phi(t):={\mathbb E}[e^{tX}]

We assume it is finite for all {t\in {\mathbb R}}; this is a strong restriction on the tails of {X} (in particular, all moments are finite). To indicate dependence on {X}, it will be denoted {\phi_X(t)}; its logarithm {\log\phi(t)} is usually called the cumulant generating function.
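Two examples worth keeping in mind (both are elementary computations): for the fair {\pm 1} coin toss of the introduction, {\phi(t) = \frac{e^{t}+e^{-t}}{2} = \cosh t}, and for a Gaussian variable with mean {\mu} and variance {\sigma^2}, {\phi(t) = e^{\mu t + \sigma^2 t^2/2}}. In both cases {\phi(t)} is indeed finite for every {t}.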

The next properties are immediate from the definition.

Proposition 2

  • The moment generating function satisfies

    \displaystyle  \begin{array}{rcl}  \phi(0) &= 1 \\ \phi'(0) & = {\mathbb E}[X]\\ \phi''(0) & = {\mathbb E}[X^2]\\ \vdots & \end{array}

  • The function {\phi(t)} is convex.
  • The function {\log \phi(t)} is convex. This property requires a bit more work than the others; a short argument via Hölder’s inequality is sketched right after this proposition.
  • Under shift and scaling of the variable, we have

    \displaystyle  \begin{array}{rcl}  \phi_{\lambda X }(t) &= \phi_X(\lambda t)\\ \phi_{X+a}(t) & = \phi_X(t)e^{at} \end{array}

  • For two independent variables {X,Y} we have

    \displaystyle  \phi_{X+Y}(t) = \phi_X(t)\phi_Y(t)

  • Given a random variable {X} and a parameter {\tau}, define the Cramer transform {\hat{X}} to be the random variable with law

    \displaystyle  {\mathbb P}(\hat{X}\in db) = \frac 1 {\phi(\tau)} e^{\tau b} {\mathbb P}(X\in db)

    In other words, the Cramer transform tilts the distribution of {X} by an exponential weight. The relationship between the generating functions is

    \displaystyle  \phi_{\hat{X}}(t) = \frac 1 {\phi_X(\tau)} \phi_X(t+\tau)

  • In particular, we have

    \displaystyle  \begin{array}{rcl}  {\mathbb E}[\hat{X}] &= \phi_{\hat{X}}'(0) = \frac{\phi_X '(\tau)}{\phi_X(\tau)}\\ {\mathbb E}[\hat{X}^2] & = \phi_{\hat{X}}''(0) = \frac{\phi_X '' (\tau)}{\phi_X(\tau)} \end{array}
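Here is the promised sketch for the convexity of {\log \phi}. For {s,t \in {\mathbb R}} and {\lambda \in (0,1)}, Hölder’s inequality with exponents {1/\lambda} and {1/(1-\lambda)} gives

\displaystyle  \phi(\lambda s + (1-\lambda)t) = {\mathbb E}\left[(e^{sX})^{\lambda}(e^{tX})^{1-\lambda}\right] \leq \phi(s)^{\lambda}\,\phi(t)^{1-\lambda}

and taking logarithms yields the claim.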

For later use, we shall also need the following version of the Chebyshev inequality, also known as the “exponential Chebyshev inequality”. For any {\tau>0} we have

\displaystyle  {\mathbb P}(X\geq 0) = {\mathbb P}(\tau X \geq 0) = {\mathbb P}(e^{\tau X}\geq 1) \leq {\mathbb E}[e^{\tau X}]
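Applying the same chain of inequalities to {X-a} in place of {X} gives the slightly more general bound

\displaystyle  {\mathbb P}(X\geq a) \leq e^{-\tau a}\,{\mathbb E}[e^{\tau X}] = e^{-\tau a}\phi(\tau) \qquad \text{for all } \tau>0

and optimizing over {\tau} already makes the Legendre transform of {\log\phi} appear; this is the mechanism behind the upper bound in the theorem below.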

— 2. Large Deviations —

Theorem 3 (Cramer) Let {X_i} be an iid sequence with expectation {\mu} and finite moment generating function:

\displaystyle  \phi(t):= {\mathbb E}[e^{tX}] < \infty \text{ for all } t\in {\mathbb R}.

Then for all {a\geq \mu} we have

\displaystyle  \frac 1n \log {\mathbb P}(S_n \geq an) {\rightarrow} -I(a)

where {I(a)} is given by the Legendre transform

\displaystyle  I(z):= \sup_t [tz - \log \phi (t)]
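Before turning to the proof, let us check that this Legendre transform reproduces the rate found by hand in the coin-toss example. There {\phi(t)=\cosh t}, the supremum in {I(a)} is attained at {t=\mathrm{arctanh}\, a}, and since {\cosh(\mathrm{arctanh}\, a) = 1/\sqrt{1-a^2}} a short computation gives

\displaystyle  I(a) = \frac{1+a}2\log(1+a) + \frac{1-a}2\log(1-a) \qquad (0<a<1)

which is exactly the exponent obtained from Stirling’s formula in the introduction.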

Proof: Step 0: Simplifications. Replacing the {X_i} by {X_i-a} replaces {\log \phi(t)} by {\log \phi(t) - at}, hence {I_{X-a}(0)=I_X(a)}, so we are reduced to showing

\displaystyle  \lim_n \frac 1n \log {\mathbb P} (S_n\geq 0) = -I(0)

where

\displaystyle  I(0) = \sup_t [-\log \phi(t)] = -\log [\inf_t \phi(t)]

After this shift we also have {{\mathbb E}[X]\leq 0}, since {a\geq \mu}. Let {\rho:= \inf_t \phi(t)}. There are three cases, according to how the mass of {X} is distributed:

  • {{\mathbb P}(X<0)=1}. Then {\phi(t){\rightarrow} 0=\rho} as {t{\rightarrow} +\infty}, so {I(0)=+\infty}; since {{\mathbb P}(S_n\geq 0)=0} for every {n}, the claim follows.
  • {{\mathbb P}(X\leq 0)=1} and {{\mathbb P}(X=0)>0}. Then {0<\rho={\mathbb P}(X=0)} and {\phi(t){\rightarrow} \rho} as {t{\rightarrow} +\infty}. Moreover {{\mathbb P}(S_n\geq 0 ) ={\mathbb P}(X_1=\cdots=X_n=0)=\rho^n}, so the claim follows.
  • {{\mathbb P}(X>0)>0}. Then there exists {\tau\geq 0} such that

    \displaystyle  \phi(\tau)=\rho=\inf_t \phi(t)

    Indeed, in this case {{\mathbb P}(X<0)>0} as well (otherwise {{\mathbb E}[X]>0}), so {\phi(t){\rightarrow} +\infty} as {t{\rightarrow} \pm \infty}; by convexity the infimum is attained, and since {\phi'(0)={\mathbb E}[X]\leq 0} the minimizer can be taken with {\tau\geq 0}.

Step 1: Upper bound. We now apply the exponential Chebyshev inequality, with the minimizer {\tau\geq 0} found in Step 0:

\displaystyle  {\mathbb P}(S_n\geq 0) \leq {\mathbb P}(e^{\tau S_n}\geq 1) \leq {\mathbb E}[e^{\tau S_n}] = \phi(\tau)^n = \rho^n

Taking logs and dividing by {n}, this gives {\limsup_n \frac 1n \log {\mathbb P}(S_n \geq 0) \leq \log \rho = -I(0)}.

Step 2: Lower bound. Proving the reverse inequality is more delicate. We need to tilt the probability distribution of {X} so that the large deviation event becomes typical. Apply the Cramer transform with parameter {\tau}, i.e. let {\hat{X}} be such that

\displaystyle  {\mathbb P}(\hat{X}\in db) = \frac{e^{\tau b}}{\phi(\tau)} {\mathbb P}(X\in db) = \frac 1\rho e^{\tau b} {\mathbb P}(X\in db)

Then we have that

\displaystyle  \phi_{\hat{X}} (t) = \frac{\phi(\tau+t)}{\phi(\tau)}

Since {\tau} minimizes {\phi_X}, we have {\phi_X'(\tau)=0}, so the expectation and variance satisfy

\displaystyle  \begin{array}{rcl}  {\mathbb E}[\hat{X}] &= \phi_{\hat{X}}'(0) = \frac{\phi_X '(\tau)}{\phi_X(\tau)} = 0\\ Var(\hat{X}) = {\mathbb E}[\hat{X}^2] & = \phi_{\hat{X}}''(0) = \frac{\phi_X '' (\tau)}{\phi_X(\tau)}>0 \end{array}

(the second moment is positive because {X} is not almost surely {0} in the present case).

Denoting by {\hat{S}_n} the partial sums of an iid sequence with the law of {\hat{X}}, we have

\displaystyle  {\mathbb P}(S_n\geq 0) = {\mathbb E}[{1}_{\hat{S}_n \geq 0} \cdot \rho^n e^{-\tau \hat{S}_n}]
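This identity is just the change of measure unwound coordinate by coordinate: for any bounded measurable {F},

\displaystyle  {\mathbb E}[F(\hat{X}_1,\dots,\hat{X}_n)] = \frac 1{\rho^n}\, {\mathbb E}[F(X_1,\dots,X_n)\, e^{\tau S_n}]

and taking {F(x_1,\dots,x_n) = 1_{x_1+\cdots+x_n \geq 0}\,\rho^n e^{-\tau(x_1+\cdots+x_n)}} gives the displayed formula.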

It thus suffices to show that

\displaystyle  \liminf_n \frac 1n \log {\mathbb E}[{1}_{\hat{S}_n\geq 0} e^{-\tau \hat{S}_n }] \geq 0

But {{\mathbb E}[\hat{X}]=0} and {Var(\hat{X})>0}, so we can apply the Central Limit Theorem to {\hat{S}_n}. Thus, writing {\hat{\sigma}^2:= Var(\hat{X})}, there exists {C>0} such that, for all {n} large enough,

\displaystyle  {\mathbb P}(\hat{S}_n \in [0, C\hat{\sigma}\sqrt{n} ]) >\frac 1 {10}

Therefore, for all such {n},

\displaystyle  {\mathbb E}[1_{\hat{S}_n\geq 0 } e^{-\tau \hat{S}_n }] \geq \frac 1 {10} e^{-\tau C\hat{\sigma}\sqrt n }

Taking logs and dividing by {n}, the right-hand side becomes {-\frac{\log 10}{n} - \frac{\tau C \hat{\sigma}}{\sqrt{n}} {\rightarrow} 0}, and the claim follows. \Box
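Finally, a small numerical illustration. The snippet below (a sketch of mine, with arbitrary choices of {a}, {n} and sample size, not part of the argument) estimates {\frac 1n \log {\mathbb P}(S_n\geq an)} by Monte Carlo for the fair {\pm 1} coin toss and compares it with {-I(a)} computed above.

    # Monte Carlo check of Cramer's theorem for the fair +/-1 coin toss.
    # All numerical choices (a, n, number of samples) are arbitrary illustrations.
    import numpy as np

    def rate_function(a):
        # I(a) = sup_t [ta - log cosh t], computed in closed form above
        return 0.5 * (1 + a) * np.log(1 + a) + 0.5 * (1 - a) * np.log(1 - a)

    def empirical_rate(n, a, samples=2_000_000, seed=0):
        # S_n = 2 * (#heads) - n, where #heads ~ Binomial(n, 1/2)
        rng = np.random.default_rng(seed)
        s_n = 2 * rng.binomial(n, 0.5, size=samples) - n
        prob = np.mean(s_n >= a * n)
        return np.log(prob) / n if prob > 0 else float("-inf")

    a = 0.3
    for n in (50, 100, 200):
        print(f"n={n}: empirical {empirical_rate(n, a):+.4f}, predicted {-rate_function(a):+.4f}")

The agreement is only up to a polynomial (roughly {1/\sqrt{n}}) prefactor, which is precisely what the {\frac 1n \log} normalization in the theorem ignores.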
