Last summer I attended a conference on “Dynamics and Numbers” at the Max Planck Institute in Bonn. Vincent Delecroix gave a nice talk on large deviations for the Teichmüller flow, and I decided to learn a bit about this concept. I wrote the notes below at that time.

In this post, I will prove the simplest example of large deviations – Cramer’s theorem. Here is a simple example. Consider a random coin toss, where at each step you either win or lose a dollar. After $n$ steps, consider the probability of having more than $an$ dollars, where $a > 0$ is fixed. Of course, it goes to zero as $n \to \infty$, but how fast? Playing around with Stirling’s formula reveals that the probability decays exponentially fast.
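To make “exponentially fast” concrete, here is a quick Python sketch for the fair coin. The function `rate` below is the decay constant that the Stirling computation produces; it is exactly the Legendre transform $I(a)$ of Theorem 3 for this distribution.

```python
from math import comb, ceil, log

# Fair +/-1 coin: S_n >= a*n exactly when at least ceil(n*(1+a)/2) of the
# n steps are +1.  Compute the tail probability exactly and watch the
# decay exponent (1/n) log P settle down to a negative constant.

def tail_prob(n, a):
    """Exact P(S_n >= a*n) for a sum of n fair +/-1 steps."""
    k_min = ceil(n * (1 + a) / 2)          # number of +1 steps needed
    return sum(comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

def rate(a):
    """Decay rate I(a) that Stirling's formula produces for the fair coin."""
    return (1 + a) / 2 * log(1 + a) + (1 - a) / 2 * log(1 - a)

a = 0.5
for n in (50, 200, 800):
    print(n, log(tail_prob(n, a)) / n)     # tends to -I(a) ~ -0.13081
print(-rate(a))
```

The convergence is slow (there is a $\frac{\log n}{n}$ correction hiding in Stirling’s formula), but the limit is visibly the constant $-I(a)$.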

The theory of large deviations is concerned with these kinds of exponentially unlikely events. We now move on to a more precise setup.

Let $X$ be an $\mathbb{R}$-valued random variable, with expectation and variance

$$\mathbb{E}[X] = \bar{x}, \qquad \mathrm{Var}(X) = \sigma^2.$$

Consider now a sequence $X_1, X_2, \dots$ of iid variables with this law and let $S_n := X_1 + \cdots + X_n$. The Strong Law of Large Numbers and the Central Limit Theorem say, respectively,

$$\frac{S_n}{n} \to \bar{x} \ \text{ a.s.} \qquad \text{and} \qquad \frac{S_n - n\bar{x}}{\sigma \sqrt{n}} \Rightarrow \mathcal{N}(0,1).$$

Here $\mathcal{N}(0,1)$ denotes the standard Gaussian distribution.

The theory of large deviations is concerned with estimating overshoots $\mathbb{P}(S_n \geq an)$ for some $a > \bar{x}$ (or, by symmetry, undershoots). The general principle is that the probability is exponentially small, and

$$\lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}(S_n \geq an) = -I(a),$$

where $I$ is a function which reflects many properties of the distribution of $X$.

** — 1. Some preliminaries — **

**Definition 1** The cumulant generating function of $X$ is

$$\Lambda(\lambda) := \log \mathbb{E}[e^{\lambda X}].$$

We assume it is finite for all $\lambda \in \mathbb{R}$; this is a strong restriction on the moments of $X$. To indicate dependence on $X$, the cumulant will be denoted $\Lambda_X$.
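As a simple example to keep in mind, for the fair coin from the introduction, $\mathbb{P}(X = \pm 1) = \tfrac12$, the cumulant is

$$\Lambda(\lambda) = \log\Big( \tfrac12 e^{\lambda} + \tfrac12 e^{-\lambda} \Big) = \log \cosh \lambda,$$

and one checks directly that $\Lambda(0) = 0$, $\Lambda'(0) = \tanh(0) = 0 = \mathbb{E}[X]$, and $\Lambda''(0) = 1 = \mathrm{Var}(X)$.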

The next properties are immediate from the definition.

**Proposition 2**

- The cumulant satisfies $\Lambda(0) = 0$, $\Lambda'(0) = \mathbb{E}[X]$, and $\Lambda''(0) = \mathrm{Var}(X)$.
- The function $\lambda \mapsto \mathbb{E}[e^{\lambda X}]$ is convex.
- The function $\Lambda$ is convex. This property requires a bit more work than the others (it follows from the Hölder inequality).
- Under shift and scaling of the variable, we have $\Lambda_{cX + b}(\lambda) = \Lambda_X(c\lambda) + b\lambda$.
- For two independent variables we have $\Lambda_{X+Y}(\lambda) = \Lambda_X(\lambda) + \Lambda_Y(\lambda)$.
- Given a random variable $X$ with law $\mu$ and a parameter $\lambda_0$, define the Cramer transform to be the random variable $X_{\lambda_0}$ with density $$d\mu_{\lambda_0}(x) = e^{\lambda_0 x - \Lambda_X(\lambda_0)}\, d\mu(x).$$ In other words, the Cramer transform tilts the distribution of $X$ by an exponential weight. The relationship between cumulants is $$\Lambda_{X_{\lambda_0}}(\lambda) = \Lambda_X(\lambda + \lambda_0) - \Lambda_X(\lambda_0).$$
- In particular, we have $$\mathbb{E}[X_{\lambda_0}] = \Lambda_X'(\lambda_0), \qquad \mathrm{Var}(X_{\lambda_0}) = \Lambda_X''(\lambda_0).$$
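The last two items are easy to check numerically. Here is a quick sanity check for a toy discrete law (the particular atoms and weights below are arbitrary choices of mine):

```python
from math import exp, log

# A small discrete law for X, chosen arbitrarily for illustration.
xs = [-1.0, 0.0, 2.0]
ps = [0.5, 0.3, 0.2]

def cumulant(lam, xs, ps):
    """Lambda(lam) = log E[exp(lam X)] for a discrete law."""
    return log(sum(p * exp(lam * x) for x, p in zip(xs, ps)))

lam0 = 0.7
# Cramer transform: tilt each weight by exp(lam0 * x) and renormalize.
z = sum(p * exp(lam0 * x) for x, p in zip(xs, ps))
ps_tilted = [p * exp(lam0 * x) / z for x, p in zip(xs, ps)]

# E[X_{lam0}] should equal Lambda'(lam0); compare with a central difference.
mean_tilted = sum(p * x for x, p in zip(xs, ps_tilted))
h = 1e-6
deriv = (cumulant(lam0 + h, xs, ps) - cumulant(lam0 - h, xs, ps)) / (2 * h)
print(mean_tilted, deriv)      # the two numbers agree

# Cumulant of the tilted variable: Lambda(lam + lam0) - Lambda(lam0).
lam = 0.4
lhs = cumulant(lam, xs, ps_tilted)
rhs = cumulant(lam + lam0, xs, ps) - cumulant(lam0, xs, ps)
print(lhs, rhs)                # equal up to rounding
```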

For later use, we shall also need the following version of the Chebyshev inequality, also known as the “exponential Chebyshev inequality”: for any $\lambda \geq 0$ we have

$$\mathbb{P}(X \geq a) = \mathbb{P}\big(e^{\lambda X} \geq e^{\lambda a}\big) \leq e^{-\lambda a}\, \mathbb{E}[e^{\lambda X}] = e^{\Lambda(\lambda) - \lambda a}.$$
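For example, for a standard Gaussian one has $\Lambda(\lambda) = \lambda^2/2$, and optimizing the bound over $\lambda$ (the optimum is at $\lambda = a$) gives the Chernoff bound $\mathbb{P}(X \geq a) \leq e^{-a^2/2}$. A quick numerical check:

```python
from math import erfc, exp, sqrt

# For X ~ N(0,1), the exponential Chebyshev bound optimized at lam = a
# reads P(X >= a) <= exp(-a^2 / 2).  Compare with the exact tail.

def gaussian_tail(a):
    """Exact P(X >= a) for a standard Gaussian."""
    return 0.5 * erfc(a / sqrt(2))

for a in (0.5, 1.0, 2.0, 3.0):
    print(a, gaussian_tail(a), exp(-a * a / 2))   # tail <= bound in each row
```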

** — 2. Large Deviations — **

**Theorem 3 (Cramer)** Let $X_1, X_2, \dots$ be an iid sequence with expectation $\bar{x}$ and finite cumulant:

$$\Lambda(\lambda) < \infty \quad \text{for all } \lambda \in \mathbb{R}.$$

Then for all $a > \bar{x}$ we have

$$\lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}(S_n \geq an) = -I(a),$$

where $I$ is given by the Legendre transform

$$I(a) = \sup_{\lambda \in \mathbb{R}} \big( \lambda a - \Lambda(\lambda) \big).$$
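For the fair coin, the Legendre transform can be computed in closed form and checked against a brute-force numerical supremum:

```python
from math import log, cosh

# Legendre transform for the fair +/-1 coin, where Lambda(lam) = log cosh(lam).
# The sup of lam*a - Lambda(lam) is attained at lam = atanh(a), which gives
# the closed form I(a) = ((1+a)/2) log(1+a) + ((1-a)/2) log(1-a).

def I_closed(a):
    return (1 + a) / 2 * log(1 + a) + (1 - a) / 2 * log(1 - a)

def I_numeric(a, lam_max=10.0, steps=100_000):
    """Brute-force sup of lam*a - log cosh(lam) over a grid in [0, lam_max]."""
    return max(lam_max * i / steps * a - log(cosh(lam_max * i / steps))
               for i in range(steps + 1))

print(I_closed(0.5), I_numeric(0.5))   # both ~ 0.13081
```

This is the same rate that Stirling’s formula produces for the coin-toss example in the introduction.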

*Proof:* **Step 0: Simplifications.** Subtracting $a$ from the $X_i$, the quantity $\Lambda(\lambda)$ decreases by $a\lambda$, so we are reduced to showing

$$\lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}(S_n \geq 0) = -I,$$

where

$$I = I(0) = \sup_{\lambda \in \mathbb{R}} \big( -\Lambda(\lambda) \big) = -\inf_{\lambda \in \mathbb{R}} \Lambda(\lambda).$$

We also have $\mathbb{E}[X_i] = \bar{x} - a < 0$. Let $\mu$ denote the law of the (shifted) $X_i$. There are three cases for the mass distribution of $\mu$:

- $\mu([0,\infty)) = 0$. Then $\Lambda(\lambda) \to -\infty$ as $\lambda \to +\infty$ and the claim follows ($\mathbb{P}(S_n \geq 0) = 0$ and $I = +\infty$).
- $\mu([0,\infty)) > 0$ and $\mu((0,\infty)) = 0$. Then $\mu(\{0\}) > 0$ and $\Lambda(\lambda) \to \log \mu(\{0\})$ as $\lambda \to +\infty$. Then $\mathbb{P}(S_n \geq 0) = \mu(\{0\})^n$ and the claim follows.
- $\mu((0,\infty)) > 0$. Then there exists $\lambda_0 > 0$ such that
$$\Lambda'(\lambda_0) = 0 \quad \text{and} \quad \Lambda(\lambda_0) = \inf_{\lambda \in \mathbb{R}} \Lambda(\lambda) = -I.$$
This follows since $\Lambda(\lambda) \to +\infty$ as $\lambda \to +\infty$ and $\Lambda$ is convex with $\Lambda'(0) = \mathbb{E}[X_i] < 0$.

**Step 1: Upper bound.** We now apply the exponential Chebyshev inequality with $\lambda = \lambda_0$, using $\Lambda_{S_n} = n\Lambda$:

$$\mathbb{P}(S_n \geq 0) \leq \mathbb{E}[e^{\lambda_0 S_n}] = e^{n \Lambda(\lambda_0)} = e^{-nI}.$$

So the conclusion follows. **Step 2: Lower bound.** Proving the reverse inequality is more delicate. We need to tilt the probability distribution of $X$ so that the large deviation event becomes typical. Do a Cramer transform with parameter $\lambda_0$, i.e. let $\tilde{X}$ be such that

$$d\tilde{\mu}(x) = e^{\lambda_0 x - \Lambda(\lambda_0)}\, d\mu(x).$$
Then we have that

$$\Lambda_{\tilde{X}}(\lambda) = \Lambda(\lambda + \lambda_0) - \Lambda(\lambda_0).$$

Therefore the expectation and variance satisfy

$$\mathbb{E}[\tilde{X}] = \Lambda'(\lambda_0) = 0, \qquad \mathrm{Var}(\tilde{X}) = \Lambda''(\lambda_0) =: \tilde{\sigma}^2 > 0,$$

the variance being positive since $\mu$, having negative mean and positive mass on $(0,\infty)$, is not a point mass.
Denoting by $\tilde{S}_n$ the partial sums for a sequence of iid copies of $\tilde{X}$, we have for any $C > 0$

$$\mathbb{P}(S_n \geq 0) = e^{n\Lambda(\lambda_0)} \int_{\{x_1 + \cdots + x_n \geq 0\}} e^{-\lambda_0 (x_1 + \cdots + x_n)}\, d\tilde{\mu}(x_1) \cdots d\tilde{\mu}(x_n) \geq e^{-nI}\, e^{-\lambda_0 C \sqrt{n}}\, \mathbb{P}\big( 0 \leq \tilde{S}_n \leq C\sqrt{n} \big).$$
It thus suffices to show that

$$\liminf_{n \to \infty} \mathbb{P}\big( 0 \leq \tilde{S}_n \leq C\sqrt{n} \big) > 0.$$

But we have that $\mathbb{E}[\tilde{X}] = 0$, so we can apply the Central Limit Theorem to $\tilde{S}_n$. Thus, there exists $c > 0$ such that

$$\mathbb{P}\big( 0 \leq \tilde{S}_n \leq C\sqrt{n} \big) = \mathbb{P}\Big( 0 \leq \frac{\tilde{S}_n}{\tilde{\sigma}\sqrt{n}} \leq \frac{C}{\tilde{\sigma}} \Big) \geq c \quad \text{for all large } n.$$
Therefore

$$\mathbb{P}(S_n \geq 0) \geq c\, e^{-nI}\, e^{-\lambda_0 C \sqrt{n}}.$$

Taking logs and dividing by $n$, the claim follows. $\Box$
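The tilting step can also be seen numerically. Take a biased coin with $\mathbb{P}(X = +1) = p < \tfrac12$ and $\mathbb{P}(X = -1) = q = 1 - p$; solving $\Lambda'(\lambda_0) = 0$ gives $\lambda_0 = \tfrac12 \log(q/p)$, and the tilted law is exactly the fair coin. Under the original law the event $S_n \geq 0$ is exponentially rare, while under the tilted law it is typical (a small Monte Carlo sketch, with parameters chosen by me for illustration):

```python
import random

random.seed(0)

# Biased coin: +1 with prob p = 0.3, -1 otherwise, so E[X] = -0.4 < 0.
# The Cramer transform at lam0 = (1/2) log(q/p) turns it into a fair coin,
# under which S_n >= 0 happens with probability about 1/2.
p, n, trials = 0.3, 200, 2000

def frac_nonneg(prob):
    """Monte Carlo estimate of P(S_n >= 0) for the +/-1 coin with P(+1) = prob."""
    hits = 0
    for _ in range(trials):
        s = sum(1 if random.random() < prob else -1 for _ in range(n))
        hits += (s >= 0)
    return hits / trials

print(frac_nonneg(p))     # essentially 0: the event is exponentially rare
print(frac_nonneg(0.5))   # about 1/2: typical under the tilted law
```

This is precisely why the lower bound works: after tilting, the Central Limit Theorem controls the event at no exponential cost.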