Last summer I attended a conference on “Dynamics and Numbers” at the Max Planck Institute in Bonn. Vincent Delecroix gave a nice talk on large deviations for the Teichmüller flow, and I decided to learn a bit about this concept. I wrote the notes below at that time.
In this post, I will prove the simplest example of large deviations – Cramér’s theorem. Here is a simple example. Consider a random coin toss: at each step you either win or lose a dollar, each with probability $1/2$. After $n$ steps, consider the probability of having more than $an$ dollars, where $a > 0$ is fixed. Of course, this probability goes to zero as $n \to \infty$, but how fast? Playing around with Stirling’s formula reveals that it decays exponentially fast.
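To make the decay concrete, here is a quick numeric sketch (the helper name `tail_prob` and the choice $a = 1/2$ are mine) computing the exact tail probability for the $\pm 1$ coin:

```python
import math

def tail_prob(n, a):
    """Exact P(S_n >= a*n) for the sum S_n of n fair +/-1 coin flips.
    Writing S_n = 2*H - n with H ~ Binomial(n, 1/2), the event S_n >= a*n
    is the same as H >= n*(1 + a)/2."""
    k_min = math.ceil(n * (1 + a) / 2)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

# The probability itself vanishes quickly, while (1/n) log P stabilizes --
# that stable negative constant is the exponential decay rate.
for n in (20, 100, 400):
    p = tail_prob(n, 0.5)
    print(n, p, math.log(p) / n)
```

The third column settling near a negative constant is exactly the exponential rate that the theory below identifies.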
The theory of large deviations is concerned with these kinds of exponentially unlikely events. We now move on to a more precise setup.
Let $X$ be an $\mathbb{R}$-valued random variable, with expectation and variance

$$\mathbb{E}[X] = \mu, \qquad \mathrm{Var}(X) = \sigma^2.$$
Consider now a sequence $X_1, X_2, \ldots$ of i.i.d. variables with this law and let $S_n := X_1 + \cdots + X_n$. The Strong Law of Large Numbers and the Central Limit Theorem say, respectively,

$$\frac{S_n}{n} \to \mu \ \text{ almost surely}, \qquad \frac{S_n - n\mu}{\sigma\sqrt{n}} \Rightarrow N(0,1).$$
Here $N(0,1)$ denotes the standard Gaussian distribution.
The theory of large deviations is concerned with estimating overshoots $\mathbb{P}(S_n \geq n(\mu + \varepsilon))$ for some $\varepsilon > 0$ (or, by symmetry, undershoots). The general principle is that this probability is exponentially small, and

$$\mathbb{P}(S_n \geq n(\mu + \varepsilon)) \approx e^{-n I(\mu + \varepsilon)},$$

where $I$ is a function which reflects many properties of the distribution of $X$.
— 1. Some preliminaries —
Definition 1 The cumulant generating function of $X$ is

$$\Lambda(\lambda) := \log \mathbb{E}\left[e^{\lambda X}\right].$$
We assume it is finite for all $\lambda \in \mathbb{R}$; this is a strong restriction on the moments of $X$. To indicate dependence on the variable $X$, the cumulant will be denoted $\Lambda_X$.
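For the fair $\pm 1$ coin from the introduction, the cumulant generating function has the closed form $\Lambda(\lambda) = \log\cosh\lambda$; a small sketch checks by finite differences that its value, first derivative, and second derivative at $0$ recover $0$, the mean, and the variance:

```python
import math

# Cumulant generating function of the fair +/-1 coin:
# Lambda(t) = log E[e^{tX}] = log((e^t + e^{-t}) / 2) = log cosh t.
def cgf(t):
    return math.log(math.cosh(t))

# At t = 0 the cumulant vanishes, its first derivative is the mean (0),
# and its second derivative is the variance (1); check numerically.
h = 1e-5
print(cgf(0.0))                                   # value at 0
print((cgf(h) - cgf(-h)) / (2 * h))               # ~ mean = 0
print((cgf(h) - 2 * cgf(0.0) + cgf(-h)) / h**2)   # ~ variance = 1
```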
The next properties are immediate from the definition.
- The cumulant satisfies $\Lambda(0) = 0$, $\Lambda'(0) = \mu$, and $\Lambda''(0) = \sigma^2$.
- The function $\lambda \mapsto \mathbb{E}[e^{\lambda X}]$ is convex.
- The function $\Lambda$ is convex. This property requires a bit more work than the others (it follows, for instance, from Hölder’s inequality).
- Under shift and scaling of the variable, we have

$$\Lambda_{aX + b}(\lambda) = \Lambda_X(a\lambda) + b\lambda.$$
- For two independent variables $X$ and $Y$ we have

$$\Lambda_{X+Y}(\lambda) = \Lambda_X(\lambda) + \Lambda_Y(\lambda).$$
- Given a random variable $X$ with law $\mu_X$ and a parameter $\lambda$, define the Cramér transform to be the random variable $X_\lambda$ with density

$$d\mu_{X_\lambda}(x) = e^{\lambda x - \Lambda_X(\lambda)}\, d\mu_X(x).$$
In other words, the Cramér transform tilts the distribution of $X$ by an exponential weight. The relationship between cumulants is

$$\Lambda_{X_\lambda}(t) = \Lambda_X(t + \lambda) - \Lambda_X(\lambda).$$
- In particular, we have

$$\mathbb{E}[X_\lambda] = \Lambda_X'(\lambda), \qquad \mathrm{Var}(X_\lambda) = \Lambda_X''(\lambda).$$
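For the fair $\pm 1$ coin this identity can be checked by hand: tilting by $\lambda$ puts mass proportional to $e^{\pm\lambda}$ on $\pm 1$, and the tilted mean is $\tanh\lambda$, the derivative of $\log\cosh\lambda$. A short sketch (helper names mine):

```python
import math

# Cramer transform of the fair +/-1 coin at parameter lam: the tilted law
# puts mass e^{lam*x} / (2*cosh(lam)) on x = +/-1, since the normalizer
# e^{Lambda(lam)} equals cosh(lam).
def tilted_mean(lam):
    z = 2 * math.cosh(lam)
    return (math.exp(lam) * (+1) + math.exp(-lam) * (-1)) / z

# Lambda(t) = log cosh t, so Lambda'(t) = tanh t; the tilted mean must agree.
for lam in (0.0, 0.3, 1.0):
    print(lam, tilted_mean(lam), math.tanh(lam))
```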
For later use, we shall also need the following version of the Chebyshev inequality, also known as the “exponential Chebyshev inequality”. For any $\lambda \geq 0$ and $a \in \mathbb{R}$ we have

$$\mathbb{P}(X \geq a) \leq e^{-\lambda a}\, \mathbb{E}\left[e^{\lambda X}\right] = e^{\Lambda(\lambda) - \lambda a}.$$
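For a standard Gaussian, where $\Lambda(\lambda) = \lambda^2/2$, optimizing this bound over $\lambda$ gives $\lambda = a$ and the familiar estimate $\mathbb{P}(Z \geq a) \leq e^{-a^2/2}$; a quick check against the true tail:

```python
import math

# Exponential Chebyshev for Z ~ N(0,1): P(Z >= a) <= e^{-lam*a + lam^2/2};
# minimizing the exponent over lam >= 0 gives lam = a and the bound e^{-a^2/2}.
for a in (0.5, 1.0, 2.0, 3.0):
    true_tail = 0.5 * math.erfc(a / math.sqrt(2))  # exact Gaussian tail
    bound = math.exp(-a * a / 2)
    print(a, true_tail, bound)
```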
— 2. Large Deviations —
Theorem 3 (Cramér) Let $X_1, X_2, \ldots$ be an i.i.d. sequence with expectation $\mu$ and finite cumulant:

$$\Lambda(\lambda) := \log \mathbb{E}\left[e^{\lambda X_1}\right] < +\infty \quad \text{for all } \lambda \in \mathbb{R}.$$
Then for all $a > \mu$ we have

$$\lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}(S_n \geq na) = -I(a),$$
where $I$ is given by the Legendre transform

$$I(a) = \sup_{\lambda \in \mathbb{R}} \left(\lambda a - \Lambda(\lambda)\right).$$
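For the fair $\pm 1$ coin the Legendre transform can be computed in closed form: the sup is attained at $\lambda = \operatorname{artanh} a$, giving $I(a) = \frac{1+a}{2}\log(1+a) + \frac{1-a}{2}\log(1-a)$. The sketch below (helper names mine) compares $\frac{1}{n}\log\mathbb{P}(S_n \geq na)$ with $-I(a)$:

```python
import math

def rate(a):
    """Legendre transform I(a) = sup_t (t*a - log cosh t) for the +/-1 coin.
    The sup is attained at t = artanh(a), giving this closed form."""
    return ((1 + a) / 2) * math.log(1 + a) + ((1 - a) / 2) * math.log(1 - a)

def log_tail_rate(n, a):
    """(1/n) log P(S_n >= n*a) for n fair +/-1 flips, via the exact binomial tail."""
    k_min = math.ceil(n * (1 + a) / 2)
    p = sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n
    return math.log(p) / n

a = 0.4
# The empirical rates should approach -I(a) as n grows.
print(-rate(a), [round(log_tail_rate(n, a), 4) for n in (50, 200, 800)])
```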
Proof: Step 0: Simplifications. Subtracting $a$ from the $X_i$, the quantity $S_n$ decreases by $na$ (and the cumulant becomes $\Lambda(\lambda) - a\lambda$, so $I(a)$ becomes the new $I(0)$), so we are reduced to showing

$$\lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}(S_n \geq 0) = \inf_{\lambda \geq 0} \Lambda(\lambda)$$

in the case $a = 0$; note that then $\mu < 0$.
We also have $\inf_{\lambda \geq 0} \Lambda(\lambda) = -I(0)$, since $\Lambda$ is decreasing on $\lambda \leq 0$. Let $p := \mathbb{P}(X = 0)$. There are three cases for the mass distribution of $X$:
- $\mathbb{P}(X < 0) = 1$. Then $\Lambda(\lambda) \to -\infty$ as $\lambda \to +\infty$ and the claim follows ($\mathbb{P}(S_n \geq 0) = 0$ for all $n$, so both sides equal $-\infty$).
- $\mathbb{P}(X \leq 0) = 1$ and $p > 0$. Then $\mathbb{E}[e^{\lambda X}] \downarrow p$ and $\Lambda(\lambda) \downarrow \log p$ as $\lambda \to +\infty$. Then $\mathbb{P}(S_n \geq 0) = \mathbb{P}(X_1 = \cdots = X_n = 0) = p^n$ and the claim follows.
- $\mathbb{P}(X > 0) > 0$. Then there exists $\lambda_0 > 0$ such that

$$\Lambda(\lambda_0) = \inf_{\lambda \geq 0} \Lambda(\lambda) \quad \text{and} \quad \Lambda'(\lambda_0) = 0.$$
This follows since $\Lambda(\lambda) \to +\infty$ as $\lambda \to +\infty$ (indeed $\mathbb{E}[e^{\lambda X}] \geq e^{\lambda \delta}\, \mathbb{P}(X \geq \delta)$ for some $\delta > 0$) and $\Lambda$ is convex with $\Lambda'(0) = \mu < 0$.
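This third case can be visualized on the running coin example: after the Step-0 shift by $a$, the cumulant becomes $\log\cosh\lambda - a\lambda$, which has negative slope at $0$ and grows at infinity, so its minimum is interior, where the derivative $\tanh\lambda - a$ vanishes. A quick check (with $a = 0.4$, my choice):

```python
import math

# Shifted cumulant of the fair +/-1 coin minus a = 0.4:
# f(t) = log cosh t - 0.4 t, minimized where tanh t = 0.4, i.e. t0 = artanh(0.4).
a = 0.4
f = lambda t: math.log(math.cosh(t)) - a * t
t0 = math.atanh(a)

# The derivative vanishes at t0, and a grid search over [-3, 3] confirms
# that f(t0) is the global minimum (up to grid resolution).
grid_min = min(f(k / 1000) for k in range(-3000, 3001))
print(t0, f(t0), grid_min)
```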
Step 1: Upper bound. We now apply the exponential Chebyshev inequality with $\lambda = \lambda_0$:

$$\mathbb{P}(S_n \geq 0) \leq \mathbb{E}\left[e^{\lambda_0 S_n}\right] = e^{n \Lambda(\lambda_0)}.$$
So the conclusion follows. Step 2: Lower bound. Proving the reverse inequality is more delicate. We need to tilt the probability distribution of $X$ so that the large deviation event becomes typical. Do a Cramér transform with parameter $\lambda_0$, i.e. let $Y := X_{\lambda_0}$ be such that

$$d\mu_Y(x) = e^{\lambda_0 x - \Lambda(\lambda_0)}\, d\mu_X(x).$$
Then we have that

$$\Lambda_Y(t) = \Lambda(t + \lambda_0) - \Lambda(\lambda_0).$$
Therefore the expectation and variance satisfy

$$\mathbb{E}[Y] = \Lambda'(\lambda_0) = 0, \qquad \mathrm{Var}(Y) = \Lambda''(\lambda_0) < +\infty.$$
Denoting by $T_n := Y_1 + \cdots + Y_n$ the partial sums for a sequence of i.i.d. copies of $Y$, we have

$$\mathbb{P}(S_n \geq 0) = \int_{x_1 + \cdots + x_n \geq 0} d\mu_X(x_1) \cdots d\mu_X(x_n) = e^{n\Lambda(\lambda_0)}\, \mathbb{E}\left[e^{-\lambda_0 T_n} \mathbf{1}_{\{T_n \geq 0\}}\right].$$
It thus suffices to show that

$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{E}\left[e^{-\lambda_0 T_n} \mathbf{1}_{\{T_n \geq 0\}}\right] \geq 0.$$
But we have that $\mathbb{E}[Y] = 0$ and $\mathrm{Var}(Y) < +\infty$, so we can apply the Central Limit Theorem to $T_n$. Thus, there exists $c > 0$ such that for all large $n$

$$\mathbb{E}\left[e^{-\lambda_0 T_n} \mathbf{1}_{\{T_n \geq 0\}}\right] \geq e^{-\lambda_0 \sqrt{n}}\, \mathbb{P}\left(0 \leq T_n \leq \sqrt{n}\right) \geq c\, e^{-\lambda_0 \sqrt{n}}.$$
Taking logs and dividing by $n$, the claim follows since $\sqrt{n}/n \to 0$. $\Box$
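The tilting idea in Step 2 is also the basis of importance sampling for rare events: sampling from the Cramér transform at the right parameter makes the overshoot typical, and reweighting recovers the original probability. A sketch for the $\pm 1$ coin (all helper names and parameter choices mine):

```python
import math, random

def tilted_estimate(n, a, samples=2000, seed=0):
    """Estimate P(S_n >= n*a) for n fair +/-1 flips by sampling from the
    Cramer transform at t = artanh(a) (which has mean a, making the event
    typical) and reweighting each sample by e^{-t*S_n + n*Lambda(t)}."""
    t = math.atanh(a)
    log_norm = math.log(math.cosh(t))           # Lambda(t) for the fair coin
    p_plus = math.exp(t) / (2 * math.cosh(t))   # tilted P(X = +1)
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        s = sum(1 if rng.random() < p_plus else -1 for _ in range(n))
        if s >= n * a:
            acc += math.exp(-t * s + n * log_norm)
    return acc / samples

n, a = 100, 0.4
est = tilted_estimate(n, a)
exact = sum(math.comb(n, k) for k in range(math.ceil(n * (1 + a) / 2), n + 1)) / 2 ** n
# Naive Monte Carlo with 2000 samples would almost never hit this event,
# while the tilted estimator recovers it with small relative error.
print(est, exact)
```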