The central limit theorem and the law of large numbers are perhaps the most important theorems in statistics. They are also among the most far-reaching, applying in a huge range of situations.

Background

Suppose you have variables which come from some statistical distribution. This could be normal (continuous), Poisson (count), exponential, it doesn’t really matter. Suppose you take a sample of \(n\) of them independently (independence is important), so that you have \[ X_1,X_2,\ldots, X_n. \] Suppose also that you know two important properties of this distribution, its mean \(\mu\) and variance \(\sigma^2\); these are known for many distributions. Now take the average \[ S_n=\frac{X_1+X_2+\cdots + X_n}{n}. \] The law of large numbers says that, as \(n\) gets large, this average gets close to \(\mu\). So if you want \(S_n\) to be within \(0.01\) of \(\mu\), you can (with high probability) achieve this by making \(n\) “big” enough.

n <- 100
x <- rnorm(n, mean = 2.3)   # normal sample, true mean 2.3
mean(x)
## [1] 2.279121
n <- 10000
x <- rnorm(n, mean = 2.3)
mean(x)
## [1] 2.305192
n <- 100
x <- rpois(n, lambda = 3)   # Poisson sample, true mean 3
mean(x)
## [1] 2.87
n <- 10000
x <- rpois(n, lambda = 3)
mean(x)
## [1] 3.0076
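
To make the “within \(0.01\)” idea concrete, here is a quick sketch (the sample sizes and the true mean of \(2.3\) are arbitrary choices) tracking how far the sample mean is from \(\mu\) as \(n\) grows:

# distance between the sample mean and mu = 2.3 for growing n
for (n in c(100, 10000, 1000000)) {
  x <- rnorm(n, mean = 2.3)
  cat("n =", n, " |mean(x) - mu| =", abs(mean(x) - 2.3), "\n")
}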

The Normal Case

If \(X_1,\ldots, X_n\) come from a normal distribution, \(\mathcal{N}(\mu,\sigma^2)\), then it can be shown that a sum of independent normal random variables is itself exactly normal, and in particular that \[ S_n \sim \mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right). \] Looking at this, as \(n\) grows the variance of this normal goes to \(0\): in the limit it is a normal with mean \(\mu\) and variance \(0\), i.e. it is \(\mu\) exactly.
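
As a quick simulation check (a sketch; the sample size, mean, and standard deviation here are arbitrary), the variance of many simulated sample means should be close to \(\sigma^2/n\):

# variance of 5000 sample means, each from n = 50 normals with sd = 2
n <- 50
means <- replicate(5000, mean(rnorm(n, mean = 2.3, sd = 2)))
var(means)   # should be near sigma^2 / n = 4 / 50 = 0.08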

More Generally

Based on the normal case, if you just look at \(S_n\) as \(n\) grows large, its distribution collapses to a single point. But what if you wanted a genuine distribution in the limit? With a distribution in hand you can do a lot more with it.

It turns out that if you rescale \(S_n - \mu\) by the right power of \(n\), it will indeed converge to a distribution. Can you guess which one?

The CLT

The CLT says that, given \(S_n\) as above, where the \(X_i\) are sampled from almost any distribution with finite mean \(\mu\) and variance \(\sigma^2\), we have that

\[ \sqrt{n}\left(\frac{S_n - \mu}{\sigma}\right)\rightarrow \mathcal{N}(0,1), \] a standard normal.

To see this using simulations: what the theorem says is that for \(n\) large enough, you can sample data from almost any distribution and compute \(S_n\), and \(S_n\) will behave like a draw from a normal distribution (with mean \(\mu\) and variance \(\sigma^2/n\)). So if we compute many independent copies of \(S_n\) and look at a histogram of them, it should look approximately normal.

clt.example <- function(sample.size) {
  # Draw 1000 samples of size `sample.size` from the global vector
  # `population` and return the 1000 sample means.
  replicate(1000, mean(sample(population, sample.size, replace = TRUE)))
}


population <- rexp(100000)   # a heavily skewed population
hist(population)

hist(clt.example(1))

hist(clt.example(5))

hist(clt.example(10))

hist(clt.example(25))

hist(clt.example(250))

population <- rpois(100000, lambda = 3)   # a discrete population
hist(population)

hist(clt.example(1))

hist(clt.example(5))

hist(clt.example(10))

hist(clt.example(25))

hist(clt.example(250))

Note that because we aren’t applying the \(\sqrt{n}\) scaling here, the histograms don’t converge to a fixed normal; instead, as the sample size grows they become more and more tightly centered around the true mean, while their shape becomes increasingly normal.
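
To see the scaled statement directly, here is a sketch (the exponential population and the sample size are arbitrary choices) applying the \(\sqrt{n}\left(\frac{S_n-\mu}{\sigma}\right)\) standardisation; the result should look standard normal whatever the underlying distribution:

# standardized sample means from an exponential population;
# for rexp(n, rate = 1) we have mu = 1 and sigma = 1
n <- 25
z <- replicate(1000, sqrt(n) * (mean(rexp(n)) - 1) / 1)
hist(z)     # approximately standard normal
qqnorm(z)   # points should fall near a straight line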

There are some requirements for this to work: the mean and variance of the variables must be finite. For some distributions, such as the Cauchy, this isn’t true.

population <- rcauchy(100000)   # Cauchy: no finite mean or variance
hist(population)

hist(clt.example(1))

hist(clt.example(5))

hist(clt.example(10))

hist(clt.example(25))

hist(clt.example(250))
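
Another way to see the failure (a sketch, with an arbitrary sample size): the running mean of Cauchy samples never settles down the way the law of large numbers would require:

# running mean of Cauchy draws: it keeps jumping, even for huge n
x <- rcauchy(100000)
running.mean <- cumsum(x) / seq_along(x)
plot(running.mean, type = "l")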

Try to find another statistical distribution with infinite mean or variance and demonstrate that the CLT fails for it. Another assumption we made is that each \(X_i\) is independent of all the others; full independence is in fact stronger than necessary, and versions of the CLT hold under weaker dependence conditions.
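
As a hint for the exercise (one possible choice among many): a Pareto distribution with shape parameter \(\alpha \le 2\) has infinite variance, and can be sampled by inverse transform from uniforms.

# Pareto(xm = 1, shape = 1.5): finite mean, infinite variance,
# sampled via the inverse CDF x = u^(-1/shape)
population <- runif(100000)^(-1 / 1.5)
hist(clt.example(250))   # sample means stay heavily skewed, not normal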