The central limit theorem (CLT) states that if we draw many samples of a large enough size from a population, the distribution of the sample means (the average value of each sample) will approximate a normal distribution, regardless of the shape of the original population distribution, as long as its variance is finite.

The phenomenon can be observed with the Galton Board.

The idea is the following:

  • Start with a random variable $X$ (like a die roll)
  • Add $n$ independent samples of this variable ($X_1 + X_2 + \dots + X_n$)
  • The distribution of this sum looks more and more like a Gaussian distribution as $n \to \infty$.
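
The three steps above can be tried numerically. The following is a minimal sketch, assuming a fair six-sided die as the random variable; the helper names (`normal_cdf`, `distance_to_normal`), the number of trials and the seed are illustrative choices, not part of the theorem. It measures the largest gap between the empirical CDF of the standardized sum and the standard normal CDF, which should shrink as $n$ grows.

```python
import math
import random
from collections import Counter

DIE_MEAN = 3.5
DIE_STD = math.sqrt(35 / 12)  # standard deviation of one fair six-sided die


def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def distance_to_normal(n, trials=20_000, seed=0):
    """Largest gap between the empirical CDF of the standardized sum of
    n dice and the standard normal CDF (a Kolmogorov-style distance)."""
    rng = random.Random(seed)
    counts = Counter(
        sum(rng.randint(1, 6) for _ in range(n)) for _ in range(trials)
    )
    cumulative = 0.0
    worst = 0.0
    for total in sorted(counts):
        z = (total - n * DIE_MEAN) / (DIE_STD * math.sqrt(n))
        phi = normal_cdf(z)
        below = cumulative                    # empirical CDF just before the jump
        cumulative += counts[total] / trials  # empirical CDF at the jump
        worst = max(worst, abs(below - phi), abs(cumulative - phi))
    return worst


# Assumed values of n, chosen only to show the trend.
distances = {n: distance_to_normal(n) for n in (1, 2, 5, 30)}
for n, d in distances.items():
    print(f"n = {n:2d}  distance to normal = {d:.3f}")
```

Because a sum of dice is discrete, the gap never reaches exactly zero for finite $n$; what matters here is the downward trend.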

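Or more formally:

$$\lim_{n \to \infty} P\left(a < \frac{(X_1 + \dots + X_n) - n\mu}{\sigma\sqrt{n}} < b\right) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$$

where $\mu$ and $\sigma$ are the mean and standard deviation of each $X_i$. This can be read like this: “If we take $n$ random variables and compute how many standard deviations away from the mean their sum is, then, as $n$ grows, the probability of this value being between two values $a$ and $b$ approaches the area under the standard normal curve between those same points.”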
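
The formal statement can be checked with a small simulation. This is only a sketch under stated assumptions: the summands are Exponential(1) variables (mean 1, standard deviation 1), chosen here because they are clearly not normal, and the values $n = 50$, $a = -1$, $b = 1$, the trial count and the function names are all illustrative.

```python
import math
import random


def normal_area(a, b):
    """Area under the standard normal curve between a and b."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf(b) - cdf(a)


def standardized_sum_probability(n, a, b, trials=20_000, seed=1):
    """Estimate P(a < (sum - n*mu) / (sigma*sqrt(n)) < b) by simulation."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and std of Exponential(rate=1), an assumed choice
    hits = 0
    for _ in range(trials):
        total = sum(rng.expovariate(1.0) for _ in range(n))
        z = (total - n * mu) / (sigma * math.sqrt(n))  # std devs away from the mean
        if a < z < b:
            hits += 1
    return hits / trials


a, b = -1.0, 1.0
empirical = standardized_sum_probability(n=50, a=a, b=b)
exact = normal_area(a, b)
print(f"simulated probability: {empirical:.4f}")
print(f"normal curve area:     {exact:.4f}")
```

The two printed numbers should be close, and the agreement is expected to improve as `n` and `trials` increase.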

Three assumptions have to be true in order for this theorem to be applicable:

  1. All the $X_i$ must be independent of each other;
  2. Each $X_i$ is drawn from the same distribution;

These two assumptions, usually referred to together as i.i.d. (independent and identically distributed), can be relaxed. In the Galton Board they are not strictly respected, but a normal distribution comes out anyway.

  3. The variance of the $X_i$ must be finite.
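
To see why the finite-variance assumption matters, here is a hedged sketch comparing sample means of a uniform variable (finite variance) with sample means of a standard Cauchy variable, whose variance is not finite. The Cauchy distribution is an example picked for this illustration, and the helper names, sample sizes and seed are assumptions. The spread of the means is measured with the interquartile range (IQR), since the standard deviation is not meaningful for Cauchy data.

```python
import math
import random
import statistics


def iqr_of_sample_means(draw, n, trials=1_000, seed=2):
    """Interquartile range of `trials` sample means, each over n draws."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


def draw_uniform(rng):
    """Uniform on [0, 1): finite variance."""
    return rng.random()


def draw_cauchy(rng):
    """Standard Cauchy via inverse-CDF sampling: variance is not finite."""
    return math.tan(math.pi * (rng.random() - 0.5))


uniform_iqr = {n: iqr_of_sample_means(draw_uniform, n) for n in (1, 200)}
cauchy_iqr = {n: iqr_of_sample_means(draw_cauchy, n) for n in (1, 200)}

for n in (1, 200):
    print(f"n = {n:3d}  uniform IQR = {uniform_iqr[n]:.3f}  "
          f"Cauchy IQR = {cauchy_iqr[n]:.3f}")
```

For the uniform variable the spread of the means shrinks roughly like $1/\sqrt{n}$, while for the Cauchy variable it stays about the same no matter how many samples are averaged, so the means never settle into a narrowing normal curve.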

tags: statistics resources: