# When Can We Approximate the Binomial Distribution with a Normal Distribution?

The purpose of this activity is to determine when it is permissible to approximate a binomial distribution with a normal distribution. Aliaga claims that it is permissible to approximate a binomial distribution with a normal distribution if and only if $$np\ge 5$$ and $$nq\ge 5$$. Let’s begin by reminding our readers of the binomial parameters and their meanings.

\begin{align*} n&=\text{Number of trials}\\ p&=\text{Probability of success}\\ q&=\text{Probability of failure} \end{align*}

We then let the variable $$X$$ represent the number of successes in $$n$$ trials (for example, the number of heads in 10 tosses).

Further, if $$X$$ is a binomial random variable, where $$n$$ is the number of trials, $$p$$ is the probability of success on any one trial, and $$q=1-p$$ is the probability of failure on any one trial, then the mean of the binomial distribution is $$\mu=np$$, the variance is $$\sigma^2=npq$$, and the standard deviation is $$\sigma=\sqrt{npq}$$.
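As a quick sanity check, the formulas for the mean and variance can be compared against values computed directly from the binomial probabilities. The sketch below is only an illustration; the values $$n=10$$ and $$p=0.1$$ and the names k, pk, mean_direct, and var_direct are our own choices.

```r
# Compare mu = n*p and sigma^2 = n*p*q with values computed
# directly from the binomial probabilities (illustrative values).
n = 10
p = 0.1
q = 1 - p

k = 0:n                 # every possible number of successes
pk = dbinom(k, n, p)    # Pr(X = k) for each k

mean_direct = sum(k * pk)                    # weighted average of k
var_direct = sum((k - mean_direct)^2 * pk)   # weighted squared deviation

all.equal(mean_direct, n * p)      # agrees with mu = np
all.equal(var_direct, n * p * q)   # agrees with sigma^2 = npq
```

Both comparisons should agree up to floating-point rounding.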

Now, we’ll examine the Aliaga rule ($$np\ge 5$$ and $$nq\ge 5$$) for a number of examples.
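Since we will apply the same test repeatedly, it may help to wrap it in a small function. The helper below is hypothetical (the name passes_aliaga is ours, not part of R or of Aliaga's text) and simply checks both inequalities.

```r
# Hypothetical helper: TRUE only when both np >= 5 and nq >= 5.
passes_aliaga = function(n, p) {
  q = 1 - p
  (n * p >= 5) && (n * q >= 5)
}

passes_aliaga(10, 0.1)   # np = 1, so the rule fails
passes_aliaga(10, 0.5)   # np = nq = 5, so the rule is (barely) satisfied
```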

### Example: Fails Aliaga Rule

For our first example, we’ll let the number of trials be $$n=10$$ and the probability of success on any one trial be $$p=0.1$$. Enter $$n=10$$, $$p=0.1$$, and $$q=1-p=0.9$$.

n=10
p=0.1
q=1-p

Because there are $$n=10$$ possible trials, the variable $$X$$, which represents the number of successes in $$n=10$$ trials, can take on each of the numbers 0, 1, 2, …, 10. We can calculate the probability $$Pr(X=k)$$ for $$k=0, 1, 2, ... ,10$$ using R’s dbinom(k,n,p) command.

x=0:10
y=dbinom(x,n,p)
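For readers who want to see what dbinom is computing, the following sketch compares it with the binomial probability formula $$\binom{n}{k}p^kq^{n-k}$$. The names k and by_formula are introduced here only for illustration.

```r
# dbinom(k, n, p) should match choose(n, k) * p^k * q^(n - k).
n = 10
p = 0.1
q = 1 - p

k = 0:n
by_formula = choose(n, k) * p^k * q^(n - k)

all.equal(by_formula, dbinom(k, n, p))   # the two calculations agree
sum(dbinom(k, n, p))                     # the probabilities sum to 1
```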

We can now create a stickplot.

plot(x,y,type="h",lwd=2,col="red")

Clearly, this distribution is not normal in shape as it is highly skewed to the right. Moreover, when we apply the Aliaga test:

\begin{align*} np&=(10)(0.1)=1\\ nq&=(10)(0.9)=9 \end{align*}

Because $$np=1<5$$, it is not the case that both $$np\ge 5$$ and $$nq\ge 5$$. Nevertheless, let’s try to add a normal curve with the same mean and standard deviation to our plot. We first calculate the mean and standard deviation of the binomial distribution using $$\mu=np$$ and $$\sigma=\sqrt{npq}$$.

mu=n*p
s=sqrt(n*p*q)

We will now redraw our binomial distribution, then add a normal distribution with mean mu and standard deviation s.

plot(x,y,type="h",lwd=2,col="red")
xx=seq(0,10,length=200)
yy=dnorm(xx,mu,s)
lines(xx,yy,lwd=2,col="blue")

As evidenced in our last plot, the normal curve is not a good fit. Indeed, part of the normal curve’s left tail lies to the left of 0, outside the picture and outside the range of values that $$X$$ can take.
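We can put a number on how much of the normal curve falls to the left of 0, where a count of successes cannot occur. This is a rough sketch using the same $$n=10$$ and $$p=0.1$$; the name below_zero is our own.

```r
# Area under the fitted normal curve to the left of 0.
n = 10
p = 0.1
q = 1 - p
mu = n * p
s = sqrt(n * p * q)

below_zero = pnorm(0, mu, s)   # normal probability assigned to values below 0
below_zero
```

The result is roughly 15 percent of the total area, whereas the binomial distribution assigns no probability at all to negative values.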

Let’s now find the exact probability $$Pr(X\le1)$$ using R’s pbinom(x,n,p) command.

pbinom(1,n,p)
## [1] 0.7360989

Now let’s see what the normal approximation of the probability $$Pr(X\le 1)$$ is, using R’s pnorm(x,mu,sd) command.

pnorm(1,mu,s)
## [1] 0.5

Note that the approximation (0.5) is not even close to the exact probability (about 0.736). This is consistent with Aliaga’s rule: the example fails the test ($$np\ge 5$$ and $$nq\ge 5$$), and the normal distribution is indeed a poor approximation of this binomial distribution.
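The comparison at $$k=1$$ can be repeated for every possible value of $$k$$. The sketch below tabulates the exact and approximate cumulative probabilities side by side, using pnorm in the same way as above (no adjustment of any kind is applied); the names exact, normal_approx, and worst are illustrative.

```r
# Exact binomial vs. normal cumulative probabilities for every k.
n = 10
p = 0.1
q = 1 - p
mu = n * p
s = sqrt(n * p * q)

k = 0:n
exact = pbinom(k, n, p)         # Pr(X <= k), exact
normal_approx = pnorm(k, mu, s) # normal approximation of Pr(X <= k)

# Side-by-side table, rounded for readability.
round(data.frame(k, exact, normal_approx, error = normal_approx - exact), 4)

worst = max(abs(normal_approx - exact))   # largest absolute error
worst
```

For these values the largest error occurs at $$k=1$$, the case computed above.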

### Example: Barely Passes Aliaga Rule

Let’s look at a second example. This time we’ll keep the number of trials at $$n=10$$, but raise the probability of success on any individual trial to $$p=0.5$$. Note that this time, both portions of Aliaga’s rule ($$np\ge 5$$ and $$nq\ge 5$$) are satisfied, though only just.

\begin{align*} np&=(10)(0.5)=5\\ nq&=(10)(0.5)=5 \end{align*}

Now, let’s see how things work this time. Enter $$n=10$$, $$p=0.5$$, and $$q=1-p=0.5$$.

n=10
p=0.5
q=1-p

As before, the variable $$X$$, which represents the number of successes in $$n=10$$ trials, can take on each of the numbers 0, 1, 2, …, 10, and we can calculate the probability $$Pr(X=k)$$ for $$k=0, 1, 2, \ldots, 10$$ using R’s dbinom(k,n,p) command.

x=0:10
y=dbinom(x,n,p)

We can now create a stickplot.

plot(x,y,type="h",lwd=2,col="red")

This time, the binomial distribution is symmetric and roughly bell-shaped, much like a normal distribution. We will now redraw our binomial distribution, then add a normal distribution with mean $$\mu=np$$ and standard deviation $$\sigma=\sqrt{npq}$$, which we represent with the variables mu and s.

plot(x,y,type="h",lwd=2,col="red")
mu=n*p
s=sqrt(n*p*q)
xx=seq(0,10,length=200)
yy=dnorm(xx,mu,s)
lines(xx,yy,lwd=2,col="blue")
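To complement the picture, we can measure how far the tops of the sticks are from the normal curve. The function below is a hypothetical helper (the name max_gap is ours) that returns the largest vertical gap between the binomial probabilities and the normal density at $$k=0, 1, \ldots, n$$. It is only one rough way of judging the visual fit.

```r
# Hypothetical helper: largest gap between stick heights (dbinom)
# and the normal curve (dnorm) at the integers 0, 1, ..., n.
max_gap = function(n, p) {
  q = 1 - p
  mu = n * p
  s = sqrt(n * p * q)
  k = 0:n
  max(abs(dbinom(k, n, p) - dnorm(k, mu, s)))
}

max_gap(10, 0.1)   # first example: fails Aliaga's rule
max_gap(10, 0.5)   # second example: barely passes Aliaga's rule
```

The gap for $$p=0.5$$ is more than ten times smaller than the gap for $$p=0.1$$, which matches what the two plots suggest.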