16  Confidence Intervals for a Proportion

Author

Devan Becker

Published

2021-03-07

16.1 Introduction

This is the same idea as the last lesson:

  • Based on our data, we make an interval that we think describes the population.
  • In this case, we just have a different population distribution.

Assumptions

In stats, assumptions give us power, but only if they’re good assumptions.

Assumptions for a CI for \(p\) are the same as the assumptions for the binomial distribution, with the addition of an SRS.

16.2 The CI for \(p\)

Sampling Distribution of \(\hat p\)

As we saw before the midterm, if the number of successes in our sample follows a \(B(n,p)\) distribution, then under certain conditions,

\[\hat p \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)\]
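As a quick sanity check (not part of the original lesson), we can simulate many sample proportions and compare their standard deviation to the formula above; the values of \(n\) and \(p\) here are arbitrary choices.

```r
set.seed(1)
n <- 100
p <- 0.7
# 10,000 sample proportions, each from a sample of size n
phats <- rbinom(10000, size = n, prob = p) / n
sd(phats)              # simulated sd of phat...
sqrt(p * (1 - p) / n)  # ...should be close to the theoretical value, about 0.0458
```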

Deja-Vu

Since \(\hat p \sim N(p, \sqrt{\frac{p(1-p)}{n}})\),

\[ \frac{\hat p - p}{\sqrt{p(1-p)/n}} \sim N(0,1) \]

Again, we can use the form \(z = (x-\mu)/\sigma\), but replace \(x\), \(\mu\), and \(\sigma\) with the correct values.

A \((1-\alpha)\)CI for \(p\) is:

\[ \hat p \pm z^*\sqrt{\frac{p(1-p)}{n}} \]

We don’t know the variance, so why don’t we use \(t_{n-1}^*\)?

  • We used \(t_{n-1}^*\) because we had to estimate \(\sigma\)

  • There’s no \(\sigma\) to estimate!

  • The variance of the Binomial distribution is entirely determined by \(p\)!

    • Binom be crazy.

… but Devan, we still don’t know \(p\)!

The \((1-\alpha)\)CI for \(p\) is:

\[ \hat p \pm z^*\sqrt{\frac{p(1-p)}{n}} \] which needs the true (unknown) \(p\) inside the standard error.

Why not just plug in \(\hat p\)?

Okay fine.

\(\sqrt{\hat p(1-\hat p)/n}\) is called the estimated standard error, since it’s the sd of the sampling distribution but it’s based on an estimate.

Final_Version_V2_Update_LastTry_Srsly.docx.pdf

The \((1-\alpha)\)CI for \(p\) is:

\[ \hat p \pm z^*\sqrt{\frac{\hat p(1-\hat p)}{n}} \]

where \(z^*\) is chosen such that \(P(Z < -z^*) = \alpha/2\).
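As a sketch of this formula in R (the data here are made up: 80 successes in 100 trials, at 95% confidence):

```r
x <- 80; n <- 100                 # hypothetical data
phat <- x / n
alpha <- 0.05                     # 95% confidence
z_star <- abs(qnorm(alpha / 2))   # z* puts alpha/2 in each tail
se_est <- sqrt(phat * (1 - phat) / n)  # estimated standard error
phat + c(-1, 1) * z_star * se_est      # approx (0.7216, 0.8784)
```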

Devan Style: Simulation

n <- 100                            # sample size
p <- 0.7                            # true population proportion
SE_true <- sqrt(p*(1-p)/n)          # standard error using the true p
p_does <- c()                       # did the true-SE interval capture p?
phat_does <- c()                    # did the estimated-SE (z) interval capture p?
that_does <- c()                    # did the estimated-SE (t) interval capture p?
z_star <- abs(qnorm(0.05/2))        # 95% critical value (normal)
t_star <- abs(qt(0.05/2, df = n-1)) # 95% critical value (t with n-1 df)


for(i in 1:10000){
    new_sample <- rbinom(n=1, size=n, prob=p)
    phat <- new_sample/n
    SE_est <- sqrt(phat*(1-phat)/n)
    
    # Three intervals: true SE with z*, estimated SE with z*, estimated SE with t*
    pCI <- phat + c(-1,1)*z_star*SE_true
    phatCI <- phat + c(-1,1)*z_star*SE_est
    thatCI <- phat + c(-1,1)*t_star*SE_est
    
    # TRUE when the interval contains the true p
    p_does[i] <- pCI[1] < p & pCI[2] > p
    phat_does[i] <- phatCI[1] < p & phatCI[2] > p
    that_does[i] <- thatCI[1] < p & thatCI[2] > p
}

Simulation Results

mean(p_does)
[1] 0.9371
mean(phat_does)
[1] 0.9502
mean(that_does)
[1] 0.9502

Using the population proportion is… worse?

DIY: Change \(p\) so that the normal approximation doesn’t apply.
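One possible route for the DIY (the choice \(p = 0.99\) is just an illustration: it makes \(n(1-p) = 1\), far below the usual threshold):

```r
set.seed(1)
n <- 100
p <- 0.99                     # n*(1-p) = 1, so the normal approximation fails
z_star <- abs(qnorm(0.05/2))
covered <- replicate(10000, {
    phat <- rbinom(1, size = n, prob = p) / n
    se_est <- sqrt(phat * (1 - phat) / n)
    ci <- phat + c(-1, 1) * z_star * se_est
    ci[1] < p & ci[2] > p     # TRUE when the interval captures p
})
mean(covered)  # far below the nominal 0.95
```

Whenever \(\hat p = 1\), the estimated standard error is 0 and the "interval" is a single point that misses \(p\), which is a big part of why the coverage collapses.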

16.3 Examples and Cautions

Example 1

It was found that 591 out of 700 people sampled supported a certain political position. Find a 91%CI.

Since we have R, let’s use it!

Both prop.test() and binom.test() will give us a CI: prop.test() calculates an approximation using the normal distribution, while binom.test() calculates the exact value without approximation. In general, you should always use binom.test() for one-sample proportions.

binom.test(x = 591, n = 700, conf.level = 0.91)

    Exact binomial test

data:  591 and 700
number of successes = 591, number of trials = 700, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
91 percent confidence interval:
 0.8192078 0.8670686
sample estimates:
probability of success 
             0.8442857 
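For comparison (this isn’t part of the original solution), the normal-approximation CI from the formula in this lesson lands close to the exact interval above:

```r
n <- 700
phat <- 591 / 700
se_est <- sqrt(phat * (1 - phat) / n)
z_star <- abs(qnorm(0.09 / 2))     # 91% confidence, so alpha = 0.09
phat + c(-1, 1) * z_star * se_est  # roughly (0.821, 0.868)
```

The approximation works well here because both \(n\hat p = 591\) and \(n(1-\hat p) = 109\) are large.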

Example 2

It was found that 68 out of 70 people sampled supported a certain political position. Find a 91%CI.

n <- 70
phat <- 68/70
se_est <- sqrt(phat*(1-phat)/n)
z_star <- abs(qnorm(0.09/2))

phat + c(-1, 1)*z_star*se_est
[1] 0.9376692 1.0051879

… so it would be reasonable to say that the population proportion is larger than 1???

Absolutely not! The normal approximation does not apply here since \(n(1 - \hat p) = 70*(1 - 68/70) = 2\), and 2 is less than 10.¹ The normal distribution can only be used when the sample size is large enough!!!

Instead, we can use the exact test. This is much slower to calculate, but for \(n = 70\) there’s no issue.

binom.test(x = 68, n = 70, conf.level = 0.91)

    Exact binomial test

data:  68 and 70
number of successes = 68, number of trials = 70, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
91 percent confidence interval:
 0.9108816 0.9951927
sample estimates:
probability of success 
             0.9714286 

Notice what happens when we don’t specify conf.level:


binom.test(x = 68, n = 70)

    Exact binomial test

data:  68 and 70
number of successes = 68, number of trials = 70, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.9005711 0.9965209
sample estimates:
probability of success 
             0.9714286 

16.4 Crowdsourced Questions

The following questions are added from the Winter 2024 section of ST231 at Wilfrid Laurier University. The students submitted questions for bonus marks, and I have included them here (with permission, and possibly minor modifications).

  1. Assuming all other factors are held constant, how does increasing the number of trials from 100 to 400 affect the width of the confidence interval for a population proportion?
    1. The width of the confidence interval remains unchanged.
    2. The width of the confidence interval decreases, leading to a more precise estimate.
    3. The width of the confidence interval increases, indicating less precision in the estimate.
    4. The change in the width of the confidence interval cannot be determined without knowing the population size.
Solution
  1. Increasing the sample size from 100 to 400, while keeping all other factors constant, decreases the width of the confidence interval for a population proportion. This is because the standard error of the proportion decreases as the sample size increases, leading to a more precise estimate of the population proportion.
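The solution’s reasoning can be checked directly: holding \(\hat p\) fixed (0.5 here, purely as an illustration), quadrupling \(n\) halves the standard error and hence halves the width of the interval.

```r
phat <- 0.5  # illustrative value; the ratio below is the same for any phat
se_100 <- sqrt(phat * (1 - phat) / 100)  # SE with n = 100
se_400 <- sqrt(phat * (1 - phat) / 400)  # SE with n = 400
se_100 / se_400  # 2: the n = 400 interval is half as wide
```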


  1. Citation needed.↩︎