n <- 100
p <- 0.7
SE_true <- sqrt(p*(1-p)/n)
p_does <- c()
phat_does <- c()
that_does <- c()
z_star <- abs(qnorm(0.05/2))
t_star <- abs(qt(0.05/2, df = n-1))
16 Confidence Intervals for a Proportion
16.1 Introduction
This is the same as the last lesson:
- Based on our data, we make an interval that we think describes the population.
- In this case, we just have a different population distribution.
Assumptions
In stats, assumptions give us power, but only if they’re good assumptions.
Assumptions for a CI for \(p\) are the same as the assumptions for the binomial distribution, with the addition of an SRS.
16.2 The CI for \(p\)
Sampling Distribution of \(\hat p\)
As we saw before the midterm, if the number of successes is \(B(n,p)\), then under certain conditions,
\[\hat p \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)\]
Deja-Vu
Since \(\hat p \sim N(p, \sqrt{\frac{p(1-p)}{n}})\),
\[ \frac{\hat p - p}{\sqrt{p(1-p)/n}} \sim N(0,1) \]
Again, we can use the form \(z = (x-\mu)/\sigma\), but replace \(x\), \(\mu\), and \(\sigma\) with the correct values.
A \((1-\alpha)\) CI for \(p\) is:
\[ \hat p \pm z^*\sqrt{\frac{p(1-p)}{n}} \]
We don’t know the variance, why not \(t_{n-1}^*\)?
We used \(t_{n-1}^*\) because we had to estimate \(\sigma\)
There’s no \(\sigma\) to estimate!
The variance of the Binomial distribution is entirely determined by \(p\)!
- Binom be crazy.
… but Devan, we still don’t know \(p\)!
The \((1-\alpha)\) CI for \(p\) is:
\[ \hat p \pm z^*\sqrt{\frac{p(1-p)}{n}} \] which needs the true value of \(p\) inside the standard error.
Why not just plug in \(\hat p\)?
Okay fine.
\(\sqrt{\hat p(1-\hat p)/n}\) is called the estimated standard error, since it's the sd of the sampling distribution, but it's based on an estimate.
Final_Version_V2_Update_LastTry_Srsly.docx.pdf
The \((1-\alpha)\) CI for \(p\) is:
\[ \hat p \pm z^*\sqrt{\frac{\hat p(1-\hat p)}{n}} \]
where \(z^*\) is chosen such that \(P(Z < -z^*) = \alpha/2\).
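As a quick sketch of this formula in R (the numbers here are made up for illustration, not from any example in this lesson):

```r
# Hypothetical data: 42 successes in 60 trials, 95% confidence (alpha = 0.05)
n <- 60
phat <- 42/60
alpha <- 0.05

z_star <- abs(qnorm(alpha/2))       # z* such that P(Z < -z*) = alpha/2
SE_est <- sqrt(phat*(1 - phat)/n)   # estimated standard error of phat
phat + c(-1, 1)*z_star*SE_est       # the (1 - alpha) CI for p
```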
Devan Style: Simulation
for(i in 1:10000){
    new_sample <- rbinom(n=1, size=n, prob=p)
    phat <- new_sample/n
    SE_est <- sqrt(phat*(1-phat)/n)

    pCI <- phat + c(-1,1)*z_star*SE_true
    phatCI <- phat + c(-1,1)*z_star*SE_est
    thatCI <- phat + c(-1,1)*t_star*SE_est

    p_does[i] <- pCI[1] < p & pCI[2] > p
    phat_does[i] <- phatCI[1] < p & phatCI[2] > p
    that_does[i] <- thatCI[1] < p & thatCI[2] > p
}
Simulation Results
mean(p_does)
[1] 0.9371
mean(phat_does)
[1] 0.9502
mean(that_does)
[1] 0.9502
Using the population proportion is… worse?
DIY: Change \(p\) so that the normal approximation doesn’t apply.
16.3 Examples and Cautions
Example 1
It was found that 591 out of 700 people sampled supported a certain political position. Find a 91% CI.
Since we have R, let’s use it!
Both prop.test() and binom.test() will give us a CI, with prop.test() calculating an approximation using the normal distribution and binom.test() calculating the exact value, without approximation. In general, you should always use binom.test() for one-sample proportions.
binom.test(x = 591, n = 700, conf.level = 0.91)
Exact binomial test
data: 591 and 700
number of successes = 591, number of trials = 700, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
91 percent confidence interval:
0.8192078 0.8670686
sample estimates:
probability of success
0.8442857
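If we just want the interval itself, binom.test() returns a list, and the CI can be pulled out with $conf.int (a sketch using the same numbers as above):

```r
# Save the result instead of just printing it
result <- binom.test(x = 591, n = 700, conf.level = 0.91)

result$conf.int   # just the 91% CI: about (0.8192, 0.8671)
result$estimate   # phat = 591/700
```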
Example 2
It was found that 68 out of 70 people sampled supported a certain political position. Find a 91% CI.
n <- 70
phat <- 68/70
se_est <- sqrt(phat*(1-phat)/n)
z_star <- abs(qnorm(0.09/2))

phat + c(-1, 1)*z_star*se_est
[1] 0.9376692 1.0051879
… so it would be reasonable to say that the population proportion is larger than 1???
Absolutely not! The normal approximation does not apply here since \(n(1 - \hat p) = 70*(1 - 68/70) = 2\), and 2 is less than 10¹. The normal distribution can only be used when the sample size is large enough!!!
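The check in this paragraph can be done directly in R (a minimal sketch; the rule of thumb is that both \(n\hat p\) and \(n(1-\hat p)\) should be at least 10):

```r
n <- 70
phat <- 68/70

n * phat         # 68: large enough
n * (1 - phat)   # 2: too small, so the z-interval can't be trusted
```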
Instead, we can use the exact test. This is much slower to calculate, but for \(n = 70\) there’s no issue.
binom.test(x = 68, n = 70, conf.level = 0.91)
Exact binomial test
data: 68 and 70
number of successes = 68, number of trials = 70, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
91 percent confidence interval:
0.9108816 0.9951927
sample estimates:
probability of success
0.9714286
Notice that, if we don't specify conf.level, binom.test() uses its default of a 95% confidence interval:
binom.test(x = 68, n = 70)
Exact binomial test
data: 68 and 70
number of successes = 68, number of trials = 70, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.9005711 0.9965209
sample estimates:
probability of success
0.9714286
16.4 Crowdsourced Questions
The following questions are added from the Winter 2024 section of ST231 at Wilfrid Laurier University. The students submitted questions for bonus marks, and I have included them here (with permission, and possibly minor modifications).
- Assuming all other factors are held constant, how does increasing the number of trials from 100 to 400 affect the width of the confidence interval for a population proportion?
- The width of the confidence interval remains unchanged.
- The width of the confidence interval decreases, leading to a more precise estimate.
- The width of the confidence interval increases, indicating less precision in the estimate.
- The change in the width of the confidence interval cannot be determined without knowing the population size.
Solution
- Increasing the sample size from 100 to 400, while keeping all other factors constant, decreases the width of the confidence interval for a population proportion. This is because the standard error of the proportion decreases as the sample size increases, leading to a more precise estimate of the population proportion.
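This can be verified directly: since the width of the z-interval is \(2z^*\sqrt{\hat p(1-\hat p)/n}\), quadrupling \(n\) divides the width by \(\sqrt{4} = 2\). A sketch in R (with a made-up \(\hat p\)):

```r
phat <- 0.5                  # hypothetical sample proportion
z_star <- abs(qnorm(0.05/2))

ci_width <- function(n) 2 * z_star * sqrt(phat*(1 - phat)/n)

ci_width(100) / ci_width(400)   # = 2: the n = 100 interval is twice as wide
```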
Citation needed.↩︎