We’ve been over this in confidence intervals, and the same thing applies to hypothesis tests! If the population is normal (or the sample size is large enough) and we have an SRS, then \[ \frac{\bar X - \mu}{s/\sqrt{n}}\sim t_{n-1} \]
Again, the \(t\) distribution is used to account for the extra variability from the estimated standard deviation.1
This means our test statistic is \[ t_{obs} = \frac{\bar x - \mu}{s/\sqrt{n}} \]
Since this is a \(t\) distribution, we use pt(t_obs, df = n - 1), possibly with one minus and/or doubling, depending on the alternative hypothesis.2
That’s it. That’s the big difference. When we estimate the standard deviation, we use the t-distribution.
Given a sample mean \(\bar x\) and a sample standard deviation \(s\), our test statistic is: \[ t_{obs} = \frac{\bar x - \mu}{s/\sqrt{n}} \] Our hypotheses and calculations of the p-value work the same as they did for the z-test.
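As a rough sketch (using made-up summary statistics and a hypothetical null value mu0, not values from these notes), the whole procedure in R looks like this:

xbar <- 10.2  # hypothetical sample mean
s    <- 1.5   # hypothetical sample standard deviation
n    <- 20    # hypothetical sample size
mu0  <- 10    # hypothesized mean under the null

t_obs <- (xbar - mu0) / (s / sqrt(n))
pt(t_obs, df = n - 1)            # alternative: mu < mu0
1 - pt(t_obs, df = n - 1)        # alternative: mu > mu0
2 * pt(-abs(t_obs), df = n - 1)  # alternative: mu != mu0 (double the smaller tail)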
In the pilot fatigue example from the Understanding p-values lecture, we assumed that we had the population sd. I lied - it was actually a sample statistic! We should have used a t-test, not a z-test.

Recall: the observed test statistic was 3, based on a sample of size \(n = 16\).
Using the \(t\) distribution, our p-value is:
1 - pt(3, df = 16 - 1)
[1] 0.004486369
This is larger than our previous p-value of 0.0013. This will always be the case: if the \(z_{obs}\) test statistic is equal to the \(t_{obs}\) test statistic, then the p-value based on \(t_{obs}\) will be larger.
We almost never know the population standard deviation, so we have extra uncertainty. With extra uncertainty, we require more evidence! Recall that a p-value is a measure of evidence against a null.
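To see this numerically, here is a small comparison using the test statistic of 3 from the pilot fatigue example. The normal-based p-value is the 0.0013 quoted above; the t-based p-values are larger, shrinking toward the normal answer as the degrees of freedom grow:

1 - pnorm(3)             # z-test p-value (approx 0.0013)
1 - pt(3, df = 16 - 1)   # t-test p-value with n = 16 (approx 0.0045)
1 - pt(3, df = 100 - 1)  # with a larger sample, closer to the normal answer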
A matched pairs design allows us to use a one-sample t-test when it looks like we have two samples3. Since the pairs are matched, we can calculate the differences between pairs and treat this like a single vector of observations. It is honkey tonk ridonkulous to say that we know the true population standard deviation for the difference in observations, so a \(z\) test could never be appropriate.
Consider the following example of a matched pairs experiment. Given a sample of brave volunteers, we create a small cut on both hands and put ointment on one of the two cuts4. This study design eliminates the variation in healing times for different people since both cuts are on the same person! For each individual, we observe a difference. That is, one observation per person!
| | Subject 1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 |
|---|---|---|---|---|---|---|---|---|
| With Ointment | 6.44 | 6.06 | 4.22 | 3.3 | 6.5 | 3.49 | 7.01 | 4.22 |
| Without | 7.22 | 6.05 | 4.55 | 4 | 6.7 | 2.88 | 7.88 | 6.32 |
| Difference | -0.78 | 0.01 | -0.33 | -0.7 | -0.2 | 0.61 | -0.87 | -2.1 |
Note: Differences were calculated as “With minus Without”! This will be important for setting up the alternative hypothesis later.
The important thing here is that last row of this table now represents our data - we can forget that the other two rows exist! In other words, we have one observation per person, rather than two sets of observations.
This is where the assumption that we know the population standard deviation is especially preposterous: we’re looking just at the differences! Even if there’s a true value of the sd for healing time for all people, the standard deviation of the difference between healing times isn’t a reasonable quantity to speak of.
Since we’re looking at the difference, we no longer have a hypothesized value of \(\mu_0\). Instead, we hypothesize that the average pairwise difference is 0, i.e. \(\mu_{with\; minus\; without} = \mu_{diff} = 0\)5. The alternative is “with” < “without”, i.e. \(\mu_{diff} < 0\).6
x <- c(-0.78, 0.01, -0.33, -0.7, -0.2, 0.61, -0.87, -2.1)
xbar <- mean(x)
s <- sd(x)
n <- length(x)
t_obs <- (xbar - 0)/(s/sqrt(n)) # xbar is with - w/out
# Notice that we use pt() instead of pnorm()
pt(t_obs, df = n - 1) # Alternative is <
[1] 0.04662624
So our p-value is approximately 0.047. At the 5% level, the null hypothesis would be rejected and we would conclude that the ointment works7. At the 1% level, we would fail to reject the null and could not conclude that the ointment has an effect. This is why it’s important to know the significance level before calculating the p-value - we shouldn’t get to choose whether our results are statistically significant!
Do you think that researchers in the field are typing test statistics into their calculator? Of course not! We’re finally at the point in this class where the methods are so commonly used that the built-in functions in R can calculate them.
with_oint <- c(6.44, 6.06, 4.22, 3.3, 6.5, 3.49, 7.0, 4.22)
without <- c(7.22, 6.05, 4.55, 4, 6.7, 2.88, 7.8, 6.32)
difference <- with_oint - without
t.test(difference, alternative = "less")
One Sample t-test
data: difference
t = -1.9199, df = 7, p-value = 0.04817
alternative hypothesis: true mean is less than 0
95 percent confidence interval:
-Inf -0.007063183
sample estimates:
mean of x
-0.53625
Notice that the output shows a one-sided confidence interval. This isn’t a big leap from what you know: a confidence interval consists of all of the values that would not be rejected by a hypothesis test, and this works for one-sided as well as two-sided alternate hypotheses!
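As a quick sketch of this duality (re-using the ointment differences from above), we can test two candidate null values with the mu argument of t.test(): one inside the reported one-sided interval and one outside it. The specific values -0.1 and 0 are just illustrative choices.

with_oint <- c(6.44, 6.06, 4.22, 3.3, 6.5, 3.49, 7.0, 4.22)
without <- c(7.22, 6.05, 4.55, 4, 6.7, 2.88, 7.8, 6.32)
difference <- with_oint - without

# -0.1 is inside (-Inf, -0.00706], so it should not be rejected at the 5% level;
# 0 is outside the interval, so it should be rejected.
t.test(difference, mu = -0.1, alternative = "less")$p.value
t.test(difference, mu = 0, alternative = "less")$p.value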
To get a two-sided confidence interval, we can either leave alternative at its default value or set it to "two.sided". We can also change the confidence level with the conf.level argument. For an 89% CI:
t.test(difference, alternative = "two.sided", conf.level = 0.89)
One Sample t-test
data: difference
t = -1.9199, df = 7, p-value = 0.09635
alternative hypothesis: true mean is not equal to 0
89 percent confidence interval:
-1.04730428 -0.02519572
sample estimates:
mean of x
-0.53625
Notice that this calculated a two-sided p-value, which is twice what we saw before (and no longer significant at the 5% level!).
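The relationship between the two outputs can be checked directly from the reported test statistic (t = -1.9199, df = 7):

pt(-1.9199, df = 7)      # one-sided p-value, approx 0.048
2 * pt(-1.9199, df = 7)  # two-sided p-value, approx 0.096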
If the alternative is \(<\), the p-value is pnorm(z_obs); if \(>\), then 1 - pnorm(z_obs); if two-sided, double the correct one.

New York is sometimes called “the city that never sleeps”. At the 5% level, do the following data provide evidence that the average New Yorker gets less than 8 hours of sleep per night?
| \(\bar x\) | \(s\) | \(n\) |
|---|---|---|
| 7.73 | 0.77 | 25 |
pt(-1.75, 24) = 0.0464. Since this is less than 0.05, there is evidence at the 5% level that the average New Yorker gets less than 8 hours of sleep per night. The full calculation is sketched below.
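A sketch of the full calculation, using the summary statistics from the table:

xbar <- 7.73
s    <- 0.77
n    <- 25
t_obs <- (xbar - 8) / (s / sqrt(n))  # approx -1.75
pt(t_obs, df = n - 1)                # one-sided p-value, approx 0.046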
Construct a 95% CI for the New York sleep example.

| \(\bar x\) | \(s\) | \(n\) |
|---|---|---|
| 7.73 | 0.77 | 25 |
qt(0.025, 24) = -2.0639, so the 95% CI is \(7.73 \pm 2.0639 \times 0.77/\sqrt{25}\). A sketch of this calculation is below.
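A sketch of the interval calculation in R:

xbar <- 7.73
s    <- 0.77
n    <- 25
t_star <- qt(0.975, df = n - 1)         # approx 2.0639
xbar + c(-1, 1) * t_star * s / sqrt(n)  # roughly (7.41, 8.05)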
What is the standard error?

What is the standard deviation of the sampling distribution?
Why does the sampling distribution have a lower variance than the population?
After conducting a study, we found a p-value of 0.04. Did we find a statistically significant result?
After conducting a study, we found a 95% confidence interval for \(\mu\) from -0.1 to 1.9. What can we conclude?
Under which condition does the CLT not apply?
The following questions are added from the Winter 2024 section of ST231 at Wilfrid Laurier University. The students submitted questions for bonus marks, and I have included them here (with permission, and possibly minor modifications).
Set up the hypotheses: \(H_0: \mu = 8\) versus \(H_A: \mu < 8\), where \(\mu\) is the true mean sleep duration in hours.
Calculate the test statistic using the sample mean, population mean, standard deviation, and sample size: \[ t_{obs} = \frac{\bar x - \mu_0}{s/\sqrt{n}} = -2.946 \]
The p-value can be found using the t-distribution on \(n-1\) degrees of freedom:
pt(-2.946, df = 50 - 1)
[1] 0.002457447
Since this is less than 0.05, we reject the null hypothesis. There is evidence at the 5% level that the true mean sleep duration is less than 8 hours.
Note that this is a matched pairs t-test!
Set up the hypotheses: \(H_0: \mu = 5\) versus \(H_A: \mu > 5\), where \(\mu\) is the mean weight loss in pounds.
Calculate the test statistic using the formula for a t-test: \[ t_{obs} = \frac{\bar x - \mu_0}{s/\sqrt{n}} = \frac{5.8 - 5}{0.9/\sqrt{40}} = 5.62 \]
Using the t-distribution on \(n - 1\) degrees of freedom:
1 - pt(5.62, df = 40 - 1)
[1] 8.727633e-07
We get a value much smaller than our significance level of 0.01! We reject the null, and conclude that the average weight loss is more than 5 pounds.
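For reference, the same arithmetic can be done directly in R:

t_obs <- (5.8 - 5) / (0.9 / sqrt(40))  # approx 5.62
1 - pt(t_obs, df = 40 - 1)             # alternative is ">"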
Our hypotheses are \(H_0:\mu = 80\) versus \(H_A:\mu \ne 80\).
\[ t_{obs} = \frac{\bar x - \mu_0}{s/\sqrt{n}} = \frac{82 - 80}{6 / \sqrt{36}} = 2 \]
The p-value can be found as:
2 * (1 - pt(2, df = 36 - 1))

This gives a p-value of approximately 0.053. Since our p-value is larger than a 5% significance level, we do not reject the null. We have not gathered evidence that the claim of 80mg is incorrect.
The correct answer is (a) Reject the null hypothesis at the 5% significance level because the confidence interval does not include 0, indicating a significant difference in mean reductions. The 95% confidence interval represents the range of values within which we are 95% confident the true mean difference lies. If the null hypothesis were true (no difference in mean reductions), we would expect the interval to include 0. However, since the interval (2, 8) does not include 0, we have evidence that the mean reduction from the new medication is significantly different from the standard treatment at the 5% significance level. This conclusion is based on the principle that if a 95% confidence interval for a difference does not include 0, the difference is statistically significant at the 5% level. Hence, the confidence interval directly informs the hypothesis test outcome.
This is a matched pairs test, since it’s looking at the same patient before and after treatment. Instead of a sample of people before treatment and a sample of people after treatment, the researchers can look at a single sample of the differences.
Differences: 3, 5, -1, 4, 6, 2, 7, 4, 3, 5, 2, 4, 6, 8, 3
Assuming the differences in anxiety levels are normally distributed, conduct a one-sample t-test to determine if the therapy leads to a significant reduction in anxiety levels at a 5% significance level.
A.) The researchers have a sample of differences, meaning that each value comes from two observations on a natural pairing (e.g. the same individual). This can be done as a one-sample t-test.
B.) Null hypothesis (\(H_0\)): The therapy has no effect on anxiety levels, so the mean difference in anxiety levels is 0 (\(\mu_d=0\)).
Alternative hypothesis (\(H_A\)): The therapy leads to a reduction in anxiety levels, so the mean difference in anxiety levels is greater than 0 (\(\mu_d >0\)).
C.) The calculated test statistic (t-statistic) for the differences in anxiety levels is approximately 7.00.
D.) The critical t-value for a one-tailed test at a 5% significance level with 14 degrees of freedom is approximately 1.76.
E.) Since the calculated t-statistic (7.00) is greater than the critical t-value (1.76), we reject the null hypothesis. There is sufficient evidence to support that the therapy leads to a significant reduction in anxiety levels. This conclusion is based on the assumption that if the therapy had no effect, the likelihood of observing a sample mean difference as extreme as this, or more extreme, is very low under the null hypothesis. Therefore, the therapy appears to be effective in reducing anxiety levels among the patients in this study.
In R:
t.test(c(3, 5, -1, 4, 6, 2, 7, 4, 3, 5, 2, 4, 6, 8, 3), alternative = "greater")
One Sample t-test
data: c(3, 5, -1, 4, 6, 2, 7, 4, 3, 5, 2, 4, 6, 8, 3)
t = 6.9972, df = 14, p-value = 3.138e-06
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
3.043017 Inf
sample estimates:
mean of x
4.066667
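The critical value quoted in step D can also be checked in R:

qt(0.95, df = 14)  # approx 1.76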
Hours: 12, 15, 9, 11, 14, 8, 10, 13, 12, 7
Assuming the number of hours follows a normal distribution, calculate a 90% confidence interval for the average number of hours all employees at the startup spend on professional development activities per month.
B.) The critical t-value for constructing a 90% confidence interval with 9 degrees of freedom is approximately 1.83.
C.) The 90% confidence interval for the average number of hours all employees at the startup spend on professional development activities per month is approximately (9.59, 12.61) hours.

D.) Based on the sample data, we can be 90% confident that the true average number of hours spent on professional development activities by all employees at the startup falls between 9.59 and 12.61 hours per month.
In R:
hours <- c(12, 15, 9, 11, 14, 8, 10, 13, 12, 7)
t.test(hours, conf.level = 0.9)
One Sample t-test
data: hours
t = 13.494, df = 9, p-value = 2.818e-07
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
9.592086 12.607914
sample estimates:
mean of x
11.1
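For comparison, the same 90% interval can be built from the formula (this should match the t.test() output above):

hours <- c(12, 15, 9, 11, 14, 8, 10, 13, 12, 7)
t_star <- qt(0.95, df = length(hours) - 1)                          # approx 1.83
mean(hours) + c(-1, 1) * t_star * sd(hours) / sqrt(length(hours))   # approx (9.59, 12.61)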
Which is used in the calculation of the Estimated Standard Error.↩︎
Like pnorm(), it always calculates the probability below the test statistic.↩︎
We’ll learn about two-sample t-tests in the next lecture.↩︎
And most likely a bandage on both.↩︎
In other words, the healing times are the same for each subject.↩︎
This is where it’s important to know that we did “with minus without”; we could have done without minus with, but then our alternate hypotheses would need to be “>”.↩︎
A p-value says nothing about the effect size, so we can’t say whether it’s practically significant.↩︎