8 Two Sample Inference (Continuous)
8.1 Matched Pairs
Definition: Each observation from one sample is clearly, definitively, without question paired with an observation from another sample.
Examples:
- For each patient, make an identical small cut on each hand. For one of the cuts, use an antibacterial ointment. Measure the amount of time it takes each cut to heal.
- Cuts are on the same person - they are unambiguously linked!
- This study design accounts for people who heal at different rates or who are exposed to more germs - the only difference between healing times on one person is the ointment.
- Weight loss studies.
- The before and after weights are, without a doubt, paired because they are measured on the same person.
The following are not examples of matched pairs:
- The weights of women versus the weights of men.
- There’s no natural pairing of a particular woman in the women group with a particular man in the men group.
- If there were, say, spousal pairs, then a matched pairs design might make sense.
- The medical outcomes of a treatment group compared to a control group (e.g., placebo).
- Important exception: if each member of the control group was chosen specifically to match a member of the treatment group, as in a matched case-control study, then the observations are paired.
With matched pairs, we have two options:
- Calculate the mean of the “before” weights and the mean of the “after” weights and compare these.
- Calculate the mean of the differences and treat this like a single sample.
If you have two samples, and every observation in one sample is obviously paired with a single other observation, then it would be silly to not use this information.
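To see what is lost by ignoring the pairing, here is a minimal sketch comparing the two options. The weights below are made up for illustration (they are not data from these notes), and the names before, after and diffs are just placeholders.

```r
# Hypothetical before/after weights (kg) for six people; not data from these notes.
before <- c(82.1, 95.4, 70.3, 88.0, 101.2, 76.5)
after  <- c(80.0, 93.9, 69.1, 85.2, 99.8, 75.0)

# Option 2: one difference per person.
diffs <- before - after

# The point estimate is the same under both options.
mean(before) - mean(after)
mean(diffs)

# The standard errors are not the same: the paired version removes
# the person-to-person variation in weight.
n <- length(diffs)
se_unpaired <- sqrt(var(before) / n + var(after) / n)
se_paired   <- sd(diffs) / sqrt(n)
c(unpaired = se_unpaired, paired = se_paired)
```

In this made-up example the paired standard error is much smaller, because most of the spread in the raw weights comes from differences between people rather than from the before/after change.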
Pain Reduction Example
These data are the before and after of a pain reduction pill.
We have three histograms and a boxplot. The top row shows the before and after data, with the boxplot showing that there appears to be a difference.
We could do a two sample t-test on these data, but each before is clearly, unambiguously, obviously, seriously, without a doubt paired with an after. In order to do a valid test, we must account for all of the information we have available.
A matched pairs test is just a one-sample t-test for the differences. We ignore before and after, and just focus on pain_reduc.
Since we calculated pain_reduc as before - after, we have the following hypotheses. Let \(\mu_{b-a}\) be the true population mean for before minus after.
\[ H_0: \mu_{b-a} = 0\text{ versus }H_a: \mu_{b-a} < 0 \]
Explain why the alt is “< 0”.
By hand:
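The original code chunk is not reproduced here, so the following is only a sketch of the by-hand calculation. The values in pain_reduc are hypothetical stand-ins for the real column; the names xbar and s are also just illustrative.

```r
# Hypothetical values standing in for the pain_reduc column (not the real data).
pain_reduc <- c(-1.2, 0.4, -2.1, -0.8, -1.5, 0.3, -1.9, -0.6, -1.1, -0.5)

n    <- length(pain_reduc)  # number of pairs
xbar <- mean(pain_reduc)    # sample mean of the differences
s    <- sd(pain_reduc)      # sample standard deviation of the differences

# One-sample t statistic for H_0: mu = 0.
t_obs <- (xbar - 0) / (s / sqrt(n))
t_obs

# Degrees of freedom to use on the t-table.
n - 1

# Lower-tail p-value, matching the "< 0" alternative.
pt(t_obs, df = n - 1)
```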
And we can find where the observed test statistic, t_obs, falls on the t-table.
The exact p-value is pt(t_obs, df = n - 1), which can be rounded for reporting with round(pt(t_obs, df = n - 1), 6). Make sure your range from the t-table includes this value!
Of course, R can easily do this in one step, including the calculation of the CI (do this as an exercise!):
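A sketch of the one-step version, again using hypothetical values in place of the real pain_reduc column. The object name result is just a placeholder.

```r
# Hypothetical values standing in for the pain_reduc column (not the real data).
pain_reduc <- c(-1.2, 0.4, -2.1, -0.8, -1.5, 0.3, -1.9, -0.6, -1.1, -0.5)

# One-sample t-test on the differences with the "< 0" alternative.
result <- t.test(pain_reduc, mu = 0, alternative = "less")
result

# The pieces used in the by-hand version.
result$statistic  # observed t
result$parameter  # degrees of freedom
result$p.value    # lower-tail p-value

# With alternative = "less", the reported interval is one-sided.
# Leaving alternative at its default gives the usual two-sided CI.
t.test(pain_reduc, mu = 0)$conf.int
```

If you still have the before and after columns, t.test(before, after, paired = TRUE, alternative = "less") should give the same test, since it works with the differences before - after.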
8.2 Two-Sample t-tests
Consider the following data, which are the total numbers of pages for two kinds of books in my office. Are the numbers of pages different at the 5% level?
Set up the null and alternative hypotheses before looking at the code below!!!
From the histograms, assuming normal distributions looks fine.
From the box plot, it looks like there might be a difference!
Doing the calculations is left as an exercise (be very careful about brackets and rounding!). In R, this is about as easy as a one-sample t-test.
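A sketch of what this can look like in R. The page counts are invented and the names kind_a and kind_b are placeholders for the two kinds of books. Note that t.test() uses the Welch (unequal variance) version by default, which is an assumption about what is wanted here.

```r
# Hypothetical page counts for two kinds of books (not the real data).
kind_a <- c(512, 640, 455, 720, 588, 610, 497)
kind_b <- c(320, 410, 288, 365, 450, 300, 390, 342)

# Two-sided, two-sample t-test. By default var.equal = FALSE (Welch).
res <- t.test(kind_a, kind_b)
res

# Compare the p-value to the 5% level.
res$p.value < 0.05

# The same test statistic by hand; watch the brackets!
t_hand <- (mean(kind_a) - mean(kind_b)) /
  sqrt(var(kind_a) / length(kind_a) + var(kind_b) / length(kind_b))
t_hand
```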
Alternative Data Representation
An alternative way to represent the data is as a data frame, which is much more common in practice.
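As a sketch, the same kind of data can be stored with one row per book and analysed with the formula interface of t.test(). The data frame books and its columns pages and kind are hypothetical names, and the values are invented.

```r
# Hypothetical data: one row per book, one column for pages, one for the kind.
books <- data.frame(
  pages = c(512, 640, 455, 720, 588, 610, 497,
            320, 410, 288, 365, 450, 300, 390, 342),
  kind  = rep(c("kind_a", "kind_b"), times = c(7, 8))
)
head(books)

# Formula interface: response ~ grouping variable.
res_df <- t.test(pages ~ kind, data = books)
res_df
```

This should give the same result as passing the two groups as separate vectors; the formula just tells R which column holds the measurements and which column holds the group labels.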