10  Chi-Square

Author: Dr. Devan Becker

Affiliation: Wilfrid Laurier University

Published: 2024-11-18

10.1 Expected Counts

Attachment Type    Spam    Not Spam
None                 71         208
One Image            22          53
Multi Image          32          27
Other                30          93
Total               155         381

The following code brings this into R:
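A minimal sketch of one way to enter the table, assuming a matrix named `spam` (the object name and the `dimnames` labels are illustrative choices, not taken from the original code). The Total row is left out because R can recompute it:

```r
# Counts copied from the table above; rows are attachment types,
# columns are spam status. byrow = TRUE lets us type it row by row.
spam <- matrix(
  c(71, 208,
    22,  53,
    32,  27,
    30,  93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Status = c("Spam", "Not Spam")
  )
)

spam
colSums(spam)  # compare against the Total row of the table
```

Checking `colSums(spam)` against the Total row is a quick way to catch typing mistakes.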

Under the null hypothesis, P(Spam | Attachments) = P(Spam), so we can get the expected counts as follows. There’s a lot of trickery happening in the following code; it is not something you’re expected to understand.
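One way to sketch this calculation, which may differ from the original code: if spam status does not depend on attachment type, each expected count is (row total × column total) / grand total. The names `spam`, `row_totals`, `col_totals`, and `expected` are assumptions made for this example.

```r
# Observed counts from the table above (Total row omitted).
spam <- matrix(
  c(71, 208, 22, 53, 32, 27, 30, 93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Status = c("Spam", "Not Spam")
  )
)

row_totals <- rowSums(spam)  # emails per attachment type
col_totals <- colSums(spam)  # spam and not-spam totals
n <- sum(spam)               # grand total

# outer() multiplies every row total by every column total,
# giving a 4 x 2 matrix; dividing by n gives the expected counts.
expected <- outer(row_totals, col_totals) / n
round(expected, 2)
```

The expected counts keep the same row and column totals as the observed table; only the way the counts are spread across the cells changes.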

We can calculate the chi-square statistic as follows:
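A sketch of the by-hand calculation, assuming the same illustrative `spam` matrix as before: the statistic is the sum over all cells of (observed − expected)² / expected, with (rows − 1) × (columns − 1) degrees of freedom.

```r
# Observed counts from the table above (Total row omitted).
spam <- matrix(
  c(71, 208, 22, 53, 32, 27, 30, 93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Status = c("Spam", "Not Spam")
  )
)

# Expected counts under independence.
expected <- outer(rowSums(spam), colSums(spam)) / sum(spam)

# Chi-square statistic: add up (O - E)^2 / E over every cell.
chi_sq <- sum((spam - expected)^2 / expected)

# Degrees of freedom and the upper-tail p-value.
df <- (nrow(spam) - 1) * (ncol(spam) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

chi_sq
df
p_value
```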

Of course, R has a built-in function for this, chisq.test():
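A short sketch of the call, again assuming the illustrative `spam` matrix. `chisq.test()` returns a list, so the pieces computed by hand are available as components of the result:

```r
# Observed counts from the table above (Total row omitted).
spam <- matrix(
  c(71, 208, 22, 53, 32, 27, 30, 93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Status = c("Spam", "Not Spam")
  )
)

result <- chisq.test(spam)
result            # prints the statistic, degrees of freedom, and p-value

result$expected   # expected counts under independence
result$statistic  # the chi-square statistic
result$parameter  # degrees of freedom
result$p.value
```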

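Unlike the tests for proportions, where R uses a different continuity correction and doesn’t use “plus four”, R’s chi-square test does exactly what we do by hand for this table! (For a 2×2 table, chisq.test() applies a continuity correction by default, so there the results match only if you set correct = FALSE.)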

We can display the results as a bar chart, where red is the observed and blue is the expected counts:
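A base-R sketch of such a chart; the original plot may have been drawn with a different package, and the names `counts` and `cell_labels` are assumptions made for this example. Each cell of the table gets one red (observed) and one blue (expected) bar.

```r
# Observed counts from the table above (Total row omitted).
spam <- matrix(
  c(71, 208, 22, 53, 32, 27, 30, 93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Status = c("Spam", "Not Spam")
  )
)
expected <- chisq.test(spam)$expected

# as.vector() unrolls a matrix column by column, so the labels are
# built in the same order: all Spam cells first, then all Not Spam cells.
cell_labels <- paste(
  rep(rownames(spam), times = ncol(spam)),
  rep(colnames(spam), each = nrow(spam)),
  sep = "\n"
)

# Two rows (observed, expected) and one column per cell of the table.
counts <- rbind(Observed = as.vector(spam), Expected = as.vector(expected))

barplot(
  counts, beside = TRUE, col = c("red", "blue"),
  names.arg = cell_labels, cex.names = 0.7, las = 2,
  ylab = "Count"
)
legend("topright", legend = rownames(counts), fill = c("red", "blue"))
```

The cells where the red and blue bars differ most are the ones that contribute most to the chi-square statistic.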

10.2 Confidence Intervals for a Chi-Square Test

The null hypothesis is that the row variable and the column variable are independent. There’s no single parameter, such as a mean or a proportion, that we’re estimating, so confidence intervals do not apply.