10 Chi-Square
10.1 Expected Counts
| Attachment Type | Spam | Not Spam |
|---|---|---|
| None | 71 | 208 |
| One Image | 22 | 53 |
| Multi Image | 32 | 27 |
| Other | 30 | 93 |
| Total | 155 | 381 |
The following code brings this into R:
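A minimal sketch of one way to enter the table as a matrix. The object name `spam_tab` and the dimension labels are assumptions chosen for illustration; the Total row is left out because R can recompute it.

```r
# Observed counts, entered row by row (Total row deliberately omitted)
spam_tab <- matrix(
  c(71, 208,
    22,  53,
    32,  27,
    30,  93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Class      = c("Spam", "Not Spam")
  )
)

spam_tab
colSums(spam_tab)  # should agree with the Total row of the table
```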
Under the null, P(Spam | Attachment Type) = P(Spam), so the expected count in each cell is (row total × column total) / (grand total). There’s a lot of trickery happening in the following code; it is not something you’re expected to understand.
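One possible way to get the expected counts, sketched here with `outer()`; this is not necessarily the approach the notes originally used. The names `spam_tab` and `expected` are assumptions.

```r
# Observed counts (same table as above)
spam_tab <- matrix(
  c(71, 208, 22, 53, 32, 27, 30, 93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Class      = c("Spam", "Not Spam")
  )
)

# outer() multiplies every row total by every column total;
# dividing by the grand total gives the expected count for each cell
expected <- outer(rowSums(spam_tab), colSums(spam_tab)) / sum(spam_tab)
round(expected, 2)
```

The expected counts keep the same row and column totals as the observed table; only the way the counts are spread across the cells changes.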
We can calculate the chi-square statistic as follows:
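A sketch of the by-hand calculation, summing (observed − expected)² / expected over all eight cells. The variable names (`spam_tab`, `expected`, `chi_sq`, `df`) are assumptions.

```r
# Observed counts
spam_tab <- matrix(
  c(71, 208, 22, 53, 32, 27, 30, 93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Class      = c("Spam", "Not Spam")
  )
)

# Expected counts under independence
expected <- outer(rowSums(spam_tab), colSums(spam_tab)) / sum(spam_tab)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((spam_tab - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(spam_tab) - 1) * (ncol(spam_tab) - 1)

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

c(statistic = chi_sq, df = df, p.value = p_value)
```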
And of course R has a built-in function:
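A short sketch using `chisq.test()` from base R's stats package; `spam_tab` and `result` are assumed names.

```r
# Observed counts
spam_tab <- matrix(
  c(71, 208, 22, 53, 32, 27, 30, 93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Class      = c("Spam", "Not Spam")
  )
)

# Chi-square test of independence between attachment type and spam status
result <- chisq.test(spam_tab)
result

# Individual pieces of the result
result$statistic  # the chi-square statistic
result$parameter  # degrees of freedom
result$p.value    # p-value
result$expected   # expected counts under the null
```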
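Unlike the tests for proportions, where R uses a different continuity correction and doesn’t use “plus four”, R does exactly what we do for the chi-square test! (`chisq.test()` only applies a continuity correction to 2×2 tables, so it has no effect on a table like this one.)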
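The claim can be checked directly. This sketch compares the by-hand statistic with the one from `chisq.test()`, with and without the `correct` argument; the names `by_hand`, `built_in`, and `no_corr` are assumptions.

```r
# Observed counts
spam_tab <- matrix(
  c(71, 208, 22, 53, 32, 27, 30, 93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Class      = c("Spam", "Not Spam")
  )
)

# By-hand statistic
expected <- outer(rowSums(spam_tab), colSums(spam_tab)) / sum(spam_tab)
by_hand  <- sum((spam_tab - expected)^2 / expected)

# Built-in statistic, with the default settings and with correct = FALSE
built_in <- unname(chisq.test(spam_tab)$statistic)
no_corr  <- unname(chisq.test(spam_tab, correct = FALSE)$statistic)

all.equal(by_hand, built_in)  # by-hand and built-in agree
all.equal(built_in, no_corr)  # the correction setting makes no difference here
```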
We can display the results as a bar chart, where the red bars are the observed counts and the blue bars are the expected counts:
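A sketch of such a chart using base R's `barplot()`. The layout (one pair of bars per cell) and the names `counts` and `cell_labels` are assumptions; the original figure may have been arranged differently.

```r
# Observed counts
spam_tab <- matrix(
  c(71, 208, 22, 53, 32, 27, 30, 93),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Attachment = c("None", "One Image", "Multi Image", "Other"),
    Class      = c("Spam", "Not Spam")
  )
)

# Expected counts under independence
expected <- outer(rowSums(spam_tab), colSums(spam_tab)) / sum(spam_tab)

# One column per cell of the table; first row observed, second row expected.
# as.vector() reads the matrix column by column: all Spam cells, then Not Spam.
counts <- rbind(Observed = as.vector(spam_tab),
                Expected = as.vector(expected))
cell_labels <- paste(rep(colnames(spam_tab), each = nrow(spam_tab)),
                     rownames(spam_tab), sep = ": ")
colnames(counts) <- cell_labels

# Side-by-side bars: red = observed, blue = expected
barplot(counts, beside = TRUE, col = c("red", "blue"),
        las = 2, cex.names = 0.7, ylab = "Count",
        legend.text = TRUE)
```

The largest gap between the red and blue bars marks the cell that contributes most to the chi-square statistic.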
10.2 Confidence intervals for a Chi-Square Test
The null is that the row variable and the column variable are independent; there’s no single value (like a mean or a proportion) that we’re estimating! Confidence intervals do not apply.