11 Getting the Wrong Model
11.1 The Wrong Model
The Right Model?
Recall: All models are wrong, some are useful!
But how wrong can a model be while still being useful?
- This is an extraordinarily challenging philosophical question.
- We will touch on a very small part of it.
The Wrong Predictors
So far, we’ve talked about a model of the form \(Y=X\underline\beta + \underline\epsilon\).
- \(E(\hat{\underline\beta}) = E((X^TX)^{-1}X^TY) = (X^TX)^{-1}X^TX\underline\beta = \underline\beta\), since \(E(Y) = X\underline\beta\) when \(E(\underline\epsilon) = \underline 0\)
However, what if we are missing some predictors?
What if the true model is \(Y=X\underline\beta + X_2\underline\beta_2 + \underline\epsilon\)? \[\begin{align*} E(\hat{\underline\beta}) &= E((X^TX)^{-1}X^TY)\\ & = (X^TX)^{-1}X^T(X\underline\beta + X_2\underline\beta_2) \\ & = (X^TX)^{-1}X^TX\underline\beta + (X^TX)^{-1}X^TX_2\underline\beta_2 \\ &= \underline\beta+ (X^TX)^{-1}X^TX_2\underline\beta_2\\ &= \underline\beta + A\underline\beta_2 \end{align*}\]
What is \(A\)?
Recall that \(\hat{\underline\beta} = (X^TX)^{-1}X^TY\).
Earlier, we had the equation: \[ E(\hat{\underline\beta}) = \underline\beta + (X^TX)^{-1}X^TX_2\underline\beta_2 \] Comparing the two expressions, \(A = (X^TX)^{-1}X^TX_2\) has exactly the form of an OLS estimate with \(X_2\) in place of \(Y\): it is the matrix of coefficients we would get by regressing each column of \(X_2\) on \(X\).
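A quick numeric check of that reading, with simulated data (the dimensions and values here are arbitrary):

```r
set.seed(11)
n  <- 100
X  <- cbind(1, rnorm(n))          # design matrix with intercept
X2 <- matrix(rnorm(n), ncol = 1)  # the predictor we might omit

# A = (X^T X)^{-1} X^T X_2
A <- solve(t(X) %*% X) %*% t(X) %*% X2
A

# Same numbers as regressing X_2 on the non-intercept column of X
coef(lm(X2 ~ X[, 2]))
```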
Bias due to wrong predictors
The bias of an estimator is: \[ \text{Bias}(\hat{\underline\beta}) = E(\hat{\underline\beta}) - \underline\beta \]
For the case where \(Y = X\underline\beta + X_2\underline\beta_2 +\underline\epsilon\), \[ \text{Bias}(\hat{\underline\beta}) = (\underline\beta + A\underline\beta_2) - \underline\beta = A\underline\beta_2 \]
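A minimal simulation sketch of this result (sample size, coefficients, and replication count are arbitrary choices): fitting the model that omits \(X_2\) many times, the average of \(\hat{\underline\beta}\) lands near \(\underline\beta + A\underline\beta_2\), not \(\underline\beta\).

```r
set.seed(42)
n <- 100; reps <- 5000
X  <- cbind(1, rnorm(n))          # columns we include
X2 <- matrix(rnorm(n), ncol = 1)  # column we omit
beta <- c(1, 2); beta2 <- 3
A <- solve(t(X) %*% X) %*% t(X) %*% X2

# Repeatedly generate Y from the true model, fit the misspecified one
beta_hats <- replicate(reps, {
  y <- X %*% beta + X2 %*% beta2 + rnorm(n)
  drop(solve(t(X) %*% X) %*% t(X) %*% y)
})
rowMeans(beta_hats)       # average estimate over simulations
drop(beta + A %*% beta2)  # beta plus the bias A beta_2
```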
Expected Mean Square
See text.
Uses the identity: For an \(n\times n\) matrix \(Q\) and \(n\times 1\) random vector \(Y\) with variance \(V(Y)=\Sigma\), \[ E(Y^TQY) = (E(Y))^TQE(Y) + \operatorname{trace}(Q\Sigma) \]
This may be useful for a future assignment question (will notify if you need it), but for now I’m going to explore this via simulation in the Rmd.
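In the meantime, here is a minimal check of the identity by simulation (I use a diagonal \(\Sigma\) so the draws are easy to generate; \(Q\), the mean, and \(\Sigma\) are arbitrary):

```r
set.seed(1)
n  <- 4
Q  <- crossprod(matrix(rnorm(n^2), n))  # an arbitrary symmetric Q
mu <- rnorm(n)                          # E(Y)
sd_y  <- runif(n, 1, 2)
Sigma <- diag(sd_y^2)                   # diagonal V(Y) for simplicity

reps <- 1e5
vals <- replicate(reps, {
  Y <- mu + sd_y * rnorm(n)             # mean mu, variance Sigma
  drop(t(Y) %*% Q %*% Y)
})
mean(vals)                                        # simulated E(Y^T Q Y)
drop(t(mu) %*% Q %*% mu) + sum(diag(Q %*% Sigma)) # the identity
```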
Summary
- Choosing the wrong set of predictors biases the estimates: omitting \(X_2\) adds \(A\underline\beta_2\) to \(E(\hat{\underline\beta})\), unless \(\underline\beta_2 = \underline 0\) or \(X^TX_2 = 0\).
11.2 What have we learned?
Multiple Linear Regression Concepts
- If you add a predictor, the other coefficients change.
- Variance is everything
- \(MS_E\) = variance of residuals, \(MS_{Reg}\) = variance of the line!
- \(SS_T = SS_{Reg} + SS_E\) (see the numeric check after this list)
- \(df_T = df_{Reg} + df_E\)
- Always check assumptions
- Residual plots (using the appropriate residuals)!
- Try to test as few hypotheses as possible!
- The hat matrix is magical.
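The numeric check promised above, using the built-in mtcars data (any small data set would do):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
an  <- anova(fit)
SS_E   <- an[["Sum Sq"]][nrow(an)]                # residual row
SS_Reg <- sum(an[["Sum Sq"]]) - SS_E              # all predictor rows
SS_T   <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
c(SS_T = SS_T, SS_Reg_plus_SS_E = SS_Reg + SS_E)  # equal
c(df_T = nrow(mtcars) - 1, df_sum = sum(an$Df))   # equal
```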
Ordinary Least Squares Estimates
For the model \(y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \epsilon_i\):
- Find the OLS estimate by minimizing \(\sum_{i=1}^n\epsilon_i^2 = \sum(y_i - \beta_0 - \beta_1x_{i1} - \beta_2x_{i2})^2 = \underline\epsilon^T\underline\epsilon\)
- \(\hat{\underline\beta} = (X^TX)^{-1}X^TY\)
- These estimates do not require the normality assumption.
- MLE gives the same estimates (assuming normality).
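A minimal sketch with simulated data (the variable names and values are just for illustration): the closed form reproduces lm()'s coefficients.

```r
set.seed(307)
n  <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                        # design matrix
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y # (X^T X)^{-1} X^T Y
cbind(by_hand = drop(beta_hat), lm = coef(lm(y ~ x1 + x2)))
```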
With the normality assumption,
- \(V(\hat{\underline\beta}) = (X^TX)^{-1}\sigma^2\) (this formula itself only needs \(V(\underline\epsilon) = \sigma^2I\))
- Confidence intervals and confidence regions follow from the normality of \(\hat{\underline\beta}\)
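Continuing the sketch above: the plug-in estimate \(\hat\sigma^2(X^TX)^{-1}\) matches vcov(), and confint() gives the normality-based intervals.

```r
fit <- lm(y ~ x1 + x2)                      # same simulated data as above
sigma2_hat <- sum(resid(fit)^2) / (n - 3)   # MS_E, with p = 3 coefficients
V <- sigma2_hat * solve(t(X) %*% X)         # estimated V(beta_hat)
all.equal(V, vcov(fit), check.attributes = FALSE)
confint(fit)                                # intervals for each coefficient
```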
ANOVA: Variance tells us about slopes
\(H_0: SS_{Reg} = 0\) is equivalent to \(H_0: \beta_1 = \beta_2 = \dots = \beta_{p-1} = 0\).
- A horizontal line has a variance of 0!
- Under \(H_0\), the fitted line is \(\hat\beta_0 = \bar y\), which has no variance in the \(y\)-direction
- More variance is a good thing, since this is the variance explained.
This extends to the extra sum-of-squares due to adding predictors.
- \(H_0: SS_1 - SS_2 = 0\; \Leftrightarrow\; H_0: \beta_q = \beta_{q+1} = \dots = \beta_{p-1} = 0\)
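In R, anova() on nested fits carries out exactly this extra sum-of-squares test; a sketch with mtcars (the choice of predictors is arbitrary):

```r
reduced <- lm(mpg ~ wt, data = mtcars)
full    <- lm(mpg ~ wt + hp + disp, data = mtcars)
anova(reduced, full)  # tests H0: beta_hp = beta_disp = 0
```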