11  Getting the Wrong Model

Author: Dr. Devan Becker
Affiliation: Jam TBD
Published: 2024-06-13

11.1 The Wrong Model

The Right Model?

Recall: All models are wrong, some are useful!

But how wrong can a model be while still being useful?

  • This is an extraordinarily challenging philosophical question.
  • We will touch on a very small part of it.

The Wrong Predictors

So far, we’ve talked about a model of the form \(Y=X\underline\beta + \underline\epsilon\).

  • \(E(\hat{\underline\beta}) = E((X^TX)^{-1}X^TY) = (X^TX)^{-1}X^TE(Y) = (X^TX)^{-1}X^TX\underline\beta = \underline\beta\), since \(E(Y) = X\underline\beta\)
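A quick Monte Carlo check of that unbiasedness claim (a sketch only; the \(X\), \(\underline\beta\), and \(\sigma\) below are made up):

```r
# Sketch: when the fitted model matches the true model, the average of
# beta-hat over many simulated data sets should land on the true beta.
set.seed(1)
n     <- 100
X     <- cbind(1, runif(n), runif(n))   # made-up design matrix (intercept + 2 predictors)
beta  <- c(2, -1, 0.5)                  # made-up true coefficients
sigma <- 1

beta_hats <- replicate(5000, {
  y <- X %*% beta + rnorm(n, sd = sigma)
  drop(solve(t(X) %*% X, t(X) %*% y))   # (X^T X)^{-1} X^T y
})

rowMeans(beta_hats)   # close to (2, -1, 0.5)
```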

However, what if we are missing some predictors?

What if the true model is \(Y=X\underline\beta + X_2\underline\beta_2 + \underline\epsilon\)? \[\begin{align*} E(\hat{\underline\beta}) &= E((X^TX)^{-1}X^TY)\\ & = (X^TX)^{-1}X^T(X\underline\beta + X_2\underline\beta_2) \\ & = (X^TX)^{-1}X^TX\underline\beta + (X^TX)^{-1}X^TX_2\underline\beta_2 \\ &= \underline\beta+ (X^TX)^{-1}X^TX_2\underline\beta_2\\ &= \underline\beta + A\underline\beta_2 \end{align*}\]

What is \(A\)?

Recall that \(\hat{\underline\beta} = (X^TX)^{-1}X^TY\).

On the previous slide, we had the equation: \[ E(\hat{\underline\beta}) = \underline\beta+ (X^TX)^{-1}X^TX_2\underline\beta_2 \] Thoughts?
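One way to poke at this numerically (a sketch with made-up design matrices): compare \(A = (X^TX)^{-1}X^TX_2\) to the coefficients you get by regressing each column of \(X_2\) on \(X\).

```r
# Sketch: A = (X^T X)^{-1} X^T X_2 next to ordinary regressions of X_2 on X.
set.seed(2)
n  <- 200
X  <- cbind(1, rnorm(n), rnorm(n))             # included predictors (first column = intercept)
X2 <- cbind(0.5 * X[, 2] + rnorm(n), rnorm(n)) # omitted predictors, one correlated with X

A <- solve(t(X) %*% X) %*% t(X) %*% X2
A

coef(lm(X2 ~ X[, 2] + X[, 3]))  # same numbers, one column per column of X_2
```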

Bias due to wrong predictors

The bias of an estimator is: \[ \text{Bias}(\hat{\underline\beta}) = E(\hat{\underline\beta}) - \underline\beta \]

For the case where \(Y = X\underline\beta + X_2\underline\beta_2 +\underline\epsilon\), \[ \text{Bias}(\hat{\underline\beta}) = (\underline\beta + A\underline\beta_2) - \underline\beta = A\underline\beta_2 \]
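Here is a Monte Carlo check of that bias formula (a sketch only; the design, coefficients, and sample size are made up):

```r
# Sketch: fit the model that omits X_2 and compare the empirical bias of
# beta-hat to the theoretical bias A %*% beta_2.
set.seed(3)
n     <- 150
X     <- cbind(1, rnorm(n), rnorm(n))        # predictors we keep
X2    <- cbind(X[, 2] + rnorm(n), rnorm(n))  # omitted predictors (one correlated with X)
beta  <- c(1, 2, -1)
beta2 <- c(0.5, 3)
A     <- solve(t(X) %*% X) %*% t(X) %*% X2

beta_hats <- replicate(5000, {
  y <- X %*% beta + X2 %*% beta2 + rnorm(n)  # data from the *true* model
  drop(solve(t(X) %*% X, t(X) %*% y))        # estimate using only X
})

rowMeans(beta_hats) - beta  # empirical bias
drop(A %*% beta2)           # theoretical bias A beta_2
```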

Expected Mean Square

See text.

Uses the identity: For an \(n\times n\) matrix \(Q\) and \(n\times 1\) random vector \(Y\) with mean \(E(Y)\) and variance \(V(Y)=\Sigma\), \[ E(Y^TQY) = (E(Y))^TQE(Y) + \text{trace}(Q\Sigma) \]

This may be useful for a future assignment question (I'll let you know if you need it), but for now I'm going to explore this via simulation in the Rmd.
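As a standalone illustration (separate from the course Rmd), here is a quick numerical check of the quadratic-form identity above; the \(Q\), mean, and \(\Sigma\) are arbitrary made-up choices.

```r
# Sketch: check E(Y^T Q Y) = E(Y)^T Q E(Y) + trace(Q Sigma) by simulation.
set.seed(4)
library(MASS)  # for mvrnorm()

k     <- 4
Q     <- crossprod(matrix(rnorm(k * k), k, k))  # an arbitrary k x k matrix
mu    <- rnorm(k)                               # E(Y)
Sigma <- diag(runif(k, 0.5, 2))                 # V(Y)

Y  <- mvrnorm(1e5, mu = mu, Sigma = Sigma)      # draws of Y, one per row
qf <- rowSums((Y %*% Q) * Y)                    # Y^T Q Y for each draw

mean(qf)                                            # Monte Carlo estimate
drop(t(mu) %*% Q %*% mu) + sum(diag(Q %*% Sigma))   # right-hand side of the identity
```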

Summary

  • Choosing the wrong set of predictors can bias the coefficient estimates!

11.2 What have we learned?

Multiple Linear Regression Concepts

  • If you add a predictor, the other coefficients change.
  • Variance is everything
    • \(MS_E\) = variance of residuals, \(MS_{Reg}\) = variance of the line!
    • \(SS_T = SS_{Reg} + SS_E\)
      • \(df_T = df_{Reg} + df_E\)
  • Always check assumptions
    • Residual plots (using the appropriate residuals)!
  • Try to test as few hypotheses as possible!
  • The hat matrix is magical.
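To put numbers on the variance bullets above, here is a quick check (made-up data) that \(SS_T = SS_{Reg} + SS_E\) and \(df_T = df_{Reg} + df_E\):

```r
# Sketch: verify the sum-of-squares and degrees-of-freedom decompositions.
set.seed(7)
n  <- 40
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 3 + x1 - 2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

SS_T   <- sum((y - mean(y))^2)
SS_E   <- sum(resid(fit)^2)
SS_Reg <- sum((fitted(fit) - mean(y))^2)

c(SS_T = SS_T, SS_Reg_plus_SS_E = SS_Reg + SS_E)  # equal
c(df_T = n - 1, df_Reg = 2, df_E = n - 3)         # 39 = 2 + 37
```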

Ordinary Least Squares Estimates

For the model \(y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \epsilon_i\):

  • Find the OLS estimate by minimizing \(\sum_{i=1}^n\epsilon_i^2 = \sum(y_i - \beta_0 - \beta_1x_{i1} - \beta_2x_{i2})^2 = \underline\epsilon^T\underline\epsilon\)
    • \(\hat{\underline\beta} = (X^TX)^{-1}X^TY\)
    • These estimates do not require the normality assumption.
  • MLE gives the same estimates (assuming normality).

With normality assumptions,

  • \(V(\hat{\underline\beta}) = (X^TX)^{-1}\sigma^2\)
    • Confidence intervals and confidence regions
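As a sanity check, here is a sketch (on made-up data) of those matrix formulas next to what lm() reports:

```r
# Sketch: OLS by the matrix formula vs. lm(), plus the variance of beta-hat.
set.seed(5)
n  <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X        <- cbind(1, x1, x2)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y  # (X^T X)^{-1} X^T Y
fit      <- lm(y ~ x1 + x2)

cbind(by_hand = drop(beta_hat), lm = coef(fit))  # identical estimates

sigma2_hat <- sum(resid(fit)^2) / (n - 3)  # MS_E estimates sigma^2
solve(t(X) %*% X) * sigma2_hat             # (X^T X)^{-1} sigma^2-hat
vcov(fit)                                  # same matrix from lm()
```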

ANOVA: Variance tells us about slopes

\(H_0: SS_{Reg} = 0\) is equivalent to \(H_0: \beta_1 = \beta_2 = \dots = \beta_{p-1} = 0\).

  • A horizontal line has a variance of 0!
    • Under \(H_0\), the fitted line is \(\hat\beta_0 = \bar y\), which has no variance in the \(y\)-direction
    • More variance is a good thing, since this is the variance explained.

This extends to the extra sum of squares due to adding predictors.

  • \(H_0: SS_1 - SS_2 = 0\; \Leftrightarrow\; H_0: \beta_q = \beta_{q+1} = \dots = \beta_{p-1} = 0\)
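A sketch of that extra sum-of-squares test on made-up data, using anova() to compare nested models:

```r
# Sketch: F test of H_0: beta_2 = beta_3 = 0 via the drop in residual SS
# when going from the reduced model to the full model.
set.seed(6)
n  <- 80
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)  # x2 and x3 truly have coefficients of 0

fit_small <- lm(y ~ x1)            # reduced model
fit_big   <- lm(y ~ x1 + x2 + x3)  # full model

anova(fit_small, fit_big)  # extra sum-of-squares F test
```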