21 Logistic Modelling

Author: Dr. Devan Becker

Jam: Love’s a Logistical Thing by PJ Parker

Published: 2024-07-29

21.1 Preamble

Announcements

Haven’t had a chance to check A4 winner; winner will be contacted tomorrow!

21.2 Step 1: Concentrate on the Orange Juice

Getting Acquainted

For this analysis, let’s look at Orange Juice data from the ISLR2 package (I’ve resisted the temptation to use the Auto data).

Before we begin, define the goal:

Goal: When a customer comes in the store, can we predict which brand they buy?

Note: We’re not predicting whether they buy OJ; we’re predicting which brand they buy given that they bought OJ.

This is not the only possible question with these data:

Is the price difference enough to explain why people choose a brand? (If not, then the difference might be because people genuinely like one better than the other.)
How big of a discount is needed to sway a customer’s purchase?
Do different stores have customers with different purchasing habits?

Use the following code block to get acquainted with the data. We’ll make a lot of plots!

The response variable is a factor with levels “CH” for Citrus Hill and “MM” for Minute Maid. Let’s convert this to 1s and 0s so that we can do math on it. We’ll set MM to 1 and CH to 0 (this choice is completely arbitrary: it flips the sign of the coefficients, and so changes how we word the interpretations, but it does not change the fitted probabilities or the conclusions).

Also, the “STORE” predictor is numeric, but the numbers aren’t actually meaningful: Store 4 is not twice the store that Store 2 is. The values are labels for categories, not quantities. Let’s make it a factor so that we don’t accidentally treat it as a linear predictor.
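To see why the 0/1 coding of the response is arbitrary, here is a small Python sketch (the arithmetic is the same in any language). The brand labels come from the data description above; the purchases and the probability `p_mm` are made up for illustration. Swapping which brand is coded as 1 only flips the sign of the log odds.

```python
import math

def logit(p):
    """Log odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1 - p))

# Hypothetical purchases, recoded so that MM = 1 and CH = 0.
purchases = ["CH", "MM", "MM", "CH", "CH"]
y = [1 if brand == "MM" else 0 for brand in purchases]

# Hypothetical probability of buying MM (made up, not estimated from OJ).
p_mm = 0.39
p_ch = 1 - p_mm

log_odds_mm = logit(p_mm)  # log odds when MM is coded as 1
log_odds_ch = logit(p_ch)  # log odds when CH is coded as 1 instead

print(y)
print(log_odds_mm, log_odds_ch)  # same magnitude, opposite signs
```

Either coding describes the same probabilities, so the choice only affects how the coefficients are worded.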

The Predictors

What patterns in the predictors can you find? Take a good read of the help file (?OJ); there are some hints in there.

For example, there’s StoreID, which includes a store labelled 7, but there’s also an indicator for whether the store is 7, likely meaning that the authors thought there was something special about store 7. There are other hints there too!

Change the code below and see what you find! For now, ignore the WeekofPurchase column.

WeekofPurchase

There are some predictors that require more advanced exploration techniques:

Also notice that the prices are all just a little bit below “nice” values. Prices are $1.99, $1.69, etc. because with $1.99 you read the “1” first and it feels like it’s just a little above $1 instead of being practically $2. I hate this tactic. Remember: if money is involved, someone is trying to manipulate you.

21.3 Step 2: The Response

Average value of y for each value of x

Use the following chunk to explore which predictors might be related to the response, including a possible interaction term.
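The idea behind “average value of y for each value of x” is that the mean of a 0/1 response within a group is just the proportion of 1s in that group. A minimal Python sketch, using made-up toy values (the helper `mean_response_by_group` and both lists are hypothetical, not taken from the OJ data):

```python
from collections import defaultdict

def mean_response_by_group(groups, y):
    """Average of a 0/1 response within each group (the proportion of 1s)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of y, count]
    for g, yi in zip(groups, y):
        totals[g][0] += yi
        totals[g][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}

# Toy data: whether MM was on special, and whether MM was bought (1) or not (0).
special_mm = [0, 0, 0, 0, 1, 1, 1, 1]
bought_mm = [0, 0, 1, 0, 1, 1, 0, 1]

props = mean_response_by_group(special_mm, bought_mm)
print(props)
```

If the proportions differ a lot between groups, that predictor is a candidate for the model; comparing them within levels of a second predictor hints at an interaction.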

Summary of the Results

The WeekofPurchase details the price changes, but we have those recorded.
- Would be useful if we wanted to see if a price drop affected sales, but that’s outside the scope of this course.
We might want either the PriceDiff or both Price*s.
Either StoreID or Store7?
LoyalCH is almost certainly an important predictor.
DiscMM and PriceMM shouldn’t be in the same model, but SpecialMM might be worth it.
Price*, Disc*, SalePrice* and Special* all encode similar information.
- PriceDiff is maybe more important than the group of these?
SpecialMM would likely encourage CH buyers to buy MM instead. SpecialCH would likely encourage MM buyers to buy CH.
- If both are on sale, perhaps there’s no effect?

Exploring one model

First, let’s start with a simple model.

PriceDiff accounts for the differences in sale price as well as whether they’re on special.
Store7 evaluates whether Store7 is different from the others.
LoyalCH is clearly something we want to include.

Thoughts?

A second option

Perhaps it matters whether the OJ is on sale. Maybe the actual store also matters. Let’s check both!

The actual sale price is used, not the listing price.
We have an interaction between SpecialMM and SpecialCH. If both are on sale, then perhaps it doesn’t affect the buying decision?

An extra special ESS

Note that StoreID gets split into dummy variables factor(StoreID)2, factor(StoreID)3, factor(StoreID)4, and factor(StoreID)7, with Store 1 being the reference category.
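The dummy coding described above can be sketched in a few lines of Python. This is only an illustration of reference-level coding, not the code that the model fitting actually runs; the helper name `dummy_code` and the example store values are hypothetical.

```python
def dummy_code(values, levels, reference):
    """Reference-level dummy coding: one 0/1 column per non-reference level."""
    columns = [lev for lev in levels if lev != reference]
    rows = [[1 if v == lev else 0 for lev in columns] for v in values]
    return columns, rows

store_levels = [1, 2, 3, 4, 7]

# Four hypothetical customers, shopping at stores 1, 2, 7, and 4.
columns, rows = dummy_code([1, 2, 7, 4], store_levels, reference=1)

print(columns)
for row in rows:
    print(row)
```

A customer at Store 1 has zeros in every column, which is why Store 1 is absorbed into the intercept and every store coefficient is a comparison against Store 1.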

The existence of the predictor Store7 implies there’s something special about this store, and it’s the only significant result in m2.

If we use StoreID in the model, we have:
\[ y_i = \beta_0 + \beta_1\text{Store2}_i + \beta_2\text{Store3}_i + \beta_3\text{Store4}_i + \beta_4\text{Store7}_i + \dots + \epsilon_i \]
Compare this to the model with Store7 as the only store predictor:
\[ y_i = \beta_0 + \beta_4\text{Store7}_i + \dots + \epsilon_i \]
An ESS test for these two models is equivalent to testing $\beta_1 = \beta_2 = \beta_3 = 0$.

Since these two models are not significantly different, we can go with the simpler model (m2b).

Specials

In the output, it looked like neither of the Special* terms nor their interaction was significant. Three separate tests are not enough to draw a conclusion about all three terms at once!

To be honest about our p-values, we’re going to check all of the terms at once. We could check the interaction and then check the individual Special* terms, but it’s better to do it all in one go.

In fact, I specifically don’t want to check the interaction term. I’m quite certain that the interaction term should be there whenever the Special* terms are there. If both brands are on special, then the specials likely don’t affect the purchaser’s behaviour.
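One way to see why the interaction belongs with the Special* terms is to write out what each combination of specials adds to the linear predictor. The symbols $\gamma_1$, $\gamma_2$, and $\gamma_3$ are notation assumed here for illustration (the coefficients on SpecialMM, SpecialCH, and their product); they are not values from the fitted output.

\[
\begin{array}{ccl}
\text{SpecialMM} & \text{SpecialCH} & \text{Contribution to the log odds} \\
0 & 0 & 0 \\
1 & 0 & \gamma_1 \\
0 & 1 & \gamma_2 \\
1 & 1 & \gamma_1 + \gamma_2 + \gamma_3
\end{array}
\]

If two simultaneous specials cancel each other out, we would expect $\gamma_3 \approx -(\gamma_1 + \gamma_2)$. Without the interaction term, the model is forced to use $\gamma_1 + \gamma_2$ in the last row and cannot express that cancellation.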

No significance means go with the simpler model!

Model 3: Just MM, or include CH?

Since we already saw that SpecialMM * SpecialCH wasn’t significant, this might add to the hypothesis that PriceDiff is everything.

Model 4 and beyond!

Try out some of the following:

An interaction between StoreID and one of the Discount predictors would mean that customers at different stores are swayed by discounts to different degrees.
What combination of Price*, Disc*, Special*, SalePrice*, and PctDisc* is best?
- Only consider models where both CH and MM are present, e.g. PriceMM and PriceCH.

In the code below, I investigate whether we can just use PriceDiff. I also demonstrate that the p-value is different depending on which other predictors are in the model - it might change the decision sometimes!

Note that these are not ESS tests, since neither model is nested in the other.

I intentionally put an error in the following code. Fix it before you run the code.

21.4 A Final Model (for demonstration purposes)

Let’s just try this one.

This isn’t the best model, but I want to explore some topics.

Interpreting The Intercept

The intercept is 3.41:
\[ p(x_i) = \text{expit}(3.41) = \frac{\exp(3.41)}{1 + \exp(3.41)} = 0.968 \]
There’s a 96.8% chance that a customer buys MM when all predictors are 0.

Does this make sense?

Why was this expit() instead of exp(), like we use for the log odds ratio?

“When all predictors are 0” includes LoyalCH; the intercept refers to someone who is loyal to MM!
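The difference between exp() and expit() can be checked numerically. In this Python sketch, the only number taken from the notes is the intercept 3.41; the function `expit` is written out by hand as an illustration. Exponentiating the linear predictor gives the odds, and expit converts it all the way to a probability.

```python
import math

def expit(eta):
    """Inverse logit: maps a log odds value to a probability."""
    return 1 / (1 + math.exp(-eta))

intercept = 3.41

odds = math.exp(intercept)  # odds of buying MM when all predictors are 0
prob = expit(intercept)     # the corresponding probability

print(round(prob, 3))
# The two are linked by prob = odds / (1 + odds).
print(abs(odds / (1 + odds) - prob) < 1e-12)
```

For a slope, exp() is the natural transformation because a difference in log odds becomes a ratio of odds; for the intercept we want a probability at a specific set of predictor values, so we need expit().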

Interpreting the `PriceDiff` (MM - CH)

The slope for PriceDiff is -2.82.

Negative, which means P(buys MM) decreases as MM becomes more expensive relative to CH.
Odds Ratio is exp(-2.82) = 0.0596.
- The odds of buying MM are multiplied by 0.0596 when MM is a dollar more expensive than CH, relative to when they’re the same price (holding the other predictors fixed).
- For an “equal price” versus “50 cents less” comparison, the odds of buying MM are multiplied by exp(-1.41) = 0.24; that is, the odds are 24% as large when CH is $0.50 cheaper.
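An odds ratio acts on the odds, not on the probability, so the change in P(buys MM) depends on where the customer starts. The Python sketch below uses the slope -2.82 from above; the baseline probabilities 0.5 and 0.9 and the helper `shift_probability` are hypothetical, chosen only to illustrate the point.

```python
import math

slope = -2.82  # PriceDiff coefficient from the model above

def shift_probability(p_baseline, log_odds_change):
    """Apply a change in log odds to a baseline probability."""
    odds = p_baseline / (1 - p_baseline)
    new_odds = odds * math.exp(log_odds_change)
    return new_odds / (1 + new_odds)

or_dollar = math.exp(slope)      # odds ratio for a $1.00 increase in PriceDiff
or_half = math.exp(slope * 0.5)  # odds ratio for a $0.50 increase in PriceDiff

# Effect of the $0.50 increase for two hypothetical baseline probabilities.
p_from_even = shift_probability(0.5, slope * 0.5)
p_from_high = shift_probability(0.9, slope * 0.5)

print(round(or_dollar, 4), round(or_half, 2))
print(p_from_even / 0.5, p_from_high / 0.9)  # ratios of probabilities differ
```

The odds ratio is the same in both cases, but the ratio of probabilities is not, which is why the interpretation is worded in terms of odds.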

Comparing Parameters

Which predictor is most important?

21.5 Regularization

LASSO

Recall:

glmnet will standardize the variables for you
glmnet uses LASSO by default.

We’re going to start with an overspecified model and see which predictors LASSO will select. Note that there are 28 predictors once we account for the interaction terms.

The two vertical lines mark the value of lambda that minimizes the MSE and the largest value of lambda whose MSE is within 1 SE of the minimum.

lambda.1se does more regularizing, and the MSE is “close enough” relative to the variance.
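The one-standard-error rule can be sketched by hand. This Python snippet is an illustration of the rule as described above, not glmnet’s own code; the function name `one_se_rule` and all of the numbers are made up.

```python
def one_se_rule(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries."""
    # Index of the lambda with the smallest mean cross-validated error.
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    # Any error below this threshold counts as "close enough" to the minimum.
    threshold = cv_mean[i_min] + cv_se[i_min]
    # The largest such lambda gives the most regularized acceptable model.
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se

# Made-up cross-validation results for five candidate values of lambda.
lambdas = [0.001, 0.003, 0.01, 0.03, 0.1]
cv_mean = [0.140, 0.138, 0.139, 0.143, 0.160]
cv_se = [0.006, 0.006, 0.006, 0.007, 0.008]

lambda_min, lambda_1se = one_se_rule(lambdas, cv_mean, cv_se)
print(lambda_min, lambda_1se)
```

In this toy example the rule picks a lambda ten times larger than the minimizer, trading a small increase in estimated error for a sparser model.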

LASSO with `lambda.1se`

Change lambda.1se to lambda.min to see the difference in selected features!

Interpreting LASSO parameters

The parameter for PriceDiff is -0.21. A one standard deviation change in PriceDiff leads to an Odds Ratio of exp(-0.21) = 0.81.

That’s an increase in price difference of $0.27 leading to an OR of 0.81.
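To move between the “per standard deviation” and “per dollar” scales, divide the coefficient by the standard deviation. The Python sketch below assumes, as the text above describes, that -0.21 is the coefficient for a one standard deviation change and that the standard deviation of PriceDiff is $0.27; the variable names are hypothetical.

```python
import math

coef_per_sd = -0.21   # LASSO coefficient per one-SD change in PriceDiff
sd_price_diff = 0.27  # standard deviation of PriceDiff, in dollars

or_per_sd = math.exp(coef_per_sd)  # odds ratio for a $0.27 increase

# Rescale to a one-dollar increase: coefficient per dollar = per SD / SD.
coef_per_dollar = coef_per_sd / sd_price_diff
or_per_dollar = math.exp(coef_per_dollar)

print(round(or_per_sd, 2))
print(coef_per_dollar, or_per_dollar)
```

Equivalently, the per-dollar odds ratio is the per-SD odds ratio raised to the power 1/0.27, since a dollar is about 3.7 standard deviations of PriceDiff.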