21 Logistic Modelling
21.1 Preamble
Announcements
- Haven’t had a chance to check A4 winner; winner will be contacted tomorrow!
21.2 Step 1: Concentrate on the Orange Juice
Getting Acquainted
For this analysis, let’s look at Orange Juice data from the ISLR2 package (I’ve resisted the temptation to use the Auto data).
Before we begin, define the goal:
Goal: When a customer comes in the store, can we predict which brand they buy?
Note: We’re not predicting whether they buy OJ; we’re predicting which brand they buy given that they bought OJ.
This is not the only possible question with these data:
- Is the price difference enough to explain why people choose a brand? (If not, then the difference might be because people genuinely like one better than the other.)
- How big of a discount is needed to sway a customer’s purchase?
- Do different stores have customers with different purchasing habits?
Use the following code block to get acquainted with the data. We’ll make a lot of plots!
The response variable is a factor with levels “CH” for Citrus Hill and “MM” for Minute Maid. Let’s convert this to 1s and 0s so that we can do math on it. We’ll set MM to 1 and CH to 0 (this is completely arbitrary and will affect the interpretations but not the results).
Also, the “STORE” predictor is numeric, but the numbers aren’t actually meaningful. Store 4 is not twice the store that store 2 is; the numbers are category labels, not quantities. Let’s make it a factor so that we don’t accidentally treat it as a numeric predictor in a model.
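The two recoding steps described above could look something like the following. This is a minimal sketch, not the original chunk: the copy `oj` and the new column `y` are names I made up for illustration, and it assumes the ISLR2 package is installed.

```r
library(ISLR2)  # provides the OJ data frame

oj <- OJ  # work on a copy (hypothetical name)

# Response: 1 = Minute Maid (MM), 0 = Citrus Hill (CH)
oj$y <- as.numeric(oj$Purchase == "MM")

# Store numbers are labels, so store them as a factor
oj$STORE <- factor(oj$STORE)

# Sanity checks: the coding of y and the store levels
table(oj$Purchase, oj$y)
levels(oj$STORE)
```

Swapping the coding (CH as 1) would flip the sign of every coefficient but leave the fitted probabilities unchanged.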
The Predictors
What patterns in the predictors can you find? Take a good read of the help file (?OJ), there are some hints in there.
For example, there’s StoreID, which includes a store labelled 7, but there’s also an indicator for whether the store is 7, likely meaning that the authors thought there was something special about store 7. There are other hints there too!
Change the code below and see what you find! For now, ignore the WeekofPurchase column.
WeekofPurchase
There are some predictors that require more advanced exploration techniques:
Also notice that the prices are all just a little bit below “nice” values. Prices are $1.99, $1.69, etc. because with $1.99 you read the “1” first and it feels like it’s just a little above $1 instead of being practically $2. I hate this tactic. Remember: if money is involved, someone is trying to manipulate you.
21.3 Step 2: The Response
Average value of y for each value of x
Use the following chunk to explore which predictors might be related to the response, including a possible interaction term.
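One way to look at the average value of y for each value of x is with `tapply()`. The sketch below is only a starting point under my own assumptions: the names `oj`, `y`, `p_special_mm`, and `loyal_bin` are hypothetical, and the choice of predictors and of five loyalty bins is arbitrary.

```r
library(ISLR2)

oj <- OJ
oj$y <- as.numeric(oj$Purchase == "MM")  # 1 = MM, 0 = CH

# Proportion buying MM at each value of a 0/1 predictor
p_special_mm <- tapply(oj$y, oj$SpecialMM, mean)
p_special_mm

# Same idea for each store
tapply(oj$y, oj$StoreID, mean)

# For a continuous predictor, bin it first (5 equal-width bins, arbitrary)
loyal_bin <- cut(oj$LoyalCH, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)
tapply(oj$y, loyal_bin, mean)

# A possible interaction: proportion buying MM in each Special combination
tapply(oj$y, list(SpecialMM = oj$SpecialMM, SpecialCH = oj$SpecialCH), mean)
```

If the proportions in the two-way table do not change by the same amount across rows, that is a hint that an interaction term might be worth trying.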
Summary of the Results
- The WeekofPurchase details the price changes, but we have those recorded.
    - Would be useful if we wanted to see if a price drop affected sales, but that’s outside the scope of this course.
- We might want either the PriceDiff or both Price*s.
- Either StoreID or Store7?
- LoyalCH is almost certainly an important predictor.
- DiscMM and PriceMM shouldn’t be in the same model, but SpecialMM might be worth it.
- Price*, Disc*, SalePrice* and Special* all encode similar information.
    - PriceDiff is maybe more important than the group of these?
- SpecialMM would likely encourage CH buyers to buy MM instead. SpecialCH would likely encourage MM buyers to buy CH.
    - If both are on sale, perhaps there’s no effect?
Exploring one model
First, let’s start with a simple model.
- PriceDiff accounts for the differences in sale price as well as whether they’re on special.
- Store7 evaluates whether Store7 is different from the others.
- LoyalCH is clearly something we want to include.
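A model with these three predictors can be fit with `glm()`. This is a sketch under the 0/1 coding described earlier (MM = 1); the object names `oj`, `y`, and `m1` are my own and may not match the original chunk.

```r
library(ISLR2)

oj <- OJ
oj$y <- as.numeric(oj$Purchase == "MM")  # 1 = MM, 0 = CH

# Logistic regression: family = binomial uses the logit link by default
m1 <- glm(y ~ PriceDiff + Store7 + LoyalCH, data = oj, family = binomial)

# Coefficients are on the log-odds scale
summary(m1)$coefficients
```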
Thoughts?
A second option
Perhaps it matters whether the OJ is on sale. Maybe the actual store also matters. Let’s check both!
- The actual sale price is used, not the listing price.
- We have an interaction between SpecialMM and SpecialCH. If both are on sale, then perhaps it doesn’t affect the buying decision?
An extra special ESS
Note that StoreID gets split into dummy variables factor(StoreID)2, factor(StoreID)3, factor(StoreID)4, and factor(StoreID)7, with Store 1 being the reference category.
The existence of the predictor Store7 implies there’s something special about this store, and it’s the only significant result in m2.
If we use StoreID in the model, we have: \[
y_i = \beta_0 + \beta_1Store2_i + \beta_2Store3_i + \beta_3Store4_i + \beta_4Store7_i + ... + \epsilon_i
\] Compare this to the model with Store7 as the only store predictor: \[
y_i = \beta_0 + \beta_4Store7_i + ... + \epsilon_i
\] An ESS test for these two models is equivalent to testing \(\beta_1 = \beta_2 = \beta_3 = 0\).
Since these two models are not significantly different, we can go with the simpler model (m2b).
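In R, a comparison of two nested fits like this can be done with `anova()`. The sketch below is not the original m2 and m2b: I use PriceDiff and LoyalCH as stand-ins for the other predictors, and the names `m_store7` and `m_storeid` are hypothetical. For a `glm`, `test = "Chisq"` compares the drop in deviance, which plays the role of the extra sum of squares.

```r
library(ISLR2)

oj <- OJ
oj$y <- as.numeric(oj$Purchase == "MM")  # 1 = MM, 0 = CH

# Smaller model: only asks whether the store is Store 7
m_store7 <- glm(y ~ Store7 + PriceDiff + LoyalCH,
                data = oj, family = binomial)

# Larger model: a separate dummy for stores 2, 3, 4, and 7 (store 1 is the reference)
m_storeid <- glm(y ~ factor(StoreID) + PriceDiff + LoyalCH,
                 data = oj, family = binomial)

# Tests beta_1 = beta_2 = beta_3 = 0 (the dummies for stores 2, 3, and 4)
anova(m_store7, m_storeid, test = "Chisq")
```

The test has 3 degrees of freedom, one for each store dummy that the smaller model drops.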
Specials
In the output, it looked like neither Specials nor the interaction were significant. This is not enough to make a conclusion about all three variables!
To be honest about our p-values, we’re going to check all of the terms at once. We could check the interaction and then check the individual Special* terms, but it’s better to do it all in one go.
In fact, I specifically don’t want to check the interaction term. I’m quite certain that the interaction term should be there whenever the Special* terms are there. If both brands are on special, then the specials likely don’t affect the purchaser’s behaviour.
No significance means go with the simpler model!
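Testing all of the Special terms in one go follows the same pattern. Again, this is a sketch with stand-in predictors and hypothetical object names (`m_base`, `m_special`, `special_test`), not the original models.

```r
library(ISLR2)

oj <- OJ
oj$y <- as.numeric(oj$Purchase == "MM")  # 1 = MM, 0 = CH

# Model without any Special terms
m_base <- glm(y ~ PriceDiff + Store7 + LoyalCH, data = oj, family = binomial)

# Add both Special terms and their interaction at once
m_special <- update(m_base, . ~ . + SpecialMM * SpecialCH)

# One test for the whole group of added terms
special_test <- anova(m_base, m_special, test = "Chisq")
special_test
```

A single p-value for the whole group avoids the multiple-testing problem of checking each term separately.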
Model 3: Just MM, or include CH?
Since we already saw that SpecialMM * SpecialCH wasn’t significant, this might add to the hypothesis that PriceDiff is everything.
Model 4 and beyond!
Try out some of the following:
- An interaction between StoreID and one of the Discount predictors means that customers at different stores are swayed differently by discounts.
- What combination of Price*, Disc*, Special*, SalePrice*, and PctDisc* is best?
    - Only consider models where both CH and MM are present, e.g. PriceMM and PriceCH.
In the code below, I investigate whether we can just use PriceDiff. I also demonstrate that the p-value is different depending on which other predictors are in the model - it might change the decision sometimes!
Note that these are not ESS tests, since neither model is nested in the other.
I intentionally put an error in the following code. Fix it before you run the code.
21.4 A Final Model (for demonstration purposes)
Let’s just try this one.
This isn’t the best model, but I want to explore some topics.
Interpreting The Intercept
The intercept is 3.41: \[ p(x_i) = \text{expit}(3.41) = \frac{\exp(3.41)}{1 + \exp(3.41)} = 0.968 \] There’s a 96.8% chance that a customer buys MM when all predictors are 0.
Does this make sense?
Why was this expit() instead of exp(), like we use for the log odds ratio?
- “When all predictors are 0” includes LoyalCH; the intercept refers to someone who is loyal to MM!
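The intercept calculation can be checked directly. The helper `expit()` below is defined by hand for illustration; base R’s `plogis()` computes the same function.

```r
# Inverse of the logit: maps a log-odds value to a probability
expit <- function(eta) exp(eta) / (1 + exp(eta))

b0 <- 3.41          # intercept reported above
p0 <- expit(b0)     # probability of buying MM when all predictors are 0
round(p0, 3)        # 0.968

# Base R equivalent
plogis(b0)
```

We use expit() here because the intercept is a log-odds and we want a probability; exp() alone would only give the odds.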
Interpreting the PriceDiff (MM - CH)
The slope for PriceDiff is -2.82.
- Negative, which means P(buys MM) decreases as MM becomes more expensive relative to CH.
- Odds Ratio is exp(-2.82) = 0.0596.
- The odds of buying MM are 0.0596 times as large when it’s a dollar more expensive than CH, relative to when they’re the same price.
- For an “equal price” versus “50 cents less” comparison, the slope is scaled by 0.5: the odds of buying MM are exp(-1.41) = 24% as large when CH is $0.50 cheaper.
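The odds ratios above are quick to reproduce, and it is worth seeing how an odds ratio translates into a change in probability. In this sketch the baseline probability `p_equal` is a made-up value chosen purely for illustration; it does not come from the fitted model.

```r
b_pricediff <- -2.82              # slope for PriceDiff reported above

# Odds ratio for a $1 and a $0.50 increase in PriceDiff (MM - CH)
or_dollar <- exp(b_pricediff * 1)
or_half   <- exp(b_pricediff * 0.5)
round(c(or_dollar, or_half), 4)

# An odds ratio multiplies the odds, not the probability.
# Hypothetical baseline: P(buys MM) = 0.5 when prices are equal
p_equal   <- 0.5
odds_half <- p_equal / (1 - p_equal) * or_half
p_half    <- odds_half / (1 + odds_half)
p_half
```

The resulting change in probability depends on the baseline, which is why the odds ratio (and not a ratio of probabilities) is the quantity that stays constant.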
Comparing Parameters
Which predictor is most important?
21.5 Regularization
LASSO
Recall:
- glmnet will standardize the variables for you
- glmnet uses LASSO by default.
We’re going to start with an overspecified model and see which predictors LASSO will select. Note that there are 28 predictors once we account for the interaction terms.
The two vertical lines mark the value of lambda with the minimum MSE and the largest value of lambda whose MSE is within 1 SE of that minimum.
- lambda.1se does more regularizing, and the MSE is “close enough” relative to the variance.
LASSO with lambda.1se
Change lambda.1se to lambda.min to see the difference in selected features!
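For reference, a cross-validated LASSO fit has roughly the following shape. This is a sketch with a much smaller set of predictors than the 28 used in these notes; the formula, the seed, and the object names (`x`, `y`, `cv_fit`) are my own choices, and it assumes the glmnet and ISLR2 packages are installed.

```r
library(ISLR2)
library(glmnet)

oj <- OJ
y <- as.numeric(oj$Purchase == "MM")  # 1 = MM, 0 = CH

# glmnet needs a numeric matrix; model.matrix() builds the dummies and
# interaction columns, and [, -1] drops the intercept column
x <- model.matrix(~ PriceDiff + LoyalCH + Store7 + SpecialMM * SpecialCH,
                  data = oj)[, -1]

set.seed(2024)  # the cross-validation folds are random
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1, type.measure = "mse")

plot(cv_fit)  # CV error against log(lambda), with the two vertical lines

# Coefficients at each choice of lambda; a "." means the predictor was dropped
coef(cv_fit, s = "lambda.1se")
coef(cv_fit, s = "lambda.min")
```

Because `lambda.1se` is never smaller than `lambda.min`, it applies at least as much shrinkage, so it tends to keep fewer predictors.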
Interpreting LASSO parameters
The parameter for PriceDiff is -0.21. A one standard deviation change in PriceDiff leads to an Odds Ratio of exp(-0.21) = 0.81.
That’s an increase in price difference of $0.27 leading to an OR of 0.81.
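As a rough check, and taking the interpretation above at face value (a coefficient of -0.21 per standard deviation, with one standard deviation equal to $0.27), the effect can be rescaled to a one dollar change. This rescaling is my own illustration, not part of the fitted output:

\[
\beta_{\text{per dollar}} \approx \frac{-0.21}{0.27} \approx -0.78, \qquad \exp(-0.78) \approx 0.46
\]

The LASSO model and the final model above contain different predictors, so the two are not directly comparable, but an odds ratio of about 0.46 per dollar is far closer to 1 than the unpenalized 0.0596. That is consistent with LASSO shrinking coefficients toward 0.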