
Tutorial 10

In tutorial 10, we extended our model of penguin flipper length by including bill length as an additional predictor and learnt how to centre, scale, and standardise variables in the linear model to change the interpretation of the individual \(b\) coefficients. The last tutorial of this module will guide you through model comparison using the F-test with R’s anova() function. Before you start this tutorial, make sure you are comfortable with the concepts discussed in lecture 10.

Warm-up quiz

As usual, here’s a little quiz to get you in the mood. Consider this linear model when answering the questions below.


Call:
lm(formula = live ~ love + laugh, data = yolo)

Residuals:
    Min      1Q  Median      3Q     Max 
-49.382 -12.559  -0.789  12.820  50.012 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   48.689      1.578  30.850  < 2e-16 ***
love          10.202      1.212   8.419 1.76e-15 ***
laugh         -3.306      2.373  -1.393    0.165    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.14 on 291 degrees of freedom
Multiple R-squared:  0.201, Adjusted R-squared:  0.1955 
F-statistic:  36.6 on 2 and 291 DF,  p-value: 6.609e-15

QUESTION 1

How many predictors are there in this model?

2


QUESTION 2

Assuming laugh is a binary categorical variable, what is the difference in live between the two groups?
(Give answer to 2 decimal places)

−3.31 (or 3.31)


QUESTION 3

Which of the two groups has a higher expected value of live?

laugh = 0

QUESTION 4

Assuming love is a mean-centred continuous variable, what is the expected value of live for someone with an average level of love and 0 laugh?
(Give answer to 2 decimal places)

48.69


QUESTION 5

What model does the F-test reported in the summary compare this model to?

The null (intercept-only) model

QUESTION 6

Given standardised B coefficients of 0.46 and −0.11 for love and laugh respectively, which variable has the largest effect on the outcome?

Love

Setting Up

As usual, all you need is this tutorial and RStudio. Remember that you can easily switch between windows with the Alt + ↹ Tab (Windows) and ⌘ Command + ⇥ Tab (macOS) shortcuts.

Task 1 (punk!)

Create a new week_10 project and open it in RStudio. Then, create two new folders in the new week_10 folder: r_docs and data. Finally, open a new R Markdown file and save it in the r_docs folder. Get into the habit of creating a new code chunk in the file for each of the tasks below.

Remember, you can add new code chunks by:

  1. Using the RStudio toolbar: Click Code > Insert Chunk
  2. Using a keyboard shortcut: the default is Ctrl + Alt + I (Windows) or ⌘ Command + ⌥ Option + I (macOS), but you can change this under Tools > Modify Keyboard Shortcuts…
  3. Typing it out: ```{r}, press Enter, then ``` again.
  4. Copying and pasting a code chunk you already have (but be careful of duplicated chunk names!)
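If you prefer the console, the two folders can also be created with R itself (a sketch, assuming your working directory is the new week_10 project folder):

```r
# create the two folders inside the project directory;
# showWarnings = FALSE stops R complaining if they already exist
dir.create("r_docs", showWarnings = FALSE)
dir.create("data", showWarnings = FALSE)
```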

 

Task 2 (punk!)

In this tutorial, just like in the previous two, we’ll continue using the penguins data set from the palmerpenguins package. You will also need the packages tidyverse, broom, and QuantPsyc (install it if you don’t have it yet; watch out for spelling and upper/lower case letters!). Put the code that loads all of these in the setup chunk of your Rmd file and run it.
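As a sketch, the setup chunk could look something like this (install.packages() only needs to run once, so it’s guarded with requireNamespace()):

```r
# install QuantPsyc if it's not yet installed -- mind the capital Q, P, and c!
if (!requireNamespace("QuantPsyc", quietly = TRUE)) {
  install.packages("QuantPsyc")
}

library(tidyverse)
library(broom)
library(QuantPsyc)
```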

Data

Task 3 (punk!)

Just like the last couple of times, save the penguins data set in your environment as peng_data to avoid having to type out palmerpenguins::penguins every time you want to use it.
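The assignment is a one-liner:

```r
# a copy in the environment saves typing palmerpenguins::penguins every time
peng_data <- palmerpenguins::penguins
```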

Task 4 (punk!)

It’s always good to refresh your memory about the dataset by looking at a rough summary.
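For example:

```r
summary(peng_data)         # per-variable summaries, including NA counts
dplyr::glimpse(peng_data)  # variable types and the first few values
```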

Adding a categorical predictor

Let’s start by rebuilding one of the models we fitted last time.

Task 5 (Prog-rocK)

Build a linear model predicting flipper length by body mass in kilogrammes and bill length in millimetres, with both predictors mean-centred.
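A sketch of one way to do this, assuming the tidyverse is loaded and reusing the variable and model names (mass_kg_cntr, bill_length_cntr, m_massKg_bill_cntr) that appear in the output later in this tutorial:

```r
# convert body mass to kilogrammes and mean-centre both predictors
peng_data <- peng_data %>%
  dplyr::mutate(
    mass_kg_cntr     = body_mass_g / 1000 - mean(body_mass_g / 1000, na.rm = TRUE),
    bill_length_cntr = bill_length_mm - mean(bill_length_mm, na.rm = TRUE)
  )

# flipper length predicted by centred mass (kg) and centred bill length (mm)
m_massKg_bill_cntr <- lm(flipper_length_mm ~ mass_kg_cntr + bill_length_cntr,
                         data = peng_data)
```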

Task 6 (punk!)

Create a new model m_mass_bill_sex that adds the sex variable in addition to the two predictors in the previous model.
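Assuming the centred predictors from the previous task are already in peng_data, this is just one more term in the formula:

```r
# same model as before, plus the sex predictor
m_mass_bill_sex <- lm(flipper_length_mm ~ mass_kg_cntr + bill_length_cntr + sex,
                      data = peng_data)
```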

Evaluating results

Task 7 (jazz...)

Look at the summary of this model and answer the following questions.

QUESTION 7

Which category of the sex variable is represented by 0?

Female

QUESTION 8

What is the difference between the mean flipper length of a male and female penguin?
(Give answer in mm to 2 decimal places, such as 0.42)

−4.71 (or 4.71)


QUESTION 9

Which sex of penguin has, on average, longer flippers?

Female

QUESTION 10

Is the difference statistically significant at α = .001?

Yes

QUESTION 11

What is the expected flipper length of a male penguin of average size with a bill of average length?
(Give answer in mm to 2 decimal places, such as 200.15)

198.51


QUESTION 12

What percentage of the total variance in the outcome is accounted for by the combined effect of the predictors in the data?
(Give answer as % to 1 decimal place)

81.4


QUESTION 13

Does the model include any unhelpful predictors?

No

Task 8 (Prog-rocK)

What is the unique proportion of variance in the flipper length variable that’s attributable to sex? In other words, how much variance in the outcome is accounted for by sex?

Hint

You’ll need the difference between R2 from two models.
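In code, this is just a subtraction of the two R2 values (a sketch; note that the two models are fitted to slightly different subsets of the data, an issue the model-comparison section deals with properly):

```r
# unique contribution of sex = gain in R-squared when sex is added
r2_without_sex <- summary(m_massKg_bill_cntr)$r.squared
r2_with_sex    <- summary(m_mass_bill_sex)$r.squared
(r2_with_sex - r2_without_sex) * 100   # as a percentage of total variance
```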

Model comparison

Now that we know all about the effect of sex on flipper length in penguins (this is going to win you a pub quiz one day!), let’s look at a formal comparison between a model that includes the variable and one that does not.

Task 9

Let’s formally compare m_massKg_bill_cntr and m_mass_bill_sex.

Task 9.1 (punk!)

Enter the objects containing the two linear models as arguments to the anova() function.
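That is:

```r
# compare the two models (simpler model first)
anova(m_massKg_bill_cntr, m_mass_bill_sex)
```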

 

Well, that didn’t work, did it?

The error message we get from R for once actually tells us what the problem is in plain English:

Error in anova.lmlist(object, ...) : models were not all fitted to the same size of dataset

So the problem is that the two models we are trying to compare are based on datasets of different sizes.

Task 9.2 (Prog-rocK)

Pause to think why this might be the case…

Know your models

While we haven’t explicitly focused on data exploration in the past few tutorials, we stressed the importance of knowing your data earlier in the module. We also asked you to familiarise yourself with your dataset, which entails noticing things like unusual, impossible, or missing values in variables.

A familiarity with your data is really important for understanding situations such as the one that we just encountered. You should always be able to answer questions like:

QUESTION 14

What is the sample size used to fit the m_massKg_bill_cntr model?

342


QUESTION 15

What is the sample size used to fit the m_mass_bill_sex model?

333


QUESTION 16

How many missing values are there in the sex variable?

11


QUESTION 17

How many rows have missing data in the other variables entered in the m_mass_bill_sex model but not in the sex variable?

0


Hint

There are several ways of finding out how many observations were used in a particular model. A quick one is to check the number of residual df (second number of df for the F-test). This number is always equal to N − p − 1, where p is the number of predictors in the model.

To check the number of missing values in a variable, you can just look at the summary() of the data, use the is.na() and sum() functions, or some other code. Check out the R script to practical 3 for inspiration.
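For instance (a sketch):

```r
# number of missing values in sex
sum(is.na(peng_data$sex))        # 11

# residual df of the model; N is then df + p + 1
df.residual(m_mass_bill_sex)     # 329, so N = 329 + 3 + 1 = 333
```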

So now, to compare our two models, we need to refit m_massKg_bill_cntr using the same subset of peng_data that was used to fit m_mass_bill_sex.

Task 9.3 (Prog-rocK)

Create an object in the environment called peng_subset that will allow you to refit m_massKg_bill_cntr such that it is comparable with the other model.

Hint

Think about which variable (or NAs in which variable) caused the discrepancy in sample size and how you can get around the issue.
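One way of building the subset, using dplyr (tidyr::drop_na() would work just as well):

```r
# keep only the rows where sex is not missing
peng_subset <- peng_data %>%
  dplyr::filter(!is.na(sex))
```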

Task 9.4 (punk!)

Create a model called m_massKg_bill_cntr_compare that is the same as m_massKg_bill_cntr but, this time round, fitted to peng_subset.
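Assuming the centred predictor columns created earlier are carried over into peng_subset, the refit is the same lm() call with a different data argument:

```r
# same formula as m_massKg_bill_cntr, fitted to the subset without missing sex
m_massKg_bill_cntr_compare <- lm(flipper_length_mm ~ mass_kg_cntr + bill_length_cntr,
                                 data = peng_subset)
```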

Task 9.5 (punk!)

Have a look at the coefficients of this new model and compare them to those from the original model.
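For example, cbind() lines the two sets of coefficients up side by side:

```r
# original model (full data) vs refitted model (rows with non-missing sex)
cbind(original = coef(m_massKg_bill_cntr),
      refitted = coef(m_massKg_bill_cntr_compare))
```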

As you can see, the coefficients are ever so slightly different. This is not that surprising because the data these two versions of the model use are themselves different. What is reassuring is that the discrepancies are very small.

 

Now that the first model is fitted to the same data as the second one, we can finally compare them.

Task 9.6 (punk!)

Compare the m_massKg_bill_cntr_compare and m_mass_bill_sex models using the anova() function.
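The call has the same shape as before, just with the refitted model in place of the original:

```r
anova(m_massKg_bill_cntr_compare, m_mass_bill_sex)
```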

OK, that should have worked now. Let’s break down the output of the anova() function.

First of all and least importantly, we have the heading Analysis of Variance Table. This is slightly confusing terminology, as the term analysis of variance (or ANOVA) is often reserved for a different context. Let’s just agree to call this the F-test and never mind the heading (or the name of the function, for that matter).

Next, we have a re-statement of the models we’re comparing along with labels Model 1 and Model 2. These labels correspond to the rows of the table below:

  Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
1    330 13598                                  
2    329 12108  1      1490 40.486 6.635e-10 ***

The first row of this table gives us information about Model 1 (in our case m_massKg_bill_cntr_compare), while the second row contains info about Model 2 (m_mass_bill_sex), plus the results of the F-test itself. Remember from lecture 10 that the F-test compares the amount of variance in the outcome variable explained by a model to the variance not explained by the model. This table contains the value of the F-statistic (40.49) and its associated p-value (\(6.64 \times 10^{-10}\), which is < .001). On top of that, it also contains all the information needed to actually calculate F and p. Let’s go over this calculation together.

In the lecture, you learnt that the formula for F is

\[F = \frac{\frac{SS_M}{df_M}}{\frac{SS_R}{df_R}}\] where the model sum of squares (\(SS_M\), Sum of Sq in the table above) is just the total SS minus the residual SS (the latter labelled RSS in the table above). You also learnt that the total sum of squares for a model is just the residual SS of the model that we are comparing it to. So, in our case, \(SS_T\) of our model m_mass_bill_sex is the same as \(SS_R\) of m_massKg_bill_cntr_compare. Reading the table, it all checks out:

     RSS Sum of Sq
1  13598
2  12108      1490

Sum of Sq (row 2) = RSS (row 1) − RSS (row 2), i.e. 1490 = 13598 − 12108.

The difference between our two models is just one extra coefficient (for sex), and so the model degrees of freedom is 1. The residual degrees of freedom, as mentioned earlier, are \(N - p - 1\), where \(p\) is the total number of predictors in the model: 333 − 3 − 1 = 329.

Now that we have all the numbers we need, we can just plug them into the formula:

\[F = \frac{\frac{SS_M}{df_M}}{\frac{SS_R}{df_R}} = \frac{\frac{1,490}{1}}{\frac{12,108}{329}} \approx\frac{1,490}{36.8} \approx40.49\]
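You can verify this arithmetic in R:

```r
ss_m <- 1490; df_m <- 1      # Sum of Sq and Df from row 2 of the anova() table
ss_r <- 12108; df_r <- 329   # RSS and Res.Df of the bigger model

f <- (ss_m / df_m) / (ss_r / df_r)
f                                      # ~ 40.49
pf(f, df_m, df_r, lower.tail = FALSE)  # the associated p-value
```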

Nice!

So what does the F-test tell us? As explained in the lecture, the F-test tests the null hypothesis that the new model is NOT a significant improvement on the old model. If the probability of getting an F-value at least as large as the one we got, assuming the null hypothesis is true, is small enough – smaller than our chosen \(\alpha\) level – we reject the null hypothesis.

In our case, the p-value is far less than any conventional \(\alpha\) so we can say that:

Sex was significantly related to flipper length, b = −4.71, 95% CI [−6.17, −3.26], t(329) = −6.36, p < .001, and accounted for an additional 2.6% of the variance in the outcome. The inclusion of this predictor in the model represented a significant improvement in model fit over the comparison model, F(1, 329) = 40.49, p < .001.

Task 10 (Prog-rocK)

For your last task, perform an F-test comparing m_massKg_bill_cntr_compare to the null model.
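A sketch of what this could look like (m_null is a hypothetical name; the intercept-only model must be fitted to the same peng_subset rows for the comparison to work):

```r
# intercept-only (null) model on the same data as the fuller model
m_null <- lm(flipper_length_mm ~ 1, data = peng_subset)

anova(m_null, m_massKg_bill_cntr_compare)
```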

 

Well done finishing your last Analysing Data tutorial!

 

EXTRA: Mismatch

In case you’re wondering why the F-value in the second row does not match the F-statistic at the bottom of the summary() output:

summary(m_massKg_bill_cntr_compare)

Call:
lm(formula = flipper_length_mm ~ mass_kg_cntr + bill_length_cntr, 
    data = peng_subset)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.9869  -4.5868   0.4285   4.8784  14.0977 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      200.85939    0.35180 570.940  < 2e-16 ***
mass_kg_cntr      13.01733    0.54163  24.034  < 2e-16 ***
bill_length_cntr   0.54404    0.07975   6.822 4.31e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.419 on 330 degrees of freedom
Multiple R-squared:  0.7915,    Adjusted R-squared:  0.7902 
F-statistic: 626.3 on 2 and 330 DF,  p-value: < 2.2e-16

…the answer has to do with the fact that there are several ways of cutting up variance. The F-test in summary() uses something called Type I Sums of Squares, while the F-test from anova() uses Type III Sums of Squares. We briefly talked about the difference between these two in the 7th episode of #statsChats.