Multiple Linear Regression

STAT 20: Introduction to Probability and Statistics

Agenda

  • Announcements
  • Multiple Linear Regression Refresher
  • Quiz Review (this week’s notes)
  • Break
  • Lab 2.2 (extended)

Announcements

  • Quiz 1 is Monday, in class, and covers all lectures from the beginning of the semester through today.
  • Lab 2.2, Problem Set 4, and Problem Set 5 are due Tuesday at 9am
    • Make sure you follow Lab Submission Guidelines on Ed
  • RQ: Introducing Probability due on Monday/Tuesday at 11:59pm; Probability unit begins next week
  • Extra Practice for Multiple Linear Regression added to the resources tab on the course home page.

Multiple Linear Regression Refresher

  • Head to pollev.com for a set of rapid-fire questions on last night’s notes.

Quiz Review

  • Head to pollev.com for a set of quiz-level questions pertaining to Summarizing Numerical Associations and Multiple Linear Regression.

Question 1

m1 <- lm(bill_depth_mm ~ bill_length_mm, data = penguins)
m2 <- lm(bill_depth_mm ~ bill_length_mm + body_mass_g + species, 
         data = penguins)

How many more coefficients does the second model have than the first?
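
One way to check a count like this is to ask R directly. The sketch below uses the built-in iris data as a stand-in for penguins (an assumption, made so the example runs without extra packages). The two models mirror m1 and m2 in structure: one numerical predictor versus two numerical predictors plus a three-level categorical one. The names fit_small and fit_large are illustrative only.

```r
# Stand-in for m1: one numerical predictor.
fit_small <- lm(Sepal.Width ~ Sepal.Length, data = iris)

# Stand-in for m2: two numerical predictors plus a 3-level factor.
fit_large <- lm(Sepal.Width ~ Sepal.Length + Petal.Length + Species,
                data = iris)

# coef() returns one entry per fitted coefficient, intercept included.
n_small <- length(coef(fit_small))
n_large <- length(coef(fit_large))

names(coef(fit_large))  # a factor with k levels contributes k - 1 terms
n_large - n_small       # extra coefficients in the larger model
```

The same calls, length(coef(m1)) and length(coef(m2)), apply to the penguins models once they are fitted.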

Questions 2-4

Consider the following multiple linear regression model, which will be the subject of the next three review questions.

Question 2


m2

Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species, 
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap  
        10.33083           0.09484           0.00117          -0.90748  
   speciesGentoo  
        -5.80117  

Which is the correct interpretation of the coefficient in front of bill length? Select all that apply.
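
To make the role of a slope coefficient concrete, the sketch below rebuilds the fitted equation by hand from the printed coefficients and compares two penguins that differ only in bill length. The helper predict_depth and the input values (45 mm, 46 mm, 5000 g) are hypothetical, and Adelie is treated as the baseline species because it is the level with no coefficient in the output.

```r
# Coefficients copied from the printed output of m2.
b <- c(intercept = 10.33083, bill_length_mm = 0.09484,
       body_mass_g = 0.00117, chinstrap = -0.90748, gentoo = -5.80117)

# Predicted bill depth (mm); species is "Adelie", "Chinstrap", or "Gentoo".
predict_depth <- function(bill_length_mm, body_mass_g, species) {
  unname(b["intercept"] +
    b["bill_length_mm"] * bill_length_mm +
    b["body_mass_g"] * body_mass_g +
    b["chinstrap"] * (species == "Chinstrap") +
    b["gentoo"] * (species == "Gentoo"))
}

# Hypothetical penguins differing only in bill length (45 mm vs 46 mm).
d45 <- predict_depth(45, 5000, "Gentoo")
d46 <- predict_depth(46, 5000, "Gentoo")
d46 - d45  # equals the bill_length_mm coefficient
```

Body mass and species are the same in both calls, so the difference between the two predictions isolates the bill length term.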

Question 3

Which is the correct interpretation of the coefficient in front of Gentoo?
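
A categorical coefficient can be read as a shift in the intercept relative to the baseline level. The sketch below computes the species-specific intercepts from the printed output of m2, assuming Adelie is the baseline (it is the species with no coefficient of its own). The variable names are illustrative.

```r
# Intercept and species coefficients copied from the printed output of m2.
intercept <- 10.33083
offsets <- c(Adelie = 0, Chinstrap = -0.90748, Gentoo = -5.80117)

# Species-specific intercepts: the baseline intercept plus each offset.
species_intercepts <- intercept + offsets
species_intercepts

# Gap in predicted bill depth between a Gentoo and an Adelie penguin
# with the same bill length and body mass.
gap <- unname(species_intercepts["Gentoo"] - species_intercepts["Adelie"])
gap
```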

Question 4

How would this linear model best be visualized?

Question 5

Consider the following linear regression output where the variable school is categorical and the variable hours_studied is numerical.

Coefficients      Estimate
(Intercept)            2.5
hours_studied          0.2
schoolCal              1.0
schoolStanford        -1.0

Question 5 (cont.)

  • Suppose I want to create a data frame from the original edu data frame that contains the minimum, median, and IQR of hours_studied for each school. To do this, I use group_by() followed by summarize(), and I save the result in an object called GPA_summary.

What are the dimensions of GPA_summary?
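
A question like this can be checked on a toy data set. The sketch below builds a hypothetical stand-in for edu (the values are invented, and the third school, "Other", is a placeholder for whichever school serves as the baseline level in the regression output) and then applies the group_by() and summarize() steps described above. It assumes the dplyr package is installed; the summary column names are illustrative.

```r
library(dplyr)

# Hypothetical stand-in for edu: three schools, four students each.
edu <- data.frame(
  school = rep(c("Cal", "Stanford", "Other"), each = 4),
  hours_studied = c(10, 12, 15, 20, 8, 9, 14, 18, 5, 11, 13, 16)
)

# One row per school, one column per summary statistic.
GPA_summary <- edu %>%
  group_by(school) %>%
  summarize(
    min_hours = min(hours_studied),
    median_hours = median(hours_studied),
    iqr_hours = IQR(hours_studied)
  )

dim(GPA_summary)  # rows = number of schools; columns = school + 3 summaries
```

The number of rows depends only on how many distinct schools appear, and the number of columns on the grouping variable plus the summaries requested, not on how many students are in the data.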

Break

05:00

Lab 2.2 (extended)

40:00

End of Lecture