Multiple Linear Regression

STAT 20: Introduction to Probability and Statistics

Agenda

Announcements
Multiple Linear Regression Refresher
Quiz Review (this week’s notes)
Break
Lab 2.2 (extended)

Announcements

Quiz 1 is Monday, in class, and covers all lectures from the beginning of the semester through today.

Lab 2.2, Problem Set 4, and Problem Set 5 are due Tuesday at 9am.
- Make sure you follow the Lab Submission Guidelines on Ed.

RQ: Introducing Probability is due Monday/Tuesday at 11:59pm; the Probability unit begins next week.

Extra Practice for Multiple Linear Regression has been added to the resources tab on the course home page.

Multiple Linear Regression Refresher

Head to pollev.com for a set of rapid-fire questions on last night’s notes.

Quiz Review

Head to pollev.com for a set of quiz-level questions pertaining to Summarizing Numerical Associations and Multiple Linear Regression.

Question 1

m1 <- lm(bill_depth_mm ~ bill_length_mm, data = penguins)

m2 <- lm(bill_depth_mm ~ bill_length_mm + body_mass_g + species, 
         data = penguins)

How many more coefficients does the second model have than the first?
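
One way to check a count like this is to fit both models and compare the lengths of their coefficient vectors. The sketch below uses R's built-in iris data as a stand-in for penguins (an assumption, made so the example runs without extra packages); like species in penguins, Species in iris is a factor with three levels.

```r
# Stand-in example: iris replaces penguins so no extra package is needed.
# Species is a factor with three levels, like species in penguins.
s1 <- lm(Sepal.Width ~ Sepal.Length, data = iris)
s2 <- lm(Sepal.Width ~ Sepal.Length + Petal.Length + Species, data = iris)

# coef() returns one named entry per fitted coefficient, intercept included.
names(coef(s1))
names(coef(s2))

# Difference in the number of coefficients between the two models.
extra_coefs <- length(coef(s2)) - length(coef(s1))
extra_coefs
```

Each numerical predictor adds one coefficient, while a categorical predictor with k levels adds k - 1, since one level serves as the reference.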

Questions 2-4

Consider the following multiple linear regression model, which will be the subject of the next three review questions.

Question 2

m2


Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species, 
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap  
        10.33083           0.09484           0.00117          -0.90748  
   speciesGentoo  
        -5.80117

Which of the following are correct interpretations of the bill_length_mm coefficient? Select all that apply.
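
To see what a slope coefficient means in a model with several predictors, it can help to compute two predictions that differ only in that one predictor. The sketch below copies the printed coefficients of m2 into a hypothetical helper, predict_depth(), rather than refitting the model; the input values (40 mm, 41 mm, 4000 g, Chinstrap) are arbitrary choices for illustration.

```r
# Coefficients copied from the printed output of m2.
b <- c(intercept = 10.33083, bill_length_mm = 0.09484,
       body_mass_g = 0.00117, chinstrap = -0.90748, gentoo = -5.80117)

# Hypothetical helper: predicted bill depth (mm) from the coefficients above.
predict_depth <- function(bill_length_mm, body_mass_g, species) {
  unname(b["intercept"] +
    b["bill_length_mm"] * bill_length_mm +
    b["body_mass_g"] * body_mass_g +
    b["chinstrap"] * (species == "Chinstrap") +
    b["gentoo"] * (species == "Gentoo"))
}

# Two penguins of the same species and body mass, one with a 1 mm longer bill.
d40 <- predict_depth(40, 4000, "Chinstrap")
d41 <- predict_depth(41, 4000, "Chinstrap")

# Matches the bill_length_mm coefficient, up to floating-point rounding.
d41 - d40
```

Changing the species or body mass used for both penguins leaves the difference unchanged, which is what "holding the other predictors fixed" means in this model.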

Question 3

m2


Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species, 
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap  
        10.33083           0.09484           0.00117          -0.90748  
   speciesGentoo  
        -5.80117

Which is the correct interpretation of the speciesGentoo coefficient?
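
The names speciesChinstrap and speciesGentoo come from how lm() encodes a categorical predictor as indicator (0/1) columns. A minimal sketch, assuming the three species levels are Adelie, Chinstrap, and Gentoo (Adelie does not appear in the output above, which suggests it is the reference level):

```r
# Toy data frame with one row per assumed species level.
toy <- data.frame(species = factor(c("Adelie", "Chinstrap", "Gentoo")))

# model.matrix() shows the columns lm() would build for this predictor.
X <- model.matrix(~ species, data = toy)
X
```

The row for the reference level has 0 in both indicator columns, so each species coefficient is a comparison against that level with the other predictors held fixed.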

Question 4

m2


Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species, 
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap  
        10.33083           0.09484           0.00117          -0.90748  
   speciesGentoo  
        -5.80117

How would this linear model best be visualized?

Question 5

Consider the following linear regression output where the variable school is categorical and the variable hours_studied is numerical.

Coefficient       Estimate
(Intercept)       2.5
hours_studied     0.2
schoolCal         1
schoolStanford    -1

Question 5 (cont.)

Suppose I want to create a data frame from the original edu data frame that contains the minimum, median, and IQR of hours_studied for each school. To do this, I use group_by() followed by summarize(), and I save the result in an object called GPA_summary.

What are the dimensions of GPA_summary?
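
The shape of a grouped summary can be checked on a small made-up data frame. The sketch below assumes the dplyr package is installed and invents a tiny edu data frame with three schools; the third level ("Other"), the summary column names, and all values are hypothetical, since the regression output only names Cal and Stanford.

```r
library(dplyr)

# Hypothetical stand-in for edu: three schools, made-up study hours.
edu <- data.frame(
  school = rep(c("Cal", "Stanford", "Other"), each = 4),
  hours_studied = c(2, 4, 6, 8, 1, 3, 5, 7, 3, 3, 9, 10)
)

# One row per school; one column for the group plus one per summary statistic.
GPA_summary <- edu |>
  group_by(school) |>
  summarize(
    min_hours = min(hours_studied),
    median_hours = median(hours_studied),
    iqr_hours = IQR(hours_studied)
  )

dim(GPA_summary)  # rows = number of schools, columns = 1 + number of summaries
```

The number of rows depends only on how many distinct groups there are, not on how many observations each group contains.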

Break

05:00

Lab 2.2 (extended)

40:00

End of Lecture