# Multiple Linear Regression

STAT 20: Introduction to Probability and Statistics

## Agenda

• Announcements
• Multiple Linear Regression Refresher
• Quiz Review (this week’s notes)
• Break
• Lab 2.2 (extended)

## Announcements

• Quiz 1 is Monday, in class, and covers all lectures from the beginning of the semester through today.
• Lab 2.2, Problem Set 4, and Problem Set 5 are due Tuesday at 9am.
• Make sure you follow Lab Submission Guidelines on Ed
• RQ: Introducing Probability due on Monday/Tuesday at 11:59pm; Probability unit begins next week
• Extra Practice for Multiple Linear Regression added to the resources tab on the course home page.

# Multiple Linear Regression Refresher

• Head to pollev.com for a set of rapid-fire questions on last night’s notes.

# Quiz Review

• Head to pollev.com for a set of quiz-level questions pertaining to Summarizing Numerical Associations and Multiple Linear Regression.

## Question 1

```r
m1 <- lm(bill_depth_mm ~ bill_length_mm, data = penguins)
m2 <- lm(bill_depth_mm ~ bill_length_mm + body_mass_g + species,
         data = penguins)
```

How many more coefficients does the second model have than the first?
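To check a coefficient count yourself, `coef()` returns a fitted model's named vector of estimated coefficients. The sketch below swaps in the built-in `iris` data (an assumption, so that no extra packages are needed) because its `Species` variable, like `species` in `penguins`, has three levels; the names `m1_iris`, `m2_iris`, and `n_extra` are hypothetical.

```r
# Analogue of Question 1 on built-in data: iris stands in for penguins.
m1_iris <- lm(Sepal.Width ~ Sepal.Length, data = iris)
m2_iris <- lm(Sepal.Width ~ Sepal.Length + Petal.Length + Species, data = iris)

# coef() returns the estimated coefficients as a named vector
names(coef(m1_iris))  # "(Intercept)" "Sepal.Length"
names(coef(m2_iris))  # adds "Petal.Length", "Speciesversicolor", "Speciesvirginica"

# A 3-level categorical predictor contributes 3 - 1 = 2 indicator
# coefficients; the first level (setosa) is absorbed into the intercept.
n_extra <- length(coef(m2_iris)) - length(coef(m1_iris))
n_extra  # 3
```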

# Questions 2-4

Consider the following multiple linear regression model, which will be the subject of the next three review questions.

## Question 2

01:00

```r
m2
```

```
Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species,
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap
        10.33083           0.09484           0.00117          -0.90748
   speciesGentoo
        -5.80117
```

Which interpretations of the coefficient in front of bill length are correct? Select all that apply.
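One way to test a candidate interpretation is to compare predictions for two penguins that differ only in bill length. A minimal sketch: the coefficients are typed in from the `m2` output above, while the helper `predict_depth()` and the example penguins (Gentoo, 4000 g, bill lengths of 45 mm and 44 mm) are hypothetical illustration values.

```r
# Coefficients copied from the printed m2 output
b <- c(intercept = 10.33083, bill_length = 0.09484, body_mass = 0.00117,
       chinstrap = -0.90748, gentoo = -5.80117)

# Hypothetical helper: predicted bill depth (mm) from the fitted equation.
# Adelie is the reference level, so it gets neither species indicator.
predict_depth <- function(bill_length_mm, body_mass_g, species) {
  b[["intercept"]] +
    b[["bill_length"]] * bill_length_mm +
    b[["body_mass"]] * body_mass_g +
    b[["chinstrap"]] * (species == "Chinstrap") +  # indicator: 1 if Chinstrap
    b[["gentoo"]] * (species == "Gentoo")          # indicator: 1 if Gentoo
}

# Same body mass and species, bill lengths 1 mm apart
diff_1mm <- predict_depth(45, 4000, "Gentoo") - predict_depth(44, 4000, "Gentoo")
diff_1mm  # about 0.09484, the bill length coefficient
```

Changing the shared body mass or species leaves `diff_1mm` unchanged, because those terms cancel in the subtraction.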

## Question 3

01:00

```r
m2
```

```
Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species,
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap
        10.33083           0.09484           0.00117          -0.90748
   speciesGentoo
        -5.80117
```

Which is the correct interpretation of the coefficient in front of Gentoo?
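An indicator coefficient can be read off by comparing a Gentoo penguin with a penguin of the reference species (Adelie, the species without its own coefficient in the output) that has the same bill length and body mass. A minimal sketch using the printed coefficients; the 45 mm and 4000 g values are hypothetical.

```r
# Coefficients copied from the printed m2 output
b0 <- 10.33083; b_len <- 0.09484; b_mass <- 0.00117; b_gentoo <- -5.80117

# Two hypothetical penguins with the same bill length (45 mm) and mass (4000 g)
adelie <- b0 + b_len * 45 + b_mass * 4000             # reference level: no indicator
gentoo <- b0 + b_len * 45 + b_mass * 4000 + b_gentoo  # Gentoo indicator equals 1

gentoo - adelie  # about -5.80117, the speciesGentoo coefficient
```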

## Question 4

01:00

```r
m2
```

```
Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species,
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap
        10.33083           0.09484           0.00117          -0.90748
   speciesGentoo
        -5.80117
```

How would this linear model best be visualized?
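A sketch of the model's geometry using only the printed coefficients. With two numeric predictors, each species' fitted surface is a plane over bill length and body mass; the species indicators shift the intercept while the slopes stay shared, so the three surfaces are parallel. The 4000 g body mass and the 40 mm and 50 mm bill lengths are arbitrary illustration values.

```r
# Coefficients copied from the printed m2 output
b0 <- 10.33083; b_len <- 0.09484; b_mass <- 0.00117
shift <- c(Adelie = 0, Chinstrap = -0.90748, Gentoo = -5.80117)  # Adelie is the reference

# Each species gets its own intercept ...
intercepts <- b0 + shift
intercepts  # Adelie 10.33083, Chinstrap 9.42335, Gentoo 4.52966

# ... but the same slopes: at a fixed body mass, raising bill length from
# 40 to 50 mm changes the prediction by the same amount for every species.
pred_at <- function(len) b0 + b_len * len + b_mass * 4000 + shift
rise <- pred_at(50) - pred_at(40)
rise  # about 0.9484 for each species
```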

## Question 5

Consider the following linear regression output where the variable school is categorical and the variable hours_studied is numerical.

| Coefficient    | Estimate |
|----------------|----------|
| (Intercept)    | 2.5      |
| hours_studied  | 0.2      |
| schoolCal      | 1        |
| schoolStanford | -1       |
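The output translates into one prediction equation per school: the intercept and the `hours_studied` slope are shared, and each school indicator shifts the intercept. A minimal sketch assuming a hypothetical student who studied 10 hours; the output does not name the reference school, so it is labeled `reference` here.

```r
# Estimates from the Question 5 output
b0 <- 2.5; b_hours <- 0.2; b_cal <- 1; b_stanford <- -1
hours <- 10  # hypothetical hours studied

pred <- c(
  reference = b0 + b_hours * hours,              # both indicators equal 0
  Cal       = b0 + b_hours * hours + b_cal,      # schoolCal equals 1
  Stanford  = b0 + b_hours * hours + b_stanford  # schoolStanford equals 1
)
pred  # reference 4.5, Cal 5.5, Stanford 3.5
```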

## Question 5 (cont.)

• Say I want to create a data frame from the original `edu` data frame that contains the minimum, median, and IQR of `hours_studied` for each school. To do this, I use `group_by()` followed by `summarize()`, and I save the result as an object called `GPA_summary`.

What are the dimensions of GPA_summary?

01:00
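A sketch of the `group_by()` then `summarize()` pattern, assuming dplyr is installed. Because `edu` is not available here, the built-in `iris` data stands in: `Species` plays the role of `school`, `Sepal.Length` plays the role of `hours_studied`, and `iris_summary` is a hypothetical name. The result has one row per group, plus one column for the grouping variable and one per summary statistic.

```r
library(dplyr)

# Stand-in for GPA_summary: iris replaces the (unavailable) edu data
iris_summary <- iris |>
  group_by(Species) |>
  summarize(
    min_length    = min(Sepal.Length),
    median_length = median(Sepal.Length),
    iqr_length    = IQR(Sepal.Length)
  )

dim(iris_summary)  # 3 4: one row per species; Species plus 3 summary columns
```

The same counting applies to `GPA_summary`: rows equal the number of schools in `edu` (the two school indicators in the output suggest three), and columns equal one grouping column plus three summaries.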

# Break

05:00

# Lab 2.2 (extended)

40:00