STAT 20: Introduction to Probability and Statistics

- Multiple Linear Regression Refresher
- Quiz Review (this week’s notes)
- Lab 2.2 (extended)

- Quiz 1 is Monday, in class, and covers all lectures from the beginning of the semester until today.

- Lab 2.2, Problem Set 4, and Problem Set 5 are due Tuesday at 9am
- Make sure you follow Lab Submission Guidelines on Ed

- RQ: Introducing Probability due on Monday/Tuesday at 11:59pm; Probability unit begins next week

*Extra Practice* for Multiple Linear Regression has been added to the resources tab on the course home page.

Consider the following multiple linear regression model, which will be the subject of the next three review questions.

```
Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species,
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap
        10.33083           0.09484           0.00117          -0.90748
   speciesGentoo
        -5.80117
```

Which is the correct interpretation of the coefficient in front of **bill length**? *Select all that apply*.

Which is the correct interpretation of the coefficient in front of **Gentoo**?

How would this linear model best be visualized?
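One way to check your answers to the interpretation questions is to evaluate the fitted equation by hand. The sketch below copies the coefficients from the output above into a hypothetical helper, `predict_depth()`, and compares predictions that differ in only one input. It assumes Adelie is the reference level for `species` (the level with no coefficient of its own), and the penguin measurements are made up for illustration.

```r
# Coefficients copied from the model output above.
coefs <- c(
  "(Intercept)"      = 10.33083,
  "bill_length_mm"   = 0.09484,
  "body_mass_g"      = 0.00117,
  "speciesChinstrap" = -0.90748,
  "speciesGentoo"    = -5.80117
)

# Hypothetical helper: predicted bill depth (mm) from the fitted equation.
# Any species other than Chinstrap or Gentoo is treated as the reference
# level (assumed here to be Adelie).
predict_depth <- function(bill_length_mm, body_mass_g, species) {
  unname(
    coefs["(Intercept)"] +
      coefs["bill_length_mm"] * bill_length_mm +
      coefs["body_mass_g"] * body_mass_g +
      coefs["speciesChinstrap"] * (species == "Chinstrap") +
      coefs["speciesGentoo"] * (species == "Gentoo")
  )
}

# Made-up measurements for a baseline penguin.
base <- predict_depth(40, 4000, "Adelie")

# One extra mm of bill length, same body mass and species:
# the difference equals the bill_length_mm coefficient.
length_effect <- predict_depth(41, 4000, "Adelie") - base

# Same bill length and body mass, Gentoo instead of the reference species:
# the difference equals the speciesGentoo coefficient.
gentoo_effect <- predict_depth(40, 4000, "Gentoo") - base

print(c(length_effect = length_effect, gentoo_effect = gentoo_effect))
```

Each difference isolates a single coefficient because every other term in the equation cancels, which is what "holding the other variables fixed" means in the interpretation.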

Consider the following linear regression output where the variable `school` is categorical and the variable `hours_studied` is numerical.

Coefficients | Estimate |
---|---|
`(Intercept)` | 2.5 |
`hours_studied` | 0.2 |
`schoolCal` | 1 |
`schoolStanford` | -1 |

- Say I wanted to create a data frame from the original `edu` data frame that contains the minimum, median, and IQR of `hours_studied` within each school. To do this, I use `group_by()` followed by `summarize()`, and I save the result into an object called `GPA_summary`.

What are the dimensions of `GPA_summary`?
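To reason about the dimensions, it can help to run the same pipeline on a small stand-in data set. The sketch below assumes the `dplyr` package is installed and uses a made-up `edu` data frame with three schools: the two named in the regression output plus a hypothetical reference school called `BaselineU`. The summary column names are also illustrative choices, not taken from the original.

```r
library(dplyr)

# Hypothetical stand-in for `edu`; the values and the school name
# "BaselineU" are made up for illustration.
edu <- data.frame(
  school        = rep(c("BaselineU", "Cal", "Stanford"), each = 4),
  hours_studied = c(2, 4, 6, 8, 1, 3, 5, 7, 3, 6, 9, 12)
)

# One row per school; one column for the grouping variable
# plus one column per summary statistic.
GPA_summary <- edu %>%
  group_by(school) %>%
  summarize(
    min_hours    = min(hours_studied),
    median_hours = median(hours_studied),
    iqr_hours    = IQR(hours_studied)
  )

print(GPA_summary)
print(dim(GPA_summary))
```

In general, `summarize()` after `group_by()` returns one row per group, and the columns are the grouping variable followed by each summary you define.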

