STAT 20: Introduction to Probability and Statistics

What will be the sign of the coefficient for `bill_length_mm`?

```
Call:
lm(formula = bill_depth_mm ~ bill_length_mm, data = penguins)
Coefficients:
(Intercept)  bill_length_mm
   20.78665        -0.08233
```

What will be the sign of the coefficient for `bill_length_mm`? How many coefficients will be in this linear model?

```
Call:
lm(formula = bill_depth_mm ~ bill_length_mm + species, data = penguins)
Coefficients:
(Intercept)  bill_length_mm  speciesChinstrap  speciesGentoo
    10.5653          0.2004           -1.9331        -5.1033
```

**Dummy Variable**

A variable that is 1 if an observation takes a particular level of a categorical variable and 0 otherwise. A categorical variable with \(k\) levels can be encoded using \(k - 1\) dummy variables.
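To make the encoding concrete, here is a minimal sketch in base R. The five-element `species` vector is a hypothetical toy example, not the `penguins` data, and `dummy_cols` is a helper name introduced only for illustration. `model.matrix()` shows the design matrix that `lm()` would build under R's default treatment contrasts.

```r
# Hypothetical toy factor with k = 3 levels (level names borrowed from the output above).
species <- factor(c("Adelie", "Chinstrap", "Gentoo", "Adelie", "Gentoo"))

# model.matrix() builds the design matrix lm() would use.
# With an intercept and default treatment contrasts, a factor with
# k levels contributes k - 1 dummy (0/1) columns.
X <- model.matrix(~ species)
print(X)

# The first level ("Adelie") gets no column of its own: it is the
# reference level, identified by zeros in every dummy column.
dummy_cols <- setdiff(colnames(X), "(Intercept)")
print(dummy_cols)
```

Rows for the reference level have 0 in every dummy column, which is why \(k - 1\) columns are enough to distinguish \(k\) levels.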

```
Call:
lm(formula = bill_depth_mm ~ bill_length_mm + species, data = penguins)
Coefficients:
(Intercept)  bill_length_mm  speciesChinstrap  speciesGentoo
    10.5653          0.2004           -1.9331        -5.1033
```
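Written as an equation, the output above corresponds to the fitted model below. This is a direct transcription of the printed coefficients; `speciesChinstrap` and `speciesGentoo` are treated as 0/1 dummy variables, so a penguin of the remaining species (Adelie, the reference level) has both equal to 0.

\[
\widehat{\text{bill\_depth\_mm}} = 10.5653 + 0.2004 \cdot \text{bill\_length\_mm} - 1.9331 \cdot \text{speciesChinstrap} - 5.1033 \cdot \text{speciesGentoo}
\]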

Which is the correct interpretation of the coefficient in front of Gentoo?

Consider the following linear regression output where the variable `school` is categorical and the variable `hours_studied` is numerical.

| Coefficients | Estimate |
|---|---|
| `(Intercept)` | 2.5 |
| `hours_studied` | 0.2 |
| `schoolCal` | 1 |
| `schoolStanford` | -1 |

- Say I wanted to create a data frame from the original `edu` data frame that contains the minimum, median, and IQR of `hours_studied` within each school. To do this, I use `group_by()` followed by `summarize()`. I save this data frame into an object called `GPA_summary`.

What are the dimensions of `GPA_summary`?
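The `group_by()` and `summarize()` pattern described above can be sketched as follows, assuming the dplyr package is installed. The `edu` data frame here is a hypothetical stand-in: its values, the third school label `"Other"`, and the summary column names are all invented for illustration and do not come from the original data.

```r
library(dplyr)

# Hypothetical stand-in for `edu`; values and the "Other" label are invented.
edu <- data.frame(
  school = c("Cal", "Cal", "Cal",
             "Stanford", "Stanford", "Stanford",
             "Other", "Other", "Other"),
  hours_studied = c(2, 4, 6, 1, 3, 8, 0, 5, 7)
)

# One output row per school; one column per summary statistic,
# plus the grouping column itself.
GPA_summary <- edu %>%
  group_by(school) %>%
  summarize(
    min_hours    = min(hours_studied),
    median_hours = median(hours_studied),
    iqr_hours    = IQR(hours_studied)
  )

# dim() returns c(number of rows, number of columns).
dim(GPA_summary)
```

The row count follows the number of distinct values of `school`, and the column count is the grouping column plus one column per summary requested inside `summarize()`.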
