Which of the following is an example of a classification task?
(Intercept) body_mass_g
-5.162541644 0.001239819
What is the predicted probability that a penguin weighing 4000 g is female?
(As a bonus, try sketching this function on a scatterplot!)
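A minimal sketch of the arithmetic, assuming (as `glm()` does for a factor response whose levels are `female`, `male`) that the fitted model gives the probability of the second level, `"male"`. The probability of female is then one minus that. The coefficients are copied from the output above, and `plogis()` is base R's logistic function, 1 / (1 + exp(-x)).

```r
# Coefficients from the single-predictor model above
b0 <- -5.162541644   # (Intercept)
b1 <-  0.001239819   # body_mass_g

# Linear predictor: the log-odds of "male" for a 4000 g penguin
eta <- b0 + b1 * 4000

# plogis() converts log-odds to a probability
p_male   <- plogis(eta)
p_female <- 1 - p_male

round(c(log_odds = eta, p_male = p_male, p_female = p_female), 3)
# log-odds ≈ -0.203, P(male) ≈ 0.449, P(female) ≈ 0.551
```

For the bonus, `curve(plogis(b0 + b1 * x), from = 2500, to = 6500)` draws the fitted S-shaped curve over a plausible range of body masses.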
(Intercept) body_mass_g bill_length_mm
-6.91208086 0.00101530 0.06112808
What are the predicted sexes of these two penguins?
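The two penguins' measurements are not reproduced in this text, so the sketch below uses hypothetical values. `predict_sex()` is a hypothetical helper, not part of any package; it assumes, as above, that the model gives P(male) and that a penguin is classified as male when that probability exceeds 0.5.

```r
# Coefficients from the two-predictor model above
b <- c(intercept      = -6.91208086,
       body_mass_g    =  0.00101530,
       bill_length_mm =  0.06112808)

# Hypothetical helper: P(male) and a predicted sex for each penguin
predict_sex <- function(body_mass_g, bill_length_mm, cutoff = 0.5) {
  eta <- b[["intercept"]] +
    b[["body_mass_g"]] * body_mass_g +
    b[["bill_length_mm"]] * bill_length_mm
  p_male <- plogis(eta)  # inverse logit of the log-odds
  data.frame(body_mass_g, bill_length_mm,
             p_male = round(p_male, 3),
             y_hat  = ifelse(p_male > cutoff, "male", "female"))
}

# Hypothetical measurements (NOT the penguins from the question)
res <- predict_sex(body_mass_g = c(4000, 5500), bill_length_mm = c(45, 50))
res
```

With these made-up inputs the first penguin gets P(male) just under 0.5 (female) and the second well above it (male). Note that a 0.5 cutoff on the probability is the same as asking whether the log-odds are positive.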
glm()
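library(dplyr)  # for %>%, select() and mutate()

# Logistic regression of sex on body mass and bill length
# (train and test are assumed to be the split created earlier)
m2 <- glm(sex ~ body_mass_g + bill_length_mm,
          data = train, family = "binomial")

# type = "response" returns predicted probabilities of the second
# factor level of sex (here "male") rather than log-odds
p_hat <- predict(m2, test, type = "response")

test %>%
  select(sex) %>%
  mutate(p_hat = p_hat)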
# A tibble: 70 × 2
   sex    p_hat
   <fct>  <dbl>
 1 female 0.345
 2 male   0.566
 3 female 0.259
 4 male   0.280
 5 male   0.365
 6 female 0.196
 7 male   0.428
 8 female 0.220
 9 male   0.559
10 male   0.279
# … with 60 more rows
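To see what `type = "response"` does, here is a small sketch on synthetic data. The data frame `toy` and its values are made up for illustration and are not the penguins data. It checks that the probabilities returned by `predict(..., type = "response")` are `plogis()` applied to the log-odds from `type = "link"`, and that for a factor response `glm()` models the probability of the second level.

```r
set.seed(1)

# Synthetic stand-in for the penguins data (NOT the real train/test split)
n <- 200
body_mass_g <- rnorm(n, mean = 4200, sd = 800)
true_p_male <- plogis(-5 + 0.0012 * body_mass_g)
sex <- factor(ifelse(runif(n) < true_p_male, "male", "female"),
              levels = c("female", "male"))
toy <- data.frame(sex, body_mass_g)

fit <- glm(sex ~ body_mass_g, data = toy, family = "binomial")

# type = "link" gives log-odds; type = "response" applies the inverse logit
eta   <- predict(fit, toy, type = "link")
p_hat <- predict(fit, toy, type = "response")

# For a factor response, glm() models P(second level), here P("male")
levels(toy$sex)
all.equal(unname(p_hat), unname(plogis(eta)))
```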
m2 <- glm(sex ~ body_mass_g + bill_length_mm,
          data = train, family = "binomial")
p_hat <- predict(m2, test, type = "response")

test %>%
  select(sex) %>%
  mutate(p_hat = p_hat,
         # classify as "male" when the predicted P(male) exceeds 0.5
         y_hat = ifelse(p_hat > .5, "male", "female"))
# A tibble: 70 × 3
   sex    p_hat y_hat
   <fct>  <dbl> <chr>
 1 female 0.345 female
 2 male   0.566 male
 3 female 0.259 female
 4 male   0.280 female
 5 male   0.365 female
 6 female 0.196 female
 7 male   0.428 female
 8 female 0.220 female
 9 male   0.559 male
10 male   0.279 female
# … with 60 more rows
False Positives (FP): predicting a 1 (here, male) for an observation that is in fact a 0 (female)
False Negatives (FN): predicting a 0 (female) for an observation that is in fact a 1 (male)
Misclassification Rate:
\[ \frac{FP + FN}{\text{total number of predictions}} \]
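The formula translates directly into code. Below is a minimal sketch with a hypothetical helper, `misclassification_rate()`, applied to made-up 0/1 labels, where 1 is the positive class (male) and 0 the negative class (female).

```r
# Hypothetical helper: (FP + FN) / total number of predictions
misclassification_rate <- function(truth, pred) {
  fp <- sum(pred == 1 & truth == 0)  # predicted 1, actually 0
  fn <- sum(pred == 0 & truth == 1)  # predicted 0, actually 1
  (fp + fn) / length(truth)
}

# Made-up labels: position 2 is a false positive, position 3 a false negative
truth <- c(1, 0, 1, 1, 0, 0)
pred  <- c(1, 1, 0, 1, 0, 0)

misclassification_rate(truth, pred)  # (1 + 1) / 6, about 0.333
```

Because a prediction cannot be both a false positive and a false negative, the same number is simply the share of predictions that disagree with the truth, `mean(pred != truth)`.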
test %>%
  select(sex) %>%
  mutate(p_hat = p_hat,
         y_hat = ifelse(p_hat > .5, "male", "female"),
         # flag each error type, treating "male" as the positive class
         FP = sex == "female" & y_hat == "male",
         FN = sex == "male" & y_hat == "female")
# A tibble: 70 × 5
   sex    p_hat y_hat  FP    FN
   <fct>  <dbl> <chr>  <lgl> <lgl>
 1 female 0.345 female FALSE FALSE
 2 male   0.566 male   FALSE FALSE
 3 female 0.259 female FALSE FALSE
 4 male   0.280 female FALSE TRUE
 5 male   0.365 female FALSE TRUE
 6 female 0.196 female FALSE FALSE
 7 male   0.428 female FALSE TRUE
 8 female 0.220 female FALSE FALSE
 9 male   0.559 male   FALSE FALSE
10 male   0.279 female FALSE TRUE
# … with 60 more rows
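As a worked check, the sketch below copies only the ten rows printed above (the other 60 test rows are not shown, so this is not the test-set misclassification rate) and tallies them in base R. On the full `test` data frame, piping the `mutate()` result into something like `summarise(rate = mean(FP | FN))` would give the overall rate.

```r
# The ten rows printed above; the remaining 60 test rows are not shown
sex   <- c("female", "male", "female", "male", "male",
           "female", "male", "female", "male", "male")
y_hat <- c("female", "male", "female", "female", "female",
           "female", "female", "female", "male", "female")

FP <- sex == "female" & y_hat == "male"
FN <- sex == "male"   & y_hat == "female"

# Confusion matrix: rows = truth, columns = prediction
table(truth = sex, predicted = y_hat)

# Misclassification rate over these ten rows only: (0 + 4) / 10
c(FP = sum(FP), FN = sum(FN), rate = (sum(FP) + sum(FN)) / length(sex))
```

In these rows every error is a false negative: with a 0.5 cutoff, several males with predicted probabilities between roughly 0.28 and 0.43 are labeled female.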