`10:00`

STAT 20: Introduction to Probability and Statistics

- Time to fill out the *Mid-Semester Feedback* form
- Concept Questions: Hypothesis Tests II
- *Break*
- PS: Hypothesis Testing with `infer`

- Click here to fill out this form.

`10:00`

In order to demonstrate how to conduct a hypothesis test through simulation, we will be collecting data from this class using a poll.

You will have only 15 seconds to answer the following multiple-choice question, so please get ready at `pollev.com`.

…

The two shapes above have simple first names:

- Booba
- Kiki

Which of the two names belongs to the shape on the **left**?

`00:15`

- Assert a model for how the data was generated (the null hypothesis)
- Select a test statistic that bears on that null hypothesis (a mean, a proportion, a difference in means, a difference in proportions, etc.)
- Approximate the sampling distribution of that statistic under the null hypothesis (aka the null distribution)
- Assess the degree of consistency between that distribution and the test statistic that was actually observed (either visually or by calculating a p-value)
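
The four steps above can be sketched in base R for a single proportion. This is only an illustration under stated assumptions: the counts are the placeholder values from the `infer` code later in these slides, the null value `p0 = 0.5` is taken from the `hypothesize()` call there, and the names `p0`, `reps`, and `sim_counts` are made up for this sketch.

```r
set.seed(20)                 # arbitrary seed, only for reproducibility

n_k <- 40                    # placeholder: number who chose "Kiki"
n_b <- 20                    # placeholder: number who chose "Booba"
n   <- n_k + n_b

# Step 1: assert a null model: each person picks "Kiki" with probability p0
p0 <- 0.5                    # assumed null value

# Step 2: choose a test statistic: the sample proportion choosing "Kiki"
obs_p_hat <- n_k / n

# Step 3: approximate the null distribution by simulating many polls of size n
reps        <- 5000
sim_counts  <- rbinom(reps, size = n, prob = p0)
null_p_hats <- sim_counts / n

# Step 4: two-sided p-value: the share of simulated polls at least as far
# from the null expectation as the observed poll (compared on counts, so the
# comparison is exact)
p_value <- mean(abs(sim_counts - n * p0) >= abs(n_k - n * p0))
p_value
```

A small `p_value` means that polls as lopsided as the observed one are rare in the simulated null world.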

- Let \(p_k\) be the probability that a person selects Kiki for the shape on the left.
- Let \(\hat{p}_k\) be the sample proportion of people who selected Kiki for the shape on the left.

What is a statement of the null hypothesis that corresponds to the notion that the link between names and shapes is arbitrary?

`01:00`

\[\hat{p}_k = \frac{\textrm{Number who chose "Kiki"}}{\textrm{Total number of people}}\]

Note: you could also simply use \(n_k\), the number of people who chose “Kiki”.
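
As a quick illustration, both statistics can be computed from a vector of poll answers. The `responses` vector below is a hypothetical stand-in built from placeholder counts, not real poll data.

```r
# Hypothetical responses standing in for the poll results
responses <- c(rep("Kiki", 40), rep("Booba", 20))

n_k   <- sum(responses == "Kiki")    # number who chose "Kiki"
p_hat <- mean(responses == "Kiki")   # sample proportion who chose "Kiki"

# The two statistics differ only by the fixed sample size n
all.equal(p_hat, n_k / length(responses))
```

Because the sample size is fixed, a test based on \(n_k\) and a test based on \(\hat{p}_k\) lead to the same conclusion.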

Our technique: simulate data from a world in which the null is true, then calculate the test statistic on the simulated data.

Which simulation method(s) align with the null hypothesis and our data collection process?

`01:00`

`infer`

```
library(tidyverse)
library(infer)

# update these based on the poll
n_k <- 40  # number who chose "Kiki"
n_b <- 20  # number who chose "Booba"

shapes <- data.frame(name = c(rep("Kiki", n_k),
                              rep("Booba", n_b)))

# simulate one data set under the null and compute its sample proportion
shapes %>%
  specify(response = name,
          success = "Kiki") %>%
  hypothesize(null = "point", p = .5) %>%
  generate(reps = 1, type = "draw") %>%
  calculate(stat = "prop")
```

```
# null distribution: 500 sample proportions simulated under p = .5
null <- shapes %>%
  specify(response = name,
          success = "Kiki") %>%
  hypothesize(null = "point", p = .5) %>%
  generate(reps = 500, type = "draw") %>%
  calculate(stat = "prop")

# observed statistic: no hypothesize() or generate() step is needed
obs_p_hat <- shapes %>%
  specify(response = name,
          success = "Kiki") %>%
  calculate(stat = "prop")

# plot the null distribution and shade the two-sided p-value region
null %>%
  visualise() +
  shade_pvalue(obs_p_hat, direction = "both")
```
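
The plot shades the p-value region but does not report a number. One way to get a numeric value is `get_p_value()` from `infer`. The sketch below rebuilds the objects so that it runs on its own. It reuses the placeholder counts rather than real poll results, and it assumes a version of `infer` recent enough to support `type = "draw"`. The seed and the name `p_val` are choices made for this sketch.

```r
library(infer)

set.seed(20)  # arbitrary seed so the simulated p-value is reproducible

# placeholder counts; update these based on the poll
n_k <- 40
n_b <- 20
shapes <- data.frame(name = c(rep("Kiki", n_k),
                              rep("Booba", n_b)))

# null distribution of the sample proportion under p = .5
null <- shapes %>%
  specify(response = name, success = "Kiki") %>%
  hypothesize(null = "point", p = .5) %>%
  generate(reps = 500, type = "draw") %>%
  calculate(stat = "prop")

# observed sample proportion
obs_p_hat <- shapes %>%
  specify(response = name, success = "Kiki") %>%
  calculate(stat = "prop")

# two-sided p-value, matching direction = "both" in the plot
p_val <- null %>%
  get_p_value(obs_stat = obs_p_hat, direction = "both")
p_val
```

With only 500 replicates the estimate is coarse, and it will change from run to run unless a seed is set.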

What is the proper interpretation of this p-value?

`01:00`

`25:00`