STAT 20: Introduction to Probability and Statistics

- Concept Questions
- Hypothesis Testing with `infer`
- Practice Problems

Which of the following statements are claims that correspond to a null hypothesis (as opposed to an alternative hypothesis)?

Hint: try writing each one in terms of parameters (statements about means, proportions, etc.).

A. King cheetahs on average run the same speed as standard spotted cheetahs.

B. For a particular student, the probability of correctly answering a question on a 5-option multiple-choice test is larger than 0.2 (i.e., better than guessing).

C. The mean length of African elephant tusks has changed over the last 100 years.

D. The risk of facial clefts is the same for babies born to mothers who take folic acid supplements as for babies born to mothers who do not.

E. Mean birth weight of newborns is dependent on caffeine intake during pregnancy.

F. The probability of getting in a car accident is the same whether or not the driver is using a cell phone.
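As a sketch of the hint, statements A and C could be written with parameters as below. The symbols \(\mu_{\text{king}}\), \(\mu_{\text{spotted}}\), \(\mu_{\text{now}}\), and \(\mu_{\text{100 yrs ago}}\) are notation assumed here, not taken from the slides. A claim of "no difference" becomes an equality, which is the form a null hypothesis takes; a claim that something has changed becomes an inequality, which is the form of an alternative.

\[
\text{A: } \mu_{\text{king}} = \mu_{\text{spotted}}
\qquad
\text{C: } \mu_{\text{now}} \neq \mu_{\text{100 yrs ago}}
\]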

We want to understand whether blood thinners are helpful or harmful. We’ll consider both of these possibilities using a two-sided hypothesis test.

*Null*: Blood thinners do not have an overall survival effect, i.e., the survival proportions are the same in each group.

*Alternative*: Blood thinners have an impact on survival, either positive or negative, but not zero.
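In parameter form, these two hypotheses might be written as follows, where \(p_{\text{thinner}}\) and \(p_{\text{control}}\) are assumed names for the survival proportions in the blood thinner group and the comparison group:

\[
H_0: p_{\text{thinner}} - p_{\text{control}} = 0
\qquad
H_A: p_{\text{thinner}} - p_{\text{control}} \neq 0
\]

The \(\neq\) in the alternative is what makes the test two-sided: differences in either direction count as evidence against the null.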

What is your guess at the p-value?

A pharmaceutical company developed a new treatment for eczema and performed a hypothesis test to see if it worked better than the company’s old treatment. The P-value for the test was \(10\%\). Which of the following statements are true?

A. The probability that the null hypothesis is false is \(10\%\).

B. The probability that the null hypothesis is false is \(90\%\).

C. The P-value of about \(10\%\) was computed assuming that the null hypothesis was true.

D. The new drug is significantly better than the old.

E. The alternative hypothesis is 10 times more likely than the null.
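A small simulation can help with these statements. The sketch below is not the company's test: it assumes made-up normally distributed outcomes and uses `t.test()` as a stand-in. It repeats an experiment many times in a world where the null hypothesis is true and records the p-value each time.

```r
# Simulate many experiments in which the null hypothesis is TRUE
# (both groups come from the same distribution) and keep each p-value.
set.seed(20)  # arbitrary seed, for reproducibility only

n_experiments <- 2000
p_values <- replicate(n_experiments, {
  old_treatment <- rnorm(30)  # hypothetical outcomes under the old treatment
  new_treatment <- rnorm(30)  # same distribution, so there is no real effect
  t.test(new_treatment, old_treatment)$p.value
})

# Share of these true-null experiments with a p-value of 10% or smaller
share_small <- mean(p_values <= 0.10)
share_small
```

Even though the null is true in every simulated experiment, roughly one in ten of them produces a p-value at or below 10%. The p-value describes how surprising the data are if the null is true; it is not the probability that the null is true or false.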

`infer`

Question: Do beach lovers prefer the warm seasons more than mountain lovers?

What sort of visualization can we use to see the association between these two variables?

```
# A tibble: 619 × 2
   beach_or_mtns    warm_fav
   <chr>            <lgl>
 1 At the beach     TRUE
 2 At the beach     TRUE
 3 In the mountains FALSE
 4 At the beach     TRUE
 5 In the mountains FALSE
 6 At the beach     TRUE
 7 In the mountains FALSE
 8 At the beach     FALSE
 9 At the beach     FALSE
10 At the beach     TRUE
# … with 609 more rows
```
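One possible answer to the visualization question is a filled (normalized) bar chart, which shows the proportion of warm-season fans within each group. The sketch below assumes `ggplot2` is available and builds a hypothetical random stand-in for `class_survey` with the same two columns, since the real survey data is not reproduced here.

```r
library(ggplot2)

# Hypothetical stand-in for class_survey: same column names and types,
# but randomly generated values (NOT the real survey responses).
set.seed(20)
class_survey <- data.frame(
  beach_or_mtns = sample(c("At the beach", "In the mountains"),
                         619, replace = TRUE),
  warm_fav = sample(c(TRUE, FALSE), 619, replace = TRUE)
)

# position = "fill" rescales each bar to height 1, so the bars compare
# proportions rather than raw counts.
p <- ggplot(class_survey, aes(x = beach_or_mtns, fill = warm_fav)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")
p
```

Because both variables are categorical, comparing proportions within each group makes the association easier to see than comparing counts when the groups differ in size.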

The observed difference is non-zero, but could that just be a product of the particular sample of data that we happen to have?
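The observed difference itself can be computed with `specify()` followed by `calculate()`. The sketch below runs on a hypothetical random stand-in for `class_survey`, so its value will not match the real survey. The `order` argument and the name `obs_stat` are choices made here for illustration; `order` just makes the direction of subtraction explicit.

```r
library(dplyr)  # for the %>% pipe
library(infer)

# Hypothetical stand-in for class_survey (randomly generated, NOT the real data)
set.seed(20)
class_survey <- data.frame(
  beach_or_mtns = sample(c("At the beach", "In the mountains"),
                         619, replace = TRUE),
  warm_fav = sample(c(TRUE, FALSE), 619, replace = TRUE)
)

# Observed statistic: proportion of warm-season fans among beach lovers
# minus the same proportion among mountain lovers.
obs_stat <- class_survey %>%
  specify(response = warm_fav,
          explanatory = beach_or_mtns,
          success = "TRUE") %>%
  calculate(stat = "diff in props",
            order = c("At the beach", "In the mountains"))
obs_stat
```

The result is a one-row tibble with a single `stat` column, which is the form that `get_p_value()` accepts for its `obs_stat` argument.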

```
Response: warm_fav (factor)
Explanatory: beach_or_mtns (factor)
# A tibble: 619 × 2
   warm_fav beach_or_mtns
   <fct>    <fct>
 1 TRUE     At the beach
 2 TRUE     At the beach
 3 FALSE    In the mountains
 4 TRUE     At the beach
 5 FALSE    In the mountains
 6 TRUE     At the beach
 7 FALSE    In the mountains
 8 FALSE    At the beach
 9 FALSE    At the beach
10 TRUE     At the beach
# … with 609 more rows
```

```
class_survey %>%
  specify(response = warm_fav,
          explanatory = beach_or_mtns,
          success = "TRUE") %>%
  hypothesize(null = "independence")
```

```
Response: warm_fav (factor)
Explanatory: beach_or_mtns (factor)
Null Hypothesis: independence
# A tibble: 619 × 2
   warm_fav beach_or_mtns
   <fct>    <fct>
 1 TRUE     At the beach
 2 TRUE     At the beach
 3 FALSE    In the mountains
 4 TRUE     At the beach
 5 FALSE    In the mountains
 6 TRUE     At the beach
 7 FALSE    In the mountains
 8 FALSE    At the beach
 9 FALSE    At the beach
10 TRUE     At the beach
# … with 609 more rows
```

```
class_survey %>%
  specify(response = warm_fav,
          explanatory = beach_or_mtns,
          success = "TRUE") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1,
           type = "permute")
```

```
Response: warm_fav (factor)
Explanatory: beach_or_mtns (factor)
Null Hypothesis: independence
# A tibble: 619 × 3
# Groups: replicate [1]
   warm_fav beach_or_mtns    replicate
   <fct>    <fct>                <int>
 1 TRUE     At the beach             1
 2 TRUE     At the beach             1
 3 TRUE     In the mountains         1
 4 TRUE     At the beach             1
 5 TRUE     In the mountains         1
 6 FALSE    At the beach             1
 7 FALSE    In the mountains         1
 8 FALSE    At the beach             1
 9 FALSE    At the beach             1
10 TRUE     At the beach             1
# … with 609 more rows
```

```
class_survey %>%
  specify(response = warm_fav,
          explanatory = beach_or_mtns,
          success = "TRUE") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1,
           type = "permute") # a second shuffle
```

```
Response: warm_fav (factor)
Explanatory: beach_or_mtns (factor)
Null Hypothesis: independence
# A tibble: 619 × 3
# Groups: replicate [1]
   warm_fav beach_or_mtns    replicate
   <fct>    <fct>                <int>
 1 FALSE    At the beach             1
 2 FALSE    At the beach             1
 3 TRUE     In the mountains         1
 4 TRUE     At the beach             1
 5 FALSE    In the mountains         1
 6 TRUE     At the beach             1
 7 FALSE    In the mountains         1
 8 TRUE     At the beach             1
 9 FALSE    At the beach             1
10 TRUE     At the beach             1
# … with 609 more rows
```

```
class_survey %>%
  specify(response = warm_fav,
          explanatory = beach_or_mtns,
          success = "TRUE") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1,
           type = "permute") # a third shuffle
```

```
Response: warm_fav (factor)
Explanatory: beach_or_mtns (factor)
Null Hypothesis: independence
# A tibble: 619 × 3
# Groups: replicate [1]
   warm_fav beach_or_mtns    replicate
   <fct>    <fct>                <int>
 1 TRUE     At the beach             1
 2 TRUE     At the beach             1
 3 TRUE     In the mountains         1
 4 TRUE     At the beach             1
 5 FALSE    In the mountains         1
 6 TRUE     At the beach             1
 7 TRUE     In the mountains         1
 8 TRUE     At the beach             1
 9 TRUE     At the beach             1
10 TRUE     At the beach             1
# … with 609 more rows
```
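The three shuffles above differ only in the `warm_fav` column, while `beach_or_mtns` stays fixed. A rough base R sketch of what one such permutation does is given below. It uses hypothetical random stand-in vectors rather than the real survey columns, and `diff_in_props()` is a helper defined here for illustration, not an `infer` function.

```r
# Hypothetical stand-ins for the two survey columns (NOT the real data)
set.seed(20)
beach_or_mtns <- sample(c("At the beach", "In the mountains"),
                        619, replace = TRUE)
warm_fav <- sample(c(TRUE, FALSE), 619, replace = TRUE)

# One permutation: put the responses in a random order, leave the groups alone.
# This breaks any link between the two variables, as the null of
# independence assumes.
warm_fav_shuffled <- sample(warm_fav)

# Shuffling never changes the overall counts of TRUE and FALSE...
table(warm_fav)
table(warm_fav_shuffled)

# ...but the difference in proportions between the groups usually changes.
diff_in_props <- function(response, group) {
  mean(response[group == "At the beach"]) -
    mean(response[group == "In the mountains"])
}
diff_in_props(warm_fav, beach_or_mtns)
diff_in_props(warm_fav_shuffled, beach_or_mtns)
```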

```
class_survey %>%
  specify(response = warm_fav,
          explanatory = beach_or_mtns,
          success = "TRUE") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 500,
           type = "permute")
```

```
Response: warm_fav (factor)
Explanatory: beach_or_mtns (factor)
Null Hypothesis: independence
# A tibble: 309,500 × 3
# Groups: replicate [500]
   warm_fav beach_or_mtns    replicate
   <fct>    <fct>                <int>
 1 FALSE    At the beach             1
 2 TRUE     At the beach             1
 3 TRUE     In the mountains         1
 4 TRUE     At the beach             1
 5 FALSE    In the mountains         1
 6 FALSE    At the beach             1
 7 TRUE     In the mountains         1
 8 FALSE    At the beach             1
 9 TRUE     At the beach             1
10 TRUE     At the beach             1
# … with 309,490 more rows
```

```
class_survey %>%
  specify(response = warm_fav,
          explanatory = beach_or_mtns,
          success = "TRUE") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 500,
           type = "permute") %>%
  calculate(stat = "diff in props")
```

```
Response: warm_fav (factor)
Explanatory: beach_or_mtns (factor)
Null Hypothesis: independence
# A tibble: 500 × 2
   replicate     stat
       <int>    <dbl>
 1         1  0.0608
 2         2 -0.0316
 3         3  0.00394
 4         4 -0.0458
 5         5 -0.0103
 6         6 -0.0743
 7         7 -0.0245
 8         8 -0.00317
 9         9  0.0182
10        10  0.0253
# … with 490 more rows
```

```
# obs_stat must already hold the observed difference in proportions
class_survey %>%
  specify(response = warm_fav,
          explanatory = beach_or_mtns,
          success = "TRUE") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 500,
           type = "permute") %>%
  calculate(stat = "diff in props") %>%
  get_p_value(obs_stat = obs_stat,
              direction = "both")
```

```
# A tibble: 1 × 1
  p_value
    <dbl>
1       0
```
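A reported p-value of 0 should be read with care: it means that none of the 500 shuffles produced a statistic as extreme as the observed one, so the true p-value is small but not exactly zero. The base R sketch below illustrates this with hypothetical data built to contain a real difference between groups (response probabilities of 0.7 and 0.4 are invented for the example). It uses a simple absolute-value definition of "at least as extreme", which may differ in detail from how `get_p_value()` handles `direction = "both"`.

```r
# Hypothetical data with a built-in difference between groups (NOT the real survey)
set.seed(20)
group <- sample(c("At the beach", "In the mountains"), 619, replace = TRUE)
response <- rbinom(619, size = 1,
                   prob = ifelse(group == "At the beach", 0.7, 0.4)) == 1

# Helper defined for this sketch: beach proportion minus mountain proportion
diff_in_props <- function(response, group) {
  mean(response[group == "At the beach"]) -
    mean(response[group == "In the mountains"])
}

obs_diff <- diff_in_props(response, group)

# Null distribution: recompute the statistic on 500 shuffled copies
reps <- 500
null_diffs <- replicate(reps, diff_in_props(sample(response), group))

# Two-sided p-value: share of shuffles at least as far from 0 as the
# observed difference
p_value <- mean(abs(null_diffs) >= abs(obs_diff))
p_value

# The smallest non-zero value this estimate can take with 500 shuffles
1 / reps
```

With only 500 shuffles, the estimate cannot distinguish between p-values below 1/500, so "less than 0.002" is a safer summary than "0". Increasing `reps` gives a finer estimate.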

We want to

Using the `class_survey` data, create a plot of the distribution of thoughts on climate change separated out by whether or not a student is an econ major. Also compute a statistic that