Using Time to Measure Causal Effects

STAT 20: Introduction to Probability and Statistics

Agenda

  • Announcements
  • Concept Questions
  • Problem Set 20

Announcements

  • PS 19 and PS 20 both due Tuesday 4/30 at 9:00 AM
  • Final exam review sessions:
    • Summarization: 12pm-1pm Monday 4/29, Stanley 105
    • Causality: 3pm-4pm Monday 4/29, Stanley 105
    • Generalization: 3pm-4pm Wednesday 5/1, VLSB 2050
    • Probability: 4pm-5pm Wednesday 5/1, VLSB 2050
    • Prediction: 3pm-4pm Friday 5/3, Stanley 105
  • Final exam: 7pm-10pm, Thursday 5/9, room TBA.
  • Please fill out course evals!
  • Consider applying to join Fall 2024 Stat 20 course staff (deadline 4/26)

Concept Questions

Based on the plot, which of these analyses will give us a good estimate of the treatment effect?

  1. Pre/post comparison.
  2. Interrupted time series.
  3. Difference-in-differences.
  4. None of the above.
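
For reference, the three analyses named above can be sketched on a small made-up dataset. This is a minimal illustration in base R, not the course's official definition of each estimator. The `treated` and `control` data frames, their values, and the treatment time are all hypothetical. The sketch assumes the generic forms: pre/post compares the last pre-treatment outcome to the post-treatment outcome, interrupted time series extrapolates a linear pre-treatment trend, and difference-in-differences subtracts the control group's change.

```r
# Hypothetical outcomes: treatment occurs between time 4 and time 5
treated <- data.frame(time = 1:5, outcome = c(2, 4, 6, 8, 13))
control <- data.frame(time = 1:5, outcome = c(1, 3, 5, 7, 9))

# 1. Pre/post comparison: post outcome minus the last pre outcome
pre_post <- treated$outcome[5] - treated$outcome[4]

# 2. Interrupted time series: fit a linear trend to the pre-period,
#    extrapolate it to time 5, and compare with what was observed
pre_period <- treated[treated$time < 5, ]
pre_fit <- lm(outcome ~ time, data = pre_period)
predicted <- unname(predict(pre_fit, newdata = data.frame(time = 5)))
its <- treated$outcome[5] - predicted

# 3. Difference-in-differences: treated change minus control change
control_change <- control$outcome[5] - control$outcome[4]
did <- pre_post - control_change

c(pre_post = pre_post, its = its, did = did)
```

In this made-up example the outcome was already trending upward before treatment, so the pre/post comparison attributes the whole jump to the treatment, while the other two estimates subtract the expected trend.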

A statistician conducts a pre/post comparison and attempts to obtain a confidence interval for their treatment effect estimate using the bootstrap. Shown below are the original data and one of the bootstrap samples.

Original Sample:

Subject   Response   Time_Period
Jimmy     1.0        Pre
Jimmy     1.5        Post
Sarita    4.0        Pre
Sarita    4.2        Post
Min       1.8        Pre
Min       2.3        Post

Bootstrap Sample:

Subject   Response   Time_Period
Jimmy     1.5        Post
Jimmy     1.5        Post
Sarita    4.0        Pre
Sarita    4.2        Post
Sarita    4.0        Pre
Min       2.3        Post

What is the problem with this way of using the bootstrap?

A. The bootstrap sample does not contain the right number of observations.

B. Some of the observations in the bootstrap sample are exact copies of each other.

C. Unique subjects in the bootstrap sample do not have one “pre” and one “post” observation each.

D. There is no problem; this is a valid use of the bootstrap.

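# tidyverse provides pivot_wider() and mutate(); infer provides the bootstrap verbs
library(tidyverse)
library(infer)

# Toy data: one Pre and one Post response for each of three subjects
toy_example <- data.frame(Subject = rep(c('Jimmy', 'Sarita', 'Min'), each = 2),
                          Response = c(1.0, 1.5, 4.0, 4.2, 1.8, 2.3),
                          Time_Period = rep(c('Pre', 'Post'), 3))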

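Incorrect (resamples individual rows, so a subject's Pre and Post observations can be separated):

toy_example |>
  specify(response = Response,
          explanatory = Time_Period) |>
  # Rows are drawn independently, ignoring which subject each row belongs to
  generate(reps = 500,
           type = 'bootstrap') |>
  calculate(stat = 'diff in means',
            order = c('Post', 'Pre')) |>
  visualize()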
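
To see what goes wrong with row-level resampling, one can tabulate the bootstrap sample shown earlier by subject and time period. This is a minimal base R sketch; the object names `boot_sample`, `pairing`, and `has_valid_pair` are made up for illustration, and the data are copied from the bootstrap sample on the earlier slide.

```r
# The bootstrap sample from the slide, entered by hand
boot_sample <- data.frame(
  Subject = c('Jimmy', 'Jimmy', 'Sarita', 'Sarita', 'Sarita', 'Min'),
  Response = c(1.5, 1.5, 4.0, 4.2, 4.0, 2.3),
  Time_Period = c('Post', 'Post', 'Pre', 'Post', 'Pre', 'Post')
)

# Count Pre and Post observations for each subject
pairing <- table(boot_sample$Subject, boot_sample$Time_Period)
pairing

# A subject is properly paired only with exactly one Pre and one Post
has_valid_pair <- pairing[, 'Pre'] == 1 & pairing[, 'Post'] == 1
has_valid_pair
```

No subject in this particular sample keeps exactly one Pre and one Post observation, so the within-subject structure of the original data is lost.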

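Correct (bootstraps each subject's Post - Pre difference, so pairs stay together):

toy_example |>
  # One row per subject, with Pre and Post as separate columns
  pivot_wider(names_from = Time_Period,
              values_from = Response) |>
  # Reduce each subject to a single paired difference
  mutate(diff = Post - Pre) |>
  specify(response = diff) |>
  generate(reps = 500,
           type = 'bootstrap') |>
  calculate(stat = 'mean') |>
  visualize(bins = 4) + xlim(-2.5, 2.5)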
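
The question asks for a confidence interval, while the code above only plots the bootstrap distribution. One possible way to finish is with infer's `get_confidence_interval()`. This is a sketch, not necessarily the approach intended in the course: the 95% level, the percentile method, the seed, and the names `paired_boot` and `paired_ci` are assumptions made for illustration.

```r
library(tidyverse)
library(infer)

# Same toy data as above, repeated so this block runs on its own
toy_example <- data.frame(Subject = rep(c('Jimmy', 'Sarita', 'Min'), each = 2),
                          Response = c(1.0, 1.5, 4.0, 4.2, 1.8, 2.3),
                          Time_Period = rep(c('Pre', 'Post'), 3))

set.seed(20)  # arbitrary seed (an assumption), only for reproducibility

# Bootstrap the mean of the per-subject Post - Pre differences
paired_boot <- toy_example |>
  pivot_wider(names_from = Time_Period,
              values_from = Response) |>
  mutate(diff = Post - Pre) |>
  specify(response = diff) |>
  generate(reps = 500,
           type = 'bootstrap') |>
  calculate(stat = 'mean')

# Percentile interval from the bootstrap distribution (level is an assumption)
paired_ci <- paired_boot |>
  get_confidence_interval(level = 0.95, type = 'percentile')
paired_ci
```

With only three subjects, the bootstrap means can take very few distinct values, so the interval is a rough illustration of the mechanics and not a reliable inference.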

Break


Problem Set
