Using Time to Measure Causal Effects

STAT 20: Introduction to Probability and Statistics

Agenda

Announcements
Concept Questions
Problem Set 20

Announcements

PS 19 and PS 20 both due Tuesday 4/30 at 9:00 AM
Final exam review sessions:
- Summarization: 12pm-1pm Monday 4/29, Stanley 105
- Causality: 3pm-4pm Monday 4/29, Stanley 105
- Generalization: 3pm-4pm Wednesday 5/1, VLSB 2050
- Probability: 4pm-5pm Wednesday 5/1, VLSB 2050
- Prediction: 3pm-4pm Friday 5/3, Stanley 105
Final exam: 7pm-10pm, Thursday 5/9, room TBA.
Please fill out course evals!
Consider applying to join Fall 2024 Stat 20 course staff (deadline 4/26)

Concept Questions

Based on the plot, which of these analyses will give us a good estimate of the treatment effect?

Pre/post comparison.
Interrupted time series.
Difference-in-differences.
None of the above.

01:00

This is the first of a series of five questions that each ask students to choose an analysis method for a study with repeated measures. Each considers a dataset with repeated measures for 8 unique subjects. The response data is plotted with a line for each subject, color-coded by treatment status.

Either before diving in or after the first question, you may want to go over some key rules for the three designs. Pre/post requires only two observations per subject but only works with flat time trends; interrupted time series requires more than two observations per subject and can handle any linear time trend; difference-in-differences requires units that never get treated and requires those units to have parallel (not necessarily linear) trends to the treated units.
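If it helps to make the difference-in-differences rule concrete, here is a minimal sketch on made-up, noise-free data. The group sizes, the quadratic time trend, the treatment time, and the effect size of 2 are all assumptions chosen for illustration; none of them come from the plots in these questions.

```r
# Hypothetical noise-free data: 4 treated and 4 never-treated subjects,
# measured at times 1..6, with treatment starting after time 3.
times <- 1:6
trend <- times^2 / 4          # shared, nonlinear time trend (assumed)
true_effect <- 2              # assumed treatment effect

panel <- expand.grid(time = times, subject = 1:8)
panel$treated <- panel$subject <= 4
panel$post <- panel$time > 3
panel$response <- panel$subject + trend[panel$time] +
  true_effect * (panel$treated & panel$post)

# Mean response in each of the four group-by-period cells
cell_mean <- function(treated, post) {
  mean(panel$response[panel$treated == treated & panel$post == post])
}

# Pre/post on the treated subjects alone absorbs the time trend
pre_post <- cell_mean(TRUE, TRUE) - cell_mean(TRUE, FALSE)

# Difference-in-differences subtracts the change seen in the controls
did <- pre_post - (cell_mean(FALSE, TRUE) - cell_mean(FALSE, FALSE))
```

Because the two groups share the same trend, `did` recovers the assumed effect of 2 even though the trend is not linear, while `pre_post` comes out at 7.25 because it also picks up the change in the trend.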

For the first question, the best answer is B: there are more than two observations per subject and the trends are linear (it may not look quite linear overall, but that’s because there’s a jump at the time of treatment). Pre/post does not work because there is a time trend, and diff-in-diff does not work because all units receive treatment, leaving no never-treated comparison group.
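For a scenario like this one, the contrast between the two estimates can be sketched on simulated data. The slope, the jump of 2, and the treatment time below are assumptions for illustration. Fitting a regression with a time term and a post-treatment indicator is one common way to carry out an interrupted time series analysis; it may differ from the exact formulation used elsewhere in the course.

```r
# Hypothetical noise-free data: 8 subjects, all treated after time 3.
times <- 1:6
slope <- 0.5                  # assumed common linear time trend
true_effect <- 2              # assumed jump at the time of treatment

panel <- expand.grid(time = times, subject = 1:8)
panel$post <- as.numeric(panel$time > 3)
panel$response <- panel$subject + slope * panel$time +
  true_effect * panel$post

# Interrupted time series: model the linear trend and the jump together,
# with a separate intercept for each subject
its_fit <- lm(response ~ time + post + factor(subject), data = panel)
its_estimate <- unname(coef(its_fit)["post"])

# Naive pre/post: last pre-treatment vs. first post-treatment observation
pre_post <- mean(panel$response[panel$time == 4]) -
  mean(panel$response[panel$time == 3])
```

Here `its_estimate` recovers the assumed jump of 2, while `pre_post` gives 2.5: the jump plus one period of trend.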

Based on the plot, which of these analyses will give us a good estimate of the treatment effect?

Pre/post comparison.
Interrupted time series.
Difference-in-differences.
None of the above.

01:00

Based on the plot, which of these analyses will give us a good estimate of the treatment effect?

Pre/post comparison.
Interrupted time series.
Difference-in-differences.
None of the above.

01:00

Based on the plot, which of these analyses will give us a good estimate of the treatment effect?

Pre/post comparison.
Interrupted time series.
Difference-in-differences.
None of the above.

01:00

Based on the plot, which of these analyses will give us a good estimate of the treatment effect?

Pre/post comparison.
Interrupted time series.
Difference-in-differences.
None of the above.

01:00

A statistician conducts a pre/post comparison and attempts to obtain a confidence interval for their treatment effect estimate using the bootstrap. Shown below are the original data and one of the bootstrap samples.

Original Sample:

Subject	Response	Time_Period
Jimmy	1.0	Pre
Jimmy	1.5	Post
Sarita	4.0	Pre
Sarita	4.2	Post
Min	1.8	Pre
Min	2.3	Post

Bootstrap Sample:

Subject	Response	Time_Period
Jimmy	1.5	Post
Jimmy	1.5	Post
Sarita	4.0	Pre
Sarita	4.2	Post
Sarita	4.0	Pre
Min	2.3	Post

01:00

What is the problem with this way of using the bootstrap?

A. The bootstrap sample does not contain the right number of observations.

B. Some of the observations in the bootstrap sample are exact copies of each other.

C. Unique subjects in the bootstrap sample do not have one “pre” and one “post” observation each.

D. There is no problem, this is a valid use of the bootstrap.

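library(tidyverse)  # pivot_wider(), mutate()
library(infer)      # specify(), generate(), calculate(), visualize()

# Toy data in long form: one row per subject per time period
toy_example <- data.frame(Subject = c(rep('Jimmy', 2),
                                      rep('Sarita', 2),
                                      rep('Min', 2)),
                          Response = c(1.0, 1.5, 4.0, 4.2, 1.8, 2.3),
                          Time_Period = rep(c('Pre', 'Post'), 3))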

Incorrect (resamples individual rows, which breaks the pre/post pairing):

toy_example |>
  specify(response = Response,
          explanatory = Time_Period) |>
  generate(reps = 500, 
           type = 'bootstrap') |>
  calculate(stat = 'diff in means', 
            order = c('Post','Pre')) |>
  visualize()
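One way to see what goes wrong is to tabulate the bootstrap sample shown on the earlier slide. This is a small base R check, not part of the `infer` pipeline; the names `boot_sample`, `pairing`, and `intact` are made up for this sketch.

```r
# The bootstrap sample shown on the earlier slide
boot_sample <- data.frame(
  Subject = c("Jimmy", "Jimmy", "Sarita", "Sarita", "Sarita", "Min"),
  Response = c(1.5, 1.5, 4.0, 4.2, 4.0, 2.3),
  Time_Period = c("Post", "Post", "Pre", "Post", "Pre", "Post")
)

# Count Pre and Post rows per subject; a valid paired sample
# would have exactly one of each for every subject
pairing <- table(boot_sample$Subject, boot_sample$Time_Period)

# TRUE for subjects whose pairing survived the resampling
intact <- pairing[, "Pre"] == 1 & pairing[, "Post"] == 1
```

Printing `pairing` shows that no subject keeps exactly one "Pre" and one "Post" row, so a difference in means computed from this sample no longer compares each subject to themselves.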

Correct (computes one difference per subject, then resamples subjects):

toy_example |>
  pivot_wider(names_from = Time_Period, 
              values_from = Response) |>
  mutate(diff = Post - Pre) |>
  specify(response = diff) |>
  generate(reps = 500, 
           type = 'bootstrap') |>
  calculate(stat = 'mean') |>
  visualize(bins=4) + xlim(-2.5,2.5)
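For anyone who wants to see what the correct version is doing under the hood, here is a rough base R equivalent that resamples whole subjects by hand. The seed, the 500 replicates, and the 95% percentile interval are choices made for this sketch, and the name `wide` is hypothetical.

```r
set.seed(20)  # arbitrary seed, for reproducibility only

# Same toy data as above, in wide form: one row per subject
wide <- data.frame(Subject = c("Jimmy", "Sarita", "Min"),
                   Pre  = c(1.0, 4.0, 1.8),
                   Post = c(1.5, 4.2, 2.3))
wide$diff <- wide$Post - wide$Pre

# Resample whole subjects with replacement, so each draw keeps
# its own Pre and Post together, then average the differences
boot_means <- replicate(500, {
  rows <- sample(nrow(wide), replace = TRUE)
  mean(wide$diff[rows])
})

# Observed mean difference and a percentile interval
observed <- mean(wide$diff)
ci <- quantile(boot_means, c(0.025, 0.975))
```

Because every resampled row is a complete subject, each bootstrap mean is an average of genuine within-subject differences, which is the quantity a pre/post comparison is meant to estimate.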

Break

05:00

Problem Set

60:00