Causal Effects in Observational Studies

STAT 20: Introduction to Probability and Statistics

Agenda

  • Announcements
  • Concept Questions
  • Problem Set 19

Announcements

  • PS 19 and PS 20 both due Tuesday 4/29 at 9:00 AM
  • Final exam review sessions:
    • Summarization: 12pm-1pm Monday 4/29, Stanley 105
    • Causality: 3pm-4pm Monday 4/29, Stanley 105
    • Generalization: 3pm-4pm Wednesday 5/1, VLSB 2050
    • Probability: 4pm-5pm Wednesday 5/1, VLSB 2050
    • Prediction: 3pm-4pm Friday 5/3, Stanley 105
  • Final exam: 7pm-10pm, Thursday 5/9, room TBA.
  • Please fill out course evals!

Concept Questions

To study the impact of receiving permanent resident status on mental health, we compare answers to a psychiatric survey from people who entered and won the US green card lottery to answers from others who entered but did not win.

What kind of study is this?

  1. A randomized trial.
  2. A natural experiment.
  3. An observational study that requires matching.
  4. None of the above.

To study the impact of childhood trauma on later academic performance, we compare GRE scores for students who lost a close family member in an automobile accident before the age of 8 to GRE scores for students who did not lose a close family member before age 8.

What kind of study is this?

  1. A randomized trial.
  2. A natural experiment.
  3. An observational study that requires matching.
  4. None of the above.

To study the effectiveness of a blood pressure medication, we enroll 500 patients. We take the blood pressure of all patients before anyone receives medication. We assign the 200 patients with the highest blood pressure readings to get the medication, assigning the others to be controls.

What kind of study is this?

  1. A randomized trial.
  2. A natural experiment.
  3. An observational study that requires matching.
  4. None of the above.

The table following this question shows the first few rows of a dataset containing demographic information on California counties.

We want to use matching to determine whether median_edu has a causal effect on homeownership. Which county serves as the best counterfactual match for Fresno County?

  1. Kern County
  2. Alameda County
  3. Contra Costa County
  4. Shasta County
  5. Del Norte County

name                    homeownership  median_edu    metro  smoking_ban
Fresno County           55.0           some_college  yes    none
Colusa County           64.4           hs_diploma    no     none
Del Norte County        60.9           hs_diploma    no     none
Alameda County          55.1           some_college  yes    none
Contra Costa County     69.5           some_college  yes    partial
Glenn County            67.5           hs_diploma    no     none
Shasta County           66.0           some_college  yes    none
Kern County             61.4           hs_diploma    yes    none
San Luis Obispo County  61.4           some_college  yes    none
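
One way to check a candidate match is to rebuild the table in R and compare covariates directly. The sketch below is a base-R illustration under assumptions, not course code: the data frame name `county` follows the R commands used elsewhere in these slides, while `fresno`, `controls`, and `mismatches` are names made up here. It counts how many of metro and smoking_ban differ between Fresno County and each county in the other median_edu group.

```r
# Rebuild the nine-county table from the slide as a data frame.
county <- data.frame(
  name = c("Fresno County", "Colusa County", "Del Norte County",
           "Alameda County", "Contra Costa County", "Glenn County",
           "Shasta County", "Kern County", "San Luis Obispo County"),
  homeownership = c(55.0, 64.4, 60.9, 55.1, 69.5, 67.5, 66.0, 61.4, 61.4),
  median_edu = c("some_college", "hs_diploma", "hs_diploma",
                 "some_college", "some_college", "hs_diploma",
                 "some_college", "hs_diploma", "some_college"),
  metro = c("yes", "no", "no", "yes", "yes", "no", "yes", "yes", "yes"),
  smoking_ban = c("none", "none", "none", "none", "partial",
                  "none", "none", "none", "none"),
  stringsAsFactors = FALSE
)

fresno <- county[county$name == "Fresno County", ]

# A counterfactual for Fresno must come from the other median_edu group.
controls <- county[county$median_edu != fresno$median_edu, ]

# Count how many of the two matching covariates differ from Fresno's values.
controls$mismatches <- (controls$metro != fresno$metro) +
  (controls$smoking_ban != fresno$smoking_ban)

# Show the candidates, closest first.
print(controls[order(controls$mismatches),
               c("name", "metro", "smoking_ban", "mismatches")])
```

A county with zero mismatches is an exact match to Fresno County on the measured covariates; counties that share Fresno's median_edu value never appear in `controls` because they cannot serve as counterfactuals.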

The county table above has nine counties: five with a median_edu value of some_college and four with a value of hs_diploma.

How many counties of each type will remain after we conduct optimal matching on metro and smoking_ban?

  1. some_college: 4, hs_diploma: 4.
  2. some_college: 5, hs_diploma: 4.
  3. some_college: 2, hs_diploma: 2.
  4. some_college: 2, hs_diploma: 4.
  5. Can’t tell without more information.
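
Optimal matching pairs units across the two groups so that the total covariate distance over all pairs is as small as possible. With a table this small, the idea can be sketched by brute force in base R. This is an illustration under assumptions, not what matchit() does internally: it codes metro and smoking_ban as 0/1, uses Euclidean distance, forms 1:1 pairs without replacement, shortens the county names, and uses object names (`x`, `d`, `pairings`, `best`, `matched`) that are made up here.

```r
# Covariates from the county table, in the same row order.
name <- c("Fresno", "Colusa", "Del Norte", "Alameda", "Contra Costa",
          "Glenn", "Shasta", "Kern", "San Luis Obispo")
median_edu <- c("some_college", "hs_diploma", "hs_diploma", "some_college",
                "some_college", "hs_diploma", "some_college", "hs_diploma",
                "some_college")

# 0/1 coding (an assumption of this sketch):
# metro is 1 for "yes", smoking_ban is 1 for anything other than "none".
x <- cbind(metro = c(1, 0, 0, 1, 1, 0, 1, 1, 1),
           smoking_ban = c(0, 0, 0, 0, 1, 0, 0, 0, 0))
rownames(x) <- name

some_college <- name[median_edu == "some_college"]
hs_diploma <- name[median_edu == "hs_diploma"]

# Euclidean distance between every some_college / hs_diploma pair.
d <- as.matrix(dist(x))[some_college, hs_diploma]

# Enumerate every way to give each hs_diploma county a distinct
# some_college partner.
pairings <- expand.grid(rep(list(seq_along(some_college)), length(hs_diploma)))
pairings <- as.matrix(pairings[apply(pairings, 1, anyDuplicated) == 0, ])

# Total distance of each pairing; keep one pairing with the smallest total.
total <- apply(pairings, 1,
               function(p) sum(d[cbind(p, seq_along(hs_diploma))]))
best <- pairings[which.min(total), ]

matched <- data.frame(hs_diploma = hs_diploma,
                      some_college = some_college[best],
                      distance = d[cbind(best, seq_along(hs_diploma))])
print(matched)

# some_college counties left without a partner
print(setdiff(some_college, matched$some_college))
```

Several pairings can tie for the smallest total, so the specific pairs printed are only one optimal solution. The last line lists any some_college county left without a partner, which is what determines how many counties of each type remain after matching.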

Which R command correctly performs matching on covariates to measure the impact of median_edu on homeownership?

  1. matchit(homeownership ~ median_edu, data = county, method = 'optimal', distance = 'euclidean')
  2. matchit(median_edu ~ homeownership, data = county, method = 'optimal', distance = 'euclidean')
  3. matchit(median_edu ~ metro + smoking_ban, data = county, method = 'optimal', distance = 'euclidean')
  4. matchit(homeownership ~ median_edu + metro + smoking_ban, data = county, method = 'optimal', distance = 'euclidean')

Assuming that metro and smoking_ban variables are the only ones we have measured, name an unmeasured variable that could introduce confounding between median_edu and homeownership.


Break

05:00

Problem Set

60:00