Consider two houses for sale, each 1,100 sqft with 2 bedrooms, 1 bathroom, and a small garage; one is in Santa Monica and the other is in Westwood. Which is true of the predicted sale prices of these two homes?

What is the relationship between bed and log_price?

What is the relationship between log_sqft and log_price?

What is the relationship between log_sqft and log_price, controlling for bed?

What is the relationship between bed and log_price, controlling for log_sqft?

Simpson’s Paradox

Simpson’s paradox, which also goes by several other names, is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined.

Source: Wikipedia
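
The reversal described above can be sketched with a few made-up numbers. The two groups and their values below are hypothetical and are not taken from the housing data. Within each group the least-squares slope of y on x is positive, yet the slope fitted to the combined data is negative. The helper name `slope` is also just an illustration.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope from a straight-line fit of y on x."""
    return np.polyfit(x, y, 1)[0]

# Hypothetical group A: small x values, large y values.
x_a = np.array([1.0, 2.0, 3.0, 4.0])
y_a = np.array([10.0, 11.0, 12.0, 13.0])

# Hypothetical group B: large x values, small y values.
x_b = np.array([6.0, 7.0, 8.0, 9.0])
y_b = np.array([3.0, 4.0, 5.0, 6.0])

# Slope within each group: y rises with x in both.
slope_a = slope(x_a, y_a)
slope_b = slope(x_b, y_b)

# Slope after combining the groups: the trend reverses.
x_all = np.concatenate([x_a, x_b])
y_all = np.concatenate([y_a, y_b])
slope_all = slope(x_all, y_all)

print(f"group A slope:  {slope_a:+.2f}")
print(f"group B slope:  {slope_b:+.2f}")
print(f"combined slope: {slope_all:+.2f}")
```

The sign flips because group membership is related to both x and y: group B has larger x but lower y overall, so ignoring the grouping mixes the between-group difference into the fitted trend.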

What can we build with data?

A prediction machine.

A summary of a data set.

A generalization to a population.

A causal explanation.

Question 1: What is the relationship between the number of bedrooms in a house and its price?