# Case Study: Pricing Homes

STAT 20: Introduction to Probability and Statistics

## Agenda

• Concept Questions
• Problem Set 7.2
• Break
• Lab 7.2

# Concept Questions

Consider two houses for sale, both 1,100 sqft, 2 bedroom, 1 bathroom, with a small garage, but one is in Santa Monica and the other is in Westwood. Which is true of the predicted sale prices of these two homes?


Call:
lm(formula = log_price ~ log_sqft + city, data = LA)

Coefficients:
     (Intercept)          log_sqft    cityLong Beach  citySanta Monica  
         5.46554           1.15119          -0.89345          -0.09301  
    cityWestwood  
        -0.45846  
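
One way to check your reasoning is to plug the printed coefficients into the model by hand. The sketch below assumes `log_price` is the natural log of price (what R's `log()` returns) and that each `city` coefficient is an offset from the omitted reference city; the variable names are made up for illustration.

```r
# Coefficients copied from the printed lm() output above
b <- c(intercept = 5.46554, log_sqft = 1.15119,
       santa_monica = -0.09301, westwood = -0.45846)

sqft <- 1100  # both houses have the same square footage

# Predicted log price: intercept + slope * log(sqft) + city offset
log_sm <- b[["intercept"]] + b[["log_sqft"]] * log(sqft) + b[["santa_monica"]]
log_ww <- b[["intercept"]] + b[["log_sqft"]] * log(sqft) + b[["westwood"]]

# The intercept and log_sqft terms cancel, leaving only the city offsets
diff_log <- log_sm - log_ww   # -0.09301 - (-0.45846) = 0.36545

# Back on the price scale, a difference in logs becomes a ratio (about 1.44)
price_ratio <- exp(diff_log)
price_ratio
```

Because the two houses share the same square footage, the comparison depends only on the two city coefficients. Bedrooms, bathrooms, and the garage do not enter this model at all.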

## A simple model for price

m4 <- lm(log_price ~ bed, data = LA)

What do you expect the sign of the coefficient for bed to be?


# A tibble: 2 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   11.8      0.0436     271.  0
2 bed            0.532    0.0142      37.3 9.77e-220

## A less simple model for price

m5 <- lm(log_price ~ bed + log_sqft, data = LA)

What do you expect the signs of the coefficients for bed and log_sqft to be?


# A tibble: 3 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)    1.47     0.218       6.73 2.28e- 11
2 bed           -0.123    0.0164     -7.46 1.46e- 13
3 log_sqft       1.66     0.0346     47.8  2.60e-310

What is the relationship between bed and log_price?

What is the relationship between log_sqft and log_price?

What is the relationship between log_sqft and log_price, controlling for bed?

What is the relationship between bed and log_price, controlling for log_sqft?

Simpson’s paradox, which also goes by several other names, is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined.
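
The sign flip on bed can be reproduced on made-up data. The sketch below does not use the LA data: it simulates hypothetical homes where size drives price, bigger homes tend to have more bedrooms, and an extra bedroom at a fixed size slightly lowers price. All numbers in the simulation are assumptions chosen for illustration. Strictly speaking, this is a continuous analog of Simpson's paradox, since we condition on log_sqft rather than on discrete groups.

```r
set.seed(20)  # arbitrary seed so the sketch is reproducible

n <- 500

# Hypothetical homes: bigger homes tend to have more bedrooms
log_sqft <- rnorm(n, mean = 7.5, sd = 0.5)
bed <- round(pmax(1, 2 * (log_sqft - 6) + rnorm(n, sd = 0.5)))

# Price rises with size, but falls slightly with bedrooms at a fixed size
log_price <- 1.5 + 1.6 * log_sqft - 0.1 * bed + rnorm(n, sd = 0.2)
sim <- data.frame(log_price, log_sqft, bed)

# Same two model formulas as in the slides, fit to the simulated data
m_simple <- lm(log_price ~ bed, data = sim)
m_both   <- lm(log_price ~ bed + log_sqft, data = sim)

b_simple <- coef(m_simple)[["bed"]]  # bed alone also stands in for size
b_both   <- coef(m_both)[["bed"]]    # bed with size held fixed

c(simple = b_simple, adjusted = b_both)
```

In the simple model, bed picks up the effect of size because the two are correlated. Once log_sqft is in the model, the coefficient on bed compares homes of the same size, and its sign can reverse.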

# What can we build with data?

A prediction machine.

A summary of a data set.

A generalization to a population.

A causal explanation.

## Model Interpretation

Question 1: What is the relationship between the number of bedrooms in a house and its price?

$\widehat{\textrm{log(price)}} = 11.8 + 0.53 \, \textrm{bed}$

Question 2: After controlling for the size of a house, what is the relationship between the number of bedrooms in a house and its price?

$\widehat{\textrm{log(price)}} = 1.47 - 0.12 \, \textrm{bed} + 1.66 \, \textrm{log(sqft)}$
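
Because the response is on the log scale, the coefficients are easier to read after converting them to multiplicative changes in price. The sketch below uses the rounded coefficients from the two fitted equations and assumes natural logs throughout; the variable names are made up for illustration.

```r
# Rounded coefficients from the two fitted equations
b_bed_simple <- 0.53   # bed, Question 1 model
b_bed_adj    <- -0.12  # bed, Question 2 model (log_sqft held fixed)
b_log_sqft   <- 1.66   # log_sqft, Question 2 model

# One extra bedroom adds b to log(price), so it multiplies price by exp(b)
mult_bed_simple <- exp(b_bed_simple)  # greater than 1: higher predicted price
mult_bed_adj    <- exp(b_bed_adj)     # less than 1: lower predicted price

# Doubling sqft adds b * log(2) to log(price), so it multiplies price by 2^b
mult_double_sqft <- 2^b_log_sqft

c(mult_bed_simple, mult_bed_adj, mult_double_sqft)
```

These multipliers describe associations in the fitted models, not causal effects of adding a bedroom.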

## The Tradeoff between flexibility and interpretability

Fig. 2.7 from An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie, and Tibshirani.

# Problem Set 7.2


This address is the property that the UC acquired to serve as the home of the president of the system (you can google it and pull up news articles).

# Lab 7.2 Work
