# Summarizing Numerical Data

STAT 20: Introduction to Probability and Statistics

## Agenda

• Announcements
• Concept Questions and Activity
• Problem Set 3
• Break
• Coding Refresher
• Lab 1.2: Computing on the Data

## Announcements

• RQ: Data Pipelines released Friday afternoon and due Monday at 11:59pm
• Problem Sets 2 (Tue/Wed) and 3 (today) due Tuesday at 9am
• Lab 1: Class Survey (both parts) due Tuesday at 9am
• Group Tutoring Sessions started yesterday in Evans Hall!
• We slightly changed Problem 7, Lab 1.1

# Concept Question

## Describing Shape

Which of these variables do you expect to be uniformly distributed?

1. bill length of Gentoo penguins
2. salaries of a random sample of people from California
3. house sale prices in San Francisco
4. birthdays of classmates (day of the month)

Please vote at pollev.com.

01:00

# Concept Activity - Measures of Center

## Mean, median, mode: which is best?

It depends on your desiderata: the nature of your data and what you seek to capture in your summary.

Get out a piece of paper. You’ll be watching a 3 minute video that discusses characteristics of a typical human. Note which numerical summaries are used and what for.

1. Means are often a good default for symmetric data.
1. Means are sensitive to very large and small values, so can be deceptive on skewed data. > Use a median
1. Modes are often the only option for categorical data.

But there are other notions of typical… what about a maximum?

# Concept Question 3 - Measures of Spread

• Why are measures of spread so important? Consider the following question.

There are two new food delivery services that open in Berkeley: Oski Eats and Cal Cravings. A friend of yours that took Stat 20 collected data on each and noted that Oski Eats has a mean delivery time of 29 minutes and Cal Cravings a mean delivery time of 27 minutes. Which would would you rather order from?

• Discuss this question with your classmates! (no poll question).
01:00

## One possible reality

Would you still prefer to order from Cal?

# Problem Set 3: Summarizing Numerical Data

Work on the problem set in groups of 2. We will discuss some questions toward the end of the period!

25:00

# Break

05:00

# Coding Refresher

Head to PollEverywhere for a competition!

# Lab 1.2 - Computing on the Data

Work on the lab. We will discuss some questions toward the end of the period!

30:00