Summarizing Numerical Data

STAT 20: Introduction to Probability and Statistics

Agenda

  • Announcements
  • Concept Questions and Activity
  • Problem Set 3
  • Break
  • Coding Refresher
  • Lab 1.2: Computing on the Data

Announcements

  • RQ: Data Pipelines released Friday afternoon and due Monday at 11:59pm
  • Problem Sets 2 (Tue/Wed) and 3 (today) due Tuesday at 9am
  • Lab 1: Class Survey (both parts) due Tuesday at 9am
  • Group Tutoring Sessions started yesterday in Evans Hall!
  • We slightly changed Problem 7, Lab 1.1

Concept Question

Describing Shape

Which of these variables do you expect to be uniformly distributed?

  1. bill length of Gentoo penguins
  2. salaries of a random sample of people from California
  3. house sale prices in San Francisco
  4. birthdays of classmates (day of the month)

Please vote at pollev.com.

01:00

Concept Activity - Measures of Center

Mean, median, mode: which is best?

It depends on your desiderata: the nature of your data and what you seek to capture in your summary.

Get out a piece of paper. You’ll be watching a 3-minute video that discusses characteristics of a typical human. Note which numerical summaries are used and what each one is used for.

General Advice

  1. Means are often a good default for symmetric data.
  2. Means are sensitive to very large and very small values, so they can be deceptive on skewed data. In that case, use a median.
  3. Modes are often the only option for categorical data.

But there are other notions of typical… what about a maximum?
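As a minimal sketch of the advice above, the following Python snippet compares the summaries on a small right-skewed data set. The numbers, the variable names, and the choice of Python's standard `statistics` module are all assumptions made for illustration; they do not come from the lecture or the video.

```python
import statistics

# Hypothetical right-skewed data (invented for illustration):
# nine modest salaries and one very large one, in thousands of dollars.
salaries = [30, 32, 35, 35, 38, 40, 42, 45, 50, 900]

mean_salary = statistics.mean(salaries)      # pulled upward by the 900
median_salary = statistics.median(salaries)  # middle of the sorted values
max_salary = max(salaries)                   # another notion of "typical"

# Hypothetical categorical data: a mean or median is not defined here,
# so the mode (most frequent value) is the natural summary.
favorite_colors = ["blue", "green", "blue", "red", "blue", "green"]
mode_color = statistics.mode(favorite_colors)

print("mean:  ", mean_salary)
print("median:", median_salary)
print("max:   ", max_salary)
print("mode:  ", mode_color)
```

In this made-up data the single large value drags the mean well above the median, so the median is closer to what most of the ten people earn.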

Concept Question 3 - Measures of Spread

  • Why are measures of spread so important? Consider the following question.

Two new food delivery services have opened in Berkeley: Oski Eats and Cal Cravings. A friend of yours who took Stat 20 collected data on each and noted that Oski Eats has a mean delivery time of 29 minutes and Cal Cravings has a mean delivery time of 27 minutes. Which would you rather order from?

  • Discuss this question with your classmates! (No poll question.)

01:00

One possible reality

Would you still prefer to order from Cal Cravings?
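One way to see why the mean alone cannot settle the question is to compute a measure of spread alongside it. The sketch below uses invented delivery times, chosen only so that the means match the 29 and 27 minutes quoted above; it is not the data behind the slide, and the helper `summarize` is a hypothetical name.

```python
import statistics

# Invented delivery times in minutes (NOT the data from the slide).
# They are chosen so the means equal 29 and 27, as in the question.
oski_eats = [27, 28, 29, 29, 29, 30, 31]
cal_cravings = [10, 12, 15, 20, 35, 45, 52]


def summarize(times):
    """Return the mean, sample standard deviation, and range of a list."""
    return {
        "mean": statistics.mean(times),
        "sd": statistics.stdev(times),      # sample standard deviation
        "range": max(times) - min(times),   # longest minus shortest delivery
    }


oski_summary = summarize(oski_eats)
cal_summary = summarize(cal_cravings)

print("Oski Eats:   ", oski_summary)
print("Cal Cravings:", cal_summary)
```

Under these assumed numbers, Cal Cravings is faster on average but far less predictable: some orders arrive in about 10 minutes and others take close to an hour, while every Oski Eats order lands within a few minutes of 29.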

Problem Set 3: Summarizing Numerical Data

Work on the problem set in groups of 2. We will discuss some questions toward the end of the period!

25:00

Break

05:00

Coding Refresher

Head to PollEverywhere for a competition!

Lab 1.2 - Computing on the Data

Work on the lab. We will discuss some questions toward the end of the period!

30:00

End of Lecture