STAT 20: Introduction to Probability and Statistics

- Quiz Review
- Lab 1 Review
- Concept Question
- *Break*
- Measures of Center
- Measures of Spread
- Summarize
- *Break*
- Problem Set 2.1

Which of these variables do you expect to be uniformly distributed?

- bill length of Gentoo penguins
- salaries of a random sample of people from California
- house sale prices in San Francisco
- birthdays of classmates (day of the month)

Please vote at `pollev.com`

It depends on your desiderata: the nature of your data and what you seek to capture in your summary.

Get out a piece of paper. You’ll be watching a 3-minute video that discusses characteristics of a typical human. Note which numerical summaries are used and what each one is used for.

*Means* are often a good default for symmetric data.

*Means* are sensitive to very large and very small values, so they can be deceptive on skewed data. In that case, use a median.

*Modes* are often the only option for categorical data.
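To see why the mean can mislead on skewed data, here is a minimal sketch using made-up numbers. The `salaries` values are hypothetical and chosen only for illustration: adding a single very large value pulls the mean far from the bulk of the data, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars (made up for illustration).
salaries = [40, 45, 50, 55, 60]
with_outlier = salaries + [1000]  # add one very large value

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

With these numbers the mean jumps from 50 to roughly 208, while the median only shifts from 50 to 52.5.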

But there are other notions of typical…

Two new food delivery services open in Berkeley: Oski Eats and Cal Cravings. A friend of yours who took Stat 20 collected data on each and noted that Oski Eats has a mean delivery time of 29 minutes and Cal Cravings has a mean delivery time of 27 minutes. Which would you rather order from?

Would you still prefer to order from Cal?

You can construct a **statistical graphic** to show the **shape**, which you can describe in terms of **modality** and **skew**… you can calculate a **measure of center** to convey a sense of a typical observation…and you can calculate a **measure of spread** to capture how much variability there is in the data.

We construct tools (statistics, graphics) that produce useful summaries of raw data.

How can we express the variability in this data set using a single number?

\[ 6 \quad 7 \quad 7 \quad 7 \quad 8 \quad 8 \quad 9 \quad 9 \quad 10 \quad 11 \quad 11\]

**Desiderata**

- The statistic should be *low* when the numbers are the same or very similar to one another.
- The statistic should be *high* when the numbers are very different.
- The statistic should not grow or shrink with the sample size ( \(n\) ).

- sample size ( \(n\) ): 11
- sample mean ( \(\bar{x}\) ): 8.45
- sample median: 8
- sample mode: 7
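These summaries can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic, not the course's own tooling; the variable names are illustrative.

```python
from statistics import mean, median, mode

data = [6, 7, 7, 7, 8, 8, 9, 9, 10, 11, 11]

n = len(data)             # sample size
x_bar = mean(data)        # sum of the values divided by n (93 / 11)
med = median(data)        # middle value of the sorted data
most_common = mode(data)  # most frequently occurring value

print(n, round(x_bar, 2), med, most_common)
```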

\[ {\Large 6} \quad 7 \quad 7 \quad 7 \quad 8 \quad 8 \quad 9 \quad 9 \quad 10 \quad 11 \quad {\Large 11}\]

\[\textrm{range:} \quad \max - \min\]

\[ 11 - 6 = 5\]

**Characteristics**

- Very sensitive to extreme values!
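A short sketch of the range, along with a check of its sensitivity. The `stretched` list is a hypothetical variation (one 11 replaced by 110) used only to show how a single extreme value changes the result.

```python
data = [6, 7, 7, 7, 8, 8, 9, 9, 10, 11, 11]


def value_range(xs):
    """Largest value minus smallest value."""
    return max(xs) - min(xs)


r = value_range(data)

# Hypothetical variation: replace the last 11 with 110.
stretched = data[:-1] + [110]
r_stretched = value_range(stretched)

print(r, r_stretched)
```

Changing one observation moves the range from 5 to 104, even though the other ten values are untouched.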

\[ 6 \quad 7 \quad {\Large 7 \quad 7} \quad 8 \quad {\large 8} \quad 9 \quad {\Large 9 \quad 10} \quad 11 \quad 11\]

The difference between the median of the larger half of the sorted data set, \(Q_3\), and the median of the smaller half, \(Q_1\). Because \(n\) is odd here, the middle observation is counted in both halves.

\[\textrm{IQR:} \quad Q_3 - Q_1\]

\[ 9.5 - 7 = 2.5 \]

**Characteristics**

- Robust to outliers
- Used to set the width of the box in a boxplot
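The sketch below computes the IQR as the difference between the medians of the two halves. It assumes that, when \(n\) is odd, the middle observation belongs to both halves, which is the convention that reproduces 7 and 9.5 for this data set; other quartile conventions can give slightly different values. The `stretched` list is a hypothetical variation used to illustrate robustness.

```python
from statistics import median


def quartiles(xs):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when n is odd, the middle value is included in both halves.
    """
    xs = sorted(xs)
    n = len(xs)
    half = n // 2
    lower = xs[: half + n % 2]
    upper = xs[half:]
    return median(lower), median(upper)


def iqr(xs):
    q1, q3 = quartiles(xs)
    return q3 - q1


data = [6, 7, 7, 7, 8, 8, 9, 9, 10, 11, 11]
q1, q3 = quartiles(data)
iqr_data = iqr(data)

# Hypothetical variation: replace the last 11 with 110.
stretched = data[:-1] + [110]
iqr_stretched = iqr(stretched)

print(q1, q3, iqr_data, iqr_stretched)
```

The extreme value leaves the IQR unchanged, because it does not affect the middle of either half.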

\[ 6 \quad 7 \quad 7 \quad 7 \quad 8 \quad 8 \quad 9 \quad 9 \quad 10 \quad 11 \quad 11\]

Take the difference between each observation, \(x_i\), and the sample mean, \(\bar{x}\), take the absolute value of each difference, add them up, and divide by \(n\).

\[\textrm{MAD:} \quad \frac{1}{n}\sum_{i = 1}^n |x_i - \bar{x}| \]

\[ \textrm{MAD} = 1.4 \]

**Characteristics**

- Incorporates information from all observations
- Robust to extreme values
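The recipe for the mean absolute deviation translates directly into code. A minimal sketch, with an illustrative function name:

```python
data = [6, 7, 7, 7, 8, 8, 9, 9, 10, 11, 11]


def mean_abs_dev(xs):
    """Average absolute distance of each value from the sample mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum(abs(x - x_bar) for x in xs) / n


mad = mean_abs_dev(data)
print(round(mad, 2))
```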

\[ 6 \quad 7 \quad 7 \quad 7 \quad 8 \quad 8 \quad 9 \quad 9 \quad 10 \quad 11 \quad 11\]

Take the difference between each observation, \(x_i\), and the sample mean, \(\bar{x}\), square each difference, add them up, and divide by \(n - 1\).

\[s^2: \quad \frac{1}{n - 1}\sum_{i = 1}^n (x_i - \bar{x})^2 \]

\[ s^2 = 2.87 \]

**Characteristics**

- Incorporates information from all observations

- Moderately sensitive to extreme values
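The sample variance can be computed by following the same steps. This sketch writes the formula out by hand and compares it with `statistics.variance` from the standard library, which also divides by \(n - 1\).

```python
from math import isclose
from statistics import variance

data = [6, 7, 7, 7, 8, 8, 9, 9, 10, 11, 11]


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


s2 = sample_variance(data)
matches_stdlib = isclose(s2, variance(data))

print(round(s2, 2), matches_stdlib)
```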

\[ 6 \quad 7 \quad 7 \quad 7 \quad 8 \quad 8 \quad 9 \quad 9 \quad 10 \quad 11 \quad 11\]

Take the difference between each observation, \(x_i\), and the sample mean, \(\bar{x}\), square each difference, add them up, divide by \(n - 1\), then take the square root.

\[s: \quad \sqrt{\frac{1}{n - 1}\sum_{i = 1}^n (x_i - \bar{x})^2} \]

\[ s = 1.69 \]

**Characteristics**

- Incorporates information from all observations

- Moderately sensitive to extreme values

- Measured in units of the original data
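The standard deviation is the square root of the sample variance, which puts it back in the units of the data. A minimal sketch, checked against `statistics.stdev` from the standard library:

```python
from math import isclose, sqrt
from statistics import stdev

data = [6, 7, 7, 7, 8, 8, 9, 9, 10, 11, 11]


def sample_sd(xs):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))


s = sample_sd(data)
matches_stdlib = isclose(s, stdev(data))

print(round(s, 2), matches_stdlib)
```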

service | range | IQR | var | sd |
---|---|---|---|---|
cal | 37.4 | 9.9 | 62.9 | 7.9 |
oski | 6.5 | 3.9 | 4.3 | 2.1 |

**Desiderata**

- The statistic should be *low* when the numbers are the same or very similar to one another.
- The statistic should be *high* when the numbers are very different.
- The statistic should not grow or shrink with the sample size ( \(n\) ).
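One way to check a candidate statistic against these desiderata is to try it on a few constructed data sets. The sketch below compares the raw sum of absolute deviations with the same sum divided by \(n\). The `identical` and `doubled` lists are hypothetical test cases: one with no variability, and one that repeats every observation twice so that only the sample size changes.

```python
from math import isclose


def sum_abs_dev(xs):
    """Total absolute deviation from the mean (not divided by n)."""
    x_bar = sum(xs) / len(xs)
    return sum(abs(x - x_bar) for x in xs)


def mean_abs_dev(xs):
    """Total absolute deviation divided by the sample size."""
    return sum_abs_dev(xs) / len(xs)


data = [6, 7, 7, 7, 8, 8, 9, 9, 10, 11, 11]
identical = [8] * 11  # no variability at all
doubled = data * 2    # same spread, twice the sample size

low_when_identical = mean_abs_dev(identical) == 0
sum_grows_with_n = isclose(sum_abs_dev(doubled), 2 * sum_abs_dev(data))
mean_stable_in_n = isclose(mean_abs_dev(doubled), mean_abs_dev(data))

print(low_when_identical, sum_grows_with_n, mean_stable_in_n)
```

The raw sum doubles when the data are repeated, so it fails the third desideratum; dividing by \(n\) removes that dependence on sample size.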