Expected Value and Variance

STAT 20: Introduction to Probability and Statistics

Agenda

• Concept review
• Concept questions
• A new geometry: geom_col()
• PS 10
• Break
• Lab 4 slides
• Lab 4

Concept Review

Let $X$ be a random variable such that $X = \begin{cases} -1, & \text{ with probability } 1/3\\ 0, & \text{ with probability } 1/6\\ 1, & \text{ with probability } 4/15 \\ 2, & \text{ with probability } 7/30 \\ \end{cases}$

08:00

1. Draw the graph of the cdf of $X$.
2. Compute the expected value and variance of $X$.
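One way to check a hand computation for this review problem is to store the distribution as two vectors and apply the definitions $E(X) = \sum x \, P(X = x)$ and $Var(X) = E(X^2) - [E(X)]^2$. The sketch below uses base R only; the names `x`, `px`, `ex`, `vx`, and `Fx` are illustrative choices, not objects provided by the course.

```r
# Values and probabilities of X, copied from the problem statement
x  <- c(-1, 0, 1, 2)
px <- c(1/3, 1/6, 4/15, 7/30)

# Sanity check: the probabilities should sum to 1
stopifnot(isTRUE(all.equal(sum(px), 1)))

# Expected value: probability-weighted average of the values
ex <- sum(x * px)

# Variance: E(X^2) - [E(X)]^2
vx <- sum(x^2 * px) - ex^2

# cdf evaluated at each possible value (the heights of the steps)
Fx <- cumsum(px)

ex
vx
Fx
```

The entries of `Fx` are the heights the cdf jumps to at $-1$, $0$, $1$, and $2$, which can be compared against a hand-drawn graph.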

Concept Questions

01:00

$X$ is a random variable with the distribution shown below:

$X = \begin{cases} 3, \; \text{ with prob } 1/3\\ 4, \; \text{ with prob } 1/4\\ 5, \; \text{ with prob } 5/12 \end{cases}$

Consider the box with tickets: $\fbox{3}\, \fbox{3}\, \fbox{3} \,\fbox{4} \,\fbox{4} \,\fbox{4} \,\fbox{4} \,\fbox{5} \,\fbox{5}\, \fbox{5} \,\fbox{5} \,\fbox{5}$

Suppose we draw once from this box and let $Y$ be the value of the ticket drawn. Which random variable has a higher expectation?

The expected value of $X$ is ____ the expected value of $Y$.
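A probability table and a box of tickets describe a distribution in two different ways, and both expectations can be computed directly. As a rough sketch (all variable names are illustrative), the expected value of one draw from a box is the average of its tickets, while for $X$ each value is weighted by its probability.

```r
# X: values and probabilities from the distribution above
x  <- c(3, 4, 5)
px <- c(1/3, 1/4, 5/12)
ex <- sum(x * px)

# Y: one draw from the box; each of the 12 tickets is equally likely
tickets <- c(rep(3, 3), rep(4, 4), rep(5, 5))
ey <- mean(tickets)

# Probabilities implied by the box, for comparison with px
py <- table(tickets) / length(tickets)

c(EX = ex, EY = ey)
py
```

Comparing `py` with `px` shows where the two distributions differ, which is what drives any difference between the two expectations.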

Evgeni bakes a cake and invites two friends to join him in eating it. The cake will be split evenly between Evgeni and whoever shows up. His friends know that Evgeni is an erratic baker at best, so each of them independently tosses a coin to decide whether to go. Let $X$ be the fraction of the cake that Evgeni gets to eat. For example, if 1 friend shows up, Evgeni gets half the cake; if 2 show up, he gets one-third of the cake. What is $E(X)$?
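The cake problem can be set up as a function of the number of friends who show up. The sketch below assumes each coin is fair, so that the number of friends who come is Binomial($2, 1/2$); the problem only says "a coin", so fairness is an assumption here. The names `n_friends`, `p_friends`, `share`, and `e_share` are illustrative.

```r
# Number of friends who show up: 0, 1, or 2
# (assumes each friend tosses a fair coin, independently)
n_friends <- 0:2
p_friends <- dbinom(n_friends, size = 2, prob = 0.5)

# Evgeni's share of the cake when n friends show up
share <- 1 / (n_friends + 1)

# Expected share: weight each possible share by its probability
e_share <- sum(share * p_friends)
e_share
```

Note that this is the expectation of $1/(N+1)$, which is not the same thing as $1/(E(N)+1)$.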

01:00

A die will be rolled $n$ times, and the object is to guess the total number of spots in the $n$ rolls. You choose $n$ to be either 50 or 100. There is a one-dollar penalty for each spot that the guess is off. For instance, if you guess 200 and the total is 215, then you lose 15 dollars.*

Which do you prefer? $n = 50$ rolls, or $n = 100$ rolls?

* From the text Statistics by Freedman, Pisani, and Purves
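How far off a guess is likely to be depends on how variable the total is. As a minimal sketch, assuming a fair six-sided die and independent rolls, the expected value and standard deviation of the sum can be computed for both choices of $n$ using $E(S_n) = n\mu$ and $SD(S_n) = \sqrt{n}\,\sigma$. The object names are illustrative.

```r
# One roll of a fair six-sided die (fairness is an assumption here)
faces <- 1:6
mu    <- mean(faces)                  # expected value of one roll
sigma <- sqrt(mean((faces - mu)^2))   # SD of one roll

# For n independent rolls: E(sum) = n * mu, SD(sum) = sqrt(n) * sigma
n <- c(50, 100)
roll_summary <- data.frame(
  n            = n,
  expected_sum = n * mu,
  sd_sum       = sqrt(n) * sigma
)
roll_summary
```

The `sd_sum` column gives the typical size of the miss, in dollars, when the guess is the expected sum.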

02:00

One hundred draws will be made with replacement from a box with tickets $\fbox{0}\, \fbox{2}\, \fbox{3} \,\fbox{4} \,\fbox{6}$. Which of the following statements are true?*

• The expected value of the sum of the one hundred draws is 300, give or take 20 or so.
• The expected value of the sum of the one hundred draws is 300.
• The sum of the one hundred draws is 300, give or take 20 or so.
• The sum of the one hundred draws is 300.

* From the text Statistics by Freedman, Pisani, and Purves
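The numbers 300 and 20 in the statements above can be traced back to the box. The sketch below computes the box average and box SD, then scales them to the sum of 100 draws with replacement. The names are illustrative, and the box SD divides by the number of tickets rather than using `sd()`, which divides by one less.

```r
# Tickets in the box
box     <- c(0, 2, 3, 4, 6)
n_draws <- 100

# Average and SD of the box (population SD, dividing by the number of tickets)
box_mean <- mean(box)
box_sd   <- sqrt(mean((box - box_mean)^2))

# Expected value and standard error of the sum of the draws
ev_sum <- n_draws * box_mean
se_sum <- sqrt(n_draws) * box_sd

c(expected_value = ev_sum, standard_error = se_sum)
```

The expected value is a fixed number, while the sum itself is random; the standard error describes the likely size of the sum's deviation from the expected value.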

01:00

We have two random variables: $X \sim$ Binomial($10, 0.2$), and $Y$, the value of one ticket drawn at random from a box with tickets $\fbox{0}\, \fbox{2}\, \fbox{3} \,\fbox{4} \,\fbox{6}$.

For each of $X$ and $Y$, we take the sum of 100 iid copies of the random variable and call the sums $SX_{100}$ and $SY_{100}$. The empirical distributions of $SX_{100}$ and $SY_{100}$ are plotted below.

Which distribution belongs to which random variable?
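Empirical distributions like these can be produced by simulation. The sketch below is one possible way to do so in base R; the seed, the number of replicates (`reps`), and the object names are arbitrary choices made for illustration.

```r
set.seed(20)   # arbitrary seed so the sketch is reproducible
reps <- 2000   # number of simulated sums (an arbitrary choice)

# Each replicate: the sum of 100 iid Binomial(10, 0.2) draws
sx <- replicate(reps, sum(rbinom(100, size = 10, prob = 0.2)))

# Each replicate: the sum of 100 draws with replacement from the box
box <- c(0, 2, 3, 4, 6)
sy  <- replicate(reps, sum(sample(box, size = 100, replace = TRUE)))

# Compare the centers and spreads of the two empirical distributions
rbind(SX = c(mean = mean(sx), sd = sd(sx)),
      SY = c(mean = mean(sy), sd = sd(sy)))
```

Matching a plot to a random variable then comes down to reading its center and spread off the horizontal axis and comparing them with these summaries.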

PS 10

20:00

Break

05:00

Visualizing probability distributions

Recall geom_bar(). What does it do? What aesthetics do we need to define?

# Assumes ggplot2 is loaded and the penguins data frame is available
ggplot(penguins, aes(x = species)) +
  geom_bar(fill = "steelblue4") +
  theme_minimal()

What if we want to plot probabilities instead of counting instances?

Let $X$ be the number of heads when we toss a fair coin three times. We know that:

$X = \begin{cases} 0, & \text{ with probability } 1/8\\ 1, & \text{ with probability } 3/8\\ 2, & \text{ with probability } 3/8 \\ 3, & \text{ with probability } 1/8 \\ \end{cases}$

To plot this, we use a special kind of bar chart, plotted using geom_col(), in which the bar heights represent values supplied by the data. We have to specify a y aesthetic, a column from the data. The values from this column will be the heights of the bars.

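library(dplyr)    # provides the %>% pipe
library(ggplot2)

x <- c(0, 1, 2, 3)               # possible numbers of heads
fx <- c(1/8, 3/8, 3/8, 1/8)      # P(X = x) for each value
coin3_df <- data.frame(x, fx)
coin3_df %>% ggplot(aes(x = factor(x), y = fx)) +
  geom_col(fill = "goldenrod2") + ???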
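The slide leaves the last layer of this plot open. As a self-contained illustration of geom_col(), the sketch below builds the same distribution with `dbinom()` instead of typing the probabilities, and finishes the plot with axis labels and a theme. Those finishing layers, and the name `p`, are assumptions chosen for this example, not necessarily what the exercise intends.

```r
library(ggplot2)

# Number of heads in three tosses of a fair coin
x  <- 0:3
fx <- dbinom(x, size = 3, prob = 0.5)   # 1/8, 3/8, 3/8, 1/8
coin3_df <- data.frame(x, fx)

# geom_col() takes the bar heights from the y aesthetic
p <- ggplot(coin3_df, aes(x = factor(x), y = fx)) +
  geom_col(fill = "goldenrod2") +
  labs(x = "Number of heads", y = "Probability") +   # illustrative labels
  theme_minimal()                                     # illustrative theme
p
```

Unlike geom_bar(), which counts rows, this plot needs exactly one row per value of $X$, with the probability stored in the column mapped to y.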