How to Calculate Chances
Two important ideas: Conditional probability and Independence
Sally Clark: a tragic victim of statistical illiteracy
In November 1999, Sally Clark, an English solicitor, was convicted of murdering her two infant sons1. The first, Christopher, was 11 weeks old when he died in 1996, and the second, Harry, was 8 weeks old when he was found dead in January 1998. Christopher was believed to have been a victim of “cot death”, called SIDS (Sudden Infant Death Syndrome) in the US. After her second baby, Harry, also died in his crib, Sally Clark was arrested for murder. The star witness for the prosecution was a well-known pediatrician and professor, Sir Roy Meadow, who authored the infamous Meadow’s Law: “One sudden infant death is a tragedy, two is suspicious and three is murder until proved otherwise”2. Unfortunately, it was easier to comprehend this “crude aphorism” than to make the effort to understand the subtleties of conditional probability. The Royal Statistical Society protested the misuse of statistics in courts, but not early enough to prevent Sally Clark’s conviction. She was eventually acquitted and released, only to die at the age of 42 of alcohol poisoning3.
The math presented by Meadow, in brief: based on various studies, there is a probability of 1 in 8,543 of a baby dying of SIDS in a family such as the Clarks. As the Clarks suffered two deaths, Meadow multiplied 8,543 by 8,543 to arrive at roughly 73 million, and told the jury that the probability of two “cot deaths” in such a family was 1 in 73 million. The defense did not employ a statistician to refute his claim, a choice that may have been disastrous for Sally Clark.
We will revisit this case at the end of these notes. Now, let’s talk about the conditional probability of an event.
Computing probabilities
In the previous set of notes, we learned about outcome spaces, events, and their probabilities. If two events are mutually exclusive (\(A \cap B = \emptyset\)), we can compute the probability of at least one of the events occurring (\(A\cup B\)) using the addition rule \(P(A \cup B) = P(A) + P(B)\). These notes are about how we compute probabilities when two events are not mutually exclusive.
Example 1: Drawing red and blue tickets from a box
Consider a box with four tickets in it, two colored red and two blue. Except for their color, they are identical. Suppose we draw three times at random from this box, with replacement. List all the possible outcomes. What is the probability of seeing exactly 2 red cards among our draws?
Check your answer
Since each of the cards is equally likely to be drawn, all the sequences of three cards are equally likely. We can count the number of possible outcomes that contain exactly 2 red cards, and divide that number by the total number of possible outcomes to get the probability of drawing exactly 2 red cards. Written as color sequences, the 8 possible outcomes are RRR, RRB, RBR, BRR, RBB, BRB, BBR and BBB.
There are three outcomes that have exactly two red cards (RRB, RBR and BRR), out of a total of 8 possible outcomes, so the probability of exactly two red cards in three draws at random with replacement is 3/8.
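The count above can be reproduced by listing the outcomes programmatically. This is a minimal sketch, assuming we represent each draw only by its color ("R" or "B"), which is valid here because the box holds equally many red and blue cards and we draw with replacement. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# With replacement, each draw is "R" or "B" with equal chance,
# so the 8 color sequences of length 3 are equally likely.
outcomes = ["".join(seq) for seq in product("RB", repeat=3)]

# Keep only the sequences with exactly two reds.
two_reds = [o for o in outcomes if o.count("R") == 2]

p_two_reds = Fraction(len(two_reds), len(outcomes))

print(outcomes)    # all 8 sequences
print(two_reds)    # the sequences with exactly two reds
print(p_two_reds)  # 3/8
```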
Now suppose we repeat the procedure, but draw without replacement. What is the probability of exactly 2 red cards in 3 draws?
Check your answer
Notice that we have fewer possible outcomes (6 instead of 8, why?), though they are still equally likely. Again, there are 3 outcomes that have exactly 2 red cards, and so the probability of 2 red cards in three draws is now 3/6.
What about the probability distribution table for the number of red cards in three draws? Write down the probability distribution table for the number of red cards in three draws from a box with 2 red cards and 2 blue cards, while drawing with replacement, and then write down the probability distribution table for the same quantity (number of red cards in three draws from a box with 2 red cards and 2 blue cards), when you draw the tickets without replacement:
Check your answer
Number of reds in 3 draws | probability, with replacement | probability, without replacement |
---|---|---|
0 red tickets | \(\displaystyle \frac{1}{8}\) | \(0\) |
1 red ticket | \(\displaystyle \frac{3}{8}\) | \(\displaystyle \frac{3}{6}\) |
2 red tickets | \(\displaystyle \frac{3}{8}\) | \(\displaystyle \frac{3}{6}\) |
3 red tickets | \(\displaystyle \frac{1}{8}\) | \(0\) |
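Both columns of the table can be checked by enumeration. The sketch below assumes we label the four tickets individually (the labels `R1`, `R2`, `B1`, `B2` are made up for illustration) so that every ordered sequence of tickets is equally likely, and then counts the reds in each sequence.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

# Hypothetical labels so that same-colored tickets can be told apart.
tickets = ["R1", "R2", "B1", "B2"]

def red_count_distribution(sequences):
    """Return {number of reds: probability} for equally likely ticket sequences."""
    sequences = list(sequences)
    counts = Counter(sum(t.startswith("R") for t in seq) for seq in sequences)
    return {k: Fraction(counts.get(k, 0), len(sequences)) for k in range(4)}

# product(...) allows a ticket to repeat: drawing with replacement.
with_replacement = red_count_distribution(product(tickets, repeat=3))
# permutations(...) never repeats a ticket: drawing without replacement.
without_replacement = red_count_distribution(permutations(tickets, 3))

for k in range(4):
    print(k, with_replacement[k], without_replacement[k])
```

Note that 3/6 in the table appears here in lowest terms, as 1/2.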
Why are the numbers different? What is going on?
Below you see an illustration of what happens to the box when we draw without replacement.
We see that the box (our access frame) shrinks after each draw. If the first 2 draws are red (as in the leftmost sequence), you can’t get another red ticket, whereas if you are drawing with replacement, you can keep on drawing red tickets. (Note that the outcomes in the bottom row are not equally likely: on the left branch of the tree, blue is twice as likely as red to be the second card, so the outcome RB is twice as likely as RR, and on the right branch of the tree the outcome BR is twice as likely as BB.)
Rules of probability (recap)
Recall the rules of probability and the terms we have seen so far. Suppose we have some action such as tossing a coin, drawing from a box etc. for which we know all the possible results from this action, but we don’t know which particular one will occur on any instance of the action. This action is called a random experiment. All the possible things that can happen are called outcomes, and the collection of all the outcomes is called the sample space or outcome space \(\Omega\). If \(A\) is a subset of \(\Omega\), we call \(A\) an event. What can we say about \(A\) and \(\Omega\)?
- \(\Omega\) is the set of all possible outcomes. What is the probability of \(\Omega\)?
  The probability of \(\Omega\) is 1. It is called the certain event.
- When an event has no outcomes in it, it is called the impossible event, and denoted by \(\emptyset\). What is the probability of the impossible event?
  The probability of \(\emptyset\) is 0.
- Let \(A\) be a collection of outcomes (for example, from the example above, \(A\) could be the event of two red tickets in 3 draws with replacement). Then the probability of \(A\) has to be ______ (fill in the blank with a suitable phrase).
  between 0 and 1.
- If \(A\) and \(B\) are two events with no outcomes in common, then they are called ______ (fill in the blank with a suitable phrase).
  mutually exclusive.
- Consider an event \(A\). The probability of an outcome not being in \(A\) is 1 minus the probability of \(A\), since the total probability is \(1\). \(A^C\) denotes the event of not being in \(A\).
Example 2: Drawing tickets from a box to represent the number of heads in 3 tosses
If we are setting up a box for modeling the number of heads in three tosses of a fair coin, would either of the boxes below work? If not, say why not, and correct the box.
- Box 1 (figure not reproduced here): draw three times at random with replacement, and sum the draws.
- Box 2 (figure not reproduced here): draw once; the result is the number of heads.
Check your answer
The first box is correct and represents a box model for this situation. The second box doesn’t work because the tickets do not represent equally likely events. If you toss a fair coin 3 times and count the number of heads, the probability distribution of the number of heads is the same as when you draw a ticket three times at random with replacement from a box with 2 red tickets and 2 blue, and count the number of red tickets (the probability distribution above, in Example 1). If we wanted to draw only once, we would have to create a different box whose equally likely tickets reflect the probability distribution of the number of heads in 3 tosses, for example a box with eight tickets marked 0, 1, 1, 1, 2, 2, 2, 3.
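The comparison of box models can be sketched in code. The box contents below are assumptions, since the figures are not available here: the first box is taken to hold tickets 0 and 1, and the second to hold tickets 0, 1, 2, 3. The eight-ticket box is one possible corrected version, not necessarily the one pictured in class.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def distribution(values):
    """Probability of each value when every entry of `values` is equally likely."""
    values = list(values)
    counts = Counter(values)
    return {v: Fraction(c, len(values)) for v, c in sorted(counts.items())}

# Target: number of heads in 3 tosses of a fair coin.
heads = distribution(seq.count("H") for seq in product("HT", repeat=3))

# Assumed box 1: tickets 0 and 1, three draws with replacement, summed.
box1_sum = distribution(sum(draws) for draws in product([0, 1], repeat=3))

# Assumed box 2: one draw from tickets 0, 1, 2, 3 (each value gets chance 1/4).
box2_one_draw = distribution([0, 1, 2, 3])

# One possible corrected single-draw box: repeat tickets to match the chances.
corrected_one_draw = distribution([0, 1, 1, 1, 2, 2, 2, 3])

print(heads)
print(box1_sum == heads)            # True
print(box2_one_draw == heads)       # False
print(corrected_one_draw == heads)  # True
```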
Conditional probabilities
In Example 1 above, we saw that the probability of a red ticket on a draw changes if we sample without replacement. If we get a blue ticket on the first draw, the probability of a red ticket on the second draw is 2/3 (since there are 3 tickets left, of which 2 are red). If we get a red ticket on the first draw, the probability of a red ticket on the second draw is 1/3 (only 1 of the 3 remaining tickets is red). These probabilities that depend on what happened on the first draw are called conditional probabilities. If \(A\) is the event of a blue ticket on the first draw, and \(B\) is the event of a red ticket on the second draw, we say that the probability of \(B\) given \(A\) is 2/3, which is a conditional probability, because we put a condition on the first card, that it had to be blue.
What if we don’t put a condition on the first card? What is the probability that the second card is red?
Check your answer
The probability that the second card drawn is red is 1/2, if we don’t have any information about the first card drawn. To see this, imagine shuffling all the cards in the box so that they end up in a random order, with each card equally likely to land in each of the 4 positions. There are 2 red cards among the 4, so the probability that a red card occupies any given position, including the second, is 2/4. This kind of probability, where we put no condition on the first card, is called an unconditional probability: we don’t have any information about the first card.
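These three numbers (2/3, 1/3 and 1/2) can be confirmed by listing every ordered pair of draws without replacement. This is a small sketch; the ticket labels and function names are illustrative choices, not part of the original notes.

```python
from fractions import Fraction
from itertools import permutations

tickets = ["R1", "R2", "B1", "B2"]      # hypothetical labels
pairs = list(permutations(tickets, 2))  # 12 equally likely (first, second) draws

def is_red(ticket):
    return ticket.startswith("R")

def prob_second_red(outcomes):
    """Fraction of the given equally likely pairs whose second ticket is red."""
    outcomes = list(outcomes)
    return Fraction(sum(is_red(second) for _, second in outcomes), len(outcomes))

# Conditional probabilities: restrict to pairs that satisfy the condition.
p_red_given_blue_first = prob_second_red(p for p in pairs if not is_red(p[0]))
p_red_given_red_first = prob_second_red(p for p in pairs if is_red(p[0]))

# Unconditional probability: no restriction on the first draw.
p_red_unconditional = prob_second_red(pairs)

print(p_red_given_blue_first, p_red_given_red_first, p_red_unconditional)
```

The unconditional value is also the average of the two conditional values, weighted by the chance (1/2 each) of a blue or a red first draw.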
The Multiplication Rule: computing the probability of an intersection
We often want to know the probability that two (or more) events will both happen. If we roll a pair of dice, what is the probability that both will show six spots? If we deal two cards from a standard 52-card deck, what is the probability that both will be kings? In a family with two babies, what is the probability that both will suffer SIDS? What do we know? Recall what the Venn diagram of two intersecting events looks like: the region \(A \cap B\) is the overlap of the regions \(A\) and \(B\).
This picture tells us that \(A\cap B\) is contained in both \(A\) and \(B\), so its probability can be no larger than either the probability of \(A\) or the probability of \(B\): \(P(A\cap B) \le P(A)\) and \(P(A\cap B) \le P(B)\). In fact, we write the probability of the intersection as:
\[P(A \cap B) = P(A) \times P(B|A)\]
The second probability on the right-hand side of the equation is called the conditional probability of \(B\) given \(A\). For example, in the first example with the box with two red and two blue cards, suppose we draw two cards without replacement, and let \(A\) be the event of drawing a red card on the first draw and \(B\) the event of drawing a blue card on the second draw. Then \(P(A) = \displaystyle \frac{2}{4}\) and \(P(B | A) = \displaystyle \frac{2}{3}\) (the denominator reduces by one, since there are only 3 cards left in the box, of which 2 are blue). Therefore:
\[P(A \cap B) = P(A) \times P(B|A) = \frac{2}{4} \times \frac{2}{3} = \frac{1}{3}\]
This becomes clearer if we think in terms of long-run frequencies. If we imagine repeating the two draws over and over again, we draw a red card first about half the time. Of those times, we would also draw a blue card second about two-thirds of the time, since drawing a blue card would be twice as likely as drawing a red card. Therefore drawing a red card first and then a blue card would happen two-thirds of one half of the time, which is a third of the time.
Note that the roles of \(A\) and \(B\) could be reversed in the expression above:
\[ P(A \cap B) = P(A) \times P(B | A) = P(B) \times P(A | B)\]
This gives us a way to compute the conditional probability:
\[ P(B | A) = \frac{ P(A \cap B)}{P(A)}, \; \; P(A) > 0 \]
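The multiplication rule and the formula for conditional probability can be checked on the red-then-blue example by counting. This is a sketch under the same assumptions as the text (two draws without replacement from 2 red and 2 blue cards); the helper `prob` and the card labels are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

cards = ["R1", "R2", "B1", "B2"]      # hypothetical labels
pairs = list(permutations(cards, 2))  # equally likely ordered pairs

def prob(event, given=lambda pair: True):
    """P(event | given), counted over the equally likely ordered pairs."""
    kept = [pair for pair in pairs if given(pair)]
    return Fraction(sum(1 for pair in kept if event(pair)), len(kept))

def A(pair):
    return pair[0].startswith("R")    # red on the first draw

def B(pair):
    return pair[1].startswith("B")    # blue on the second draw

def A_and_B(pair):
    return A(pair) and B(pair)

p_both = prob(A_and_B)
print(p_both)                         # 1/3
print(prob(A) * prob(B, given=A))     # P(A) x P(B|A)
print(prob(B) * prob(A, given=B))     # P(B) x P(A|B)
print(p_both / prob(A))               # P(B|A) recovered from the ratio
```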
Independence
We say that two events are independent if the probabilities for the second remain the same even if you know that the first has happened, no matter how the first event turns out. Otherwise, the events are dependent. That is, if \(A\) and \(B\) are independent, then we have that \(P(A | B) = P(A)\) and \(P(B |A) = P(B)\). This means that the multiplication rule reduces to:
\[ P(A \cap B) = P(A) \times P(B | A) = P(A) \times P(B) \]
You can check for independence by computing \(P(A \cap B)\) and \(P(A)\times P(B)\) and seeing if they are equal.
For example, consider our box of red and blue tickets. When we draw with replacement, the probability of a red ticket on the second draw given a blue ticket on the first draw remains at 1/2. If we had a red ticket on the first draw, the probability of the second ticket being red is still 1/2. The probability doesn’t change because it does not depend on the outcome of the first draw, since we put the ticket back.
If we draw the tickets without replacement, we have seen that the probability changes. The probability of a blue ticket on the second draw given a red ticket on the first draw is 2/3, but the probability of a red ticket on the second draw given a red ticket on the first is 1/3.
The lesson here is that when we draw tickets with replacement, the draws are independent - the outcome of the first draw does not affect the second. If we draw tickets without replacement, the draws are dependent. The outcome of the first draw changes the probabilities of the tickets for the second draw.
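The check described above, comparing \(P(A \cap B)\) with \(P(A)\times P(B)\), can be carried out for both sampling schemes. In this sketch \(A\) is taken to be "red on the first draw" and \(B\) "red on the second draw"; the ticket labels and the function name are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

tickets = ["R1", "R2", "B1", "B2"]  # hypothetical labels

def is_red(ticket):
    return ticket.startswith("R")

def check_independence(pairs):
    """Return (P(A and B), P(A) * P(B), whether they are equal)."""
    pairs = list(pairs)
    n = len(pairs)
    p_a = Fraction(sum(is_red(first) for first, _ in pairs), n)
    p_b = Fraction(sum(is_red(second) for _, second in pairs), n)
    p_ab = Fraction(sum(is_red(f) and is_red(s) for f, s in pairs), n)
    return p_ab, p_a * p_b, p_ab == p_a * p_b

# Two draws with replacement: all 16 ordered pairs, repeats allowed.
print(check_independence(product(tickets, repeat=2)))
# Two draws without replacement: the 12 ordered pairs with distinct tickets.
print(check_independence(permutations(tickets, 2)))
```

With replacement the two products agree (both are 1/4), and without replacement they differ (1/6 versus 1/4), matching the conclusion in the text.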
Example 3: Selecting 2 people out of a group of 5
(drawing without replacement)
We have a group of 5 people: Alex, Emi, Fred, Max, and Nan. Two of the five are to be selected at random to form a two person committee. Draw the box model for this situation.
Check your answer
What is the probability that Alex and Emi will be selected? Guess! (Hint: you have seen all the possible pairs above, and they are equally likely. What will be the probability of any one of them?)
We could use the multiplication rule to compute this probability, which is much simpler than writing out all the possible outcomes. The committee can consist of Alex and Emi either if Alex is drawn first and Emi second, or if Emi is drawn first and Alex second. The probability that Alex will be drawn first is \(1/5\). The conditional probability that Emi will be drawn second given that Alex was drawn first is \(1/4\), since there are only 4 tickets left in the box. Using the multiplication rule, the probability that Alex will be drawn first and Emi second is \((1/5) \times (1/4) = 1/20\). Similarly, the probability that Emi will be drawn first and Alex second is \(1/20\). These two orderings are mutually exclusive, so by the addition rule the probability that Alex and Emi will be selected for the committee is \(1/20 + 1/20 = 1/10\).
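Both routes to the answer, counting pairs and using the multiplication rule, can be compared directly. This is a minimal sketch using the five names from the example; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

people = ["Alex", "Emi", "Fred", "Max", "Nan"]

# Route 1: list every unordered pair; all of them are equally likely.
committees = list(combinations(people, 2))
p_by_counting = Fraction(committees.count(("Alex", "Emi")), len(committees))

# Route 2: multiplication rule for each order, then add the two orders.
p_alex_then_emi = Fraction(1, 5) * Fraction(1, 4)
p_emi_then_alex = Fraction(1, 5) * Fraction(1, 4)
p_by_rule = p_alex_then_emi + p_emi_then_alex

print(len(committees))  # 10 possible committees
print(p_by_counting)    # 1/10
print(p_by_rule)        # 1/10
```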
Example 4: Colored and numbered tickets
I have two boxes with numbered tickets colored red or blue as shown below.
Are color and number independent or dependent for box 1? What about box 2?
For example, is the probability of a ticket marked 2 the same whether the ticket is red or blue?
Check your answer
For box 1, color and number are dependent, since the probability of 3 given that the ticket is red is 1/3, but the probability of 3 given that the ticket is blue is 0 (and similarly for the probability of 4).
Even though the probability for 1 or 2 given the ticket is red is the same as the probability for 1 or 2 given the ticket is blue, we say that color and number are dependent because of the tickets marked 3 or 4.
Example 5: Tickets with more than one number on them
Now I have two boxes with numbered tickets, where each ticket has two numbers on it, as shown. For each box, are the two numbers independent or dependent? For example, if I know that the first number is 1, does it change the probability of the second number being 6 (or the other way around: if I know the second number is 6, does it change the probability of the first number being 1)?

Check your answer
For box 1, the first number and second number are independent, as shown below, using 1 and 6 as examples. If we know that the first number is 1, the box reduces as shown. The probability of the second number being 6 does not change for box 1. The probability does change for box 2, increasing from 1/2 to 2/3, since the second number is more likely to be 6 if the first number is 1.

Example 6: Back to Sally Clark
Professor Roy Meadow claimed that the probability of two of Sally Clark’s sons dying of SIDS was 1 in 73 million. He obtained this number by multiplying 1/8,543 by 1/8,543, using the multiplication rule and treating the two events as independent. The question is, are they really independent? Was a crime really committed? Unfortunately for Sally Clark, two catastrophic errors were committed in her case by the prosecution, and not caught by the defense. (She appealed the decision and was acquitted and released, but only after spending 4 years in prison, accused of murdering her babies.)
The first error was in treating the deaths as independent, and the second was in looking at the wrong probability. Let’s look at the first mistake. It turns out that the probability of a second child dying of “cot death” or SIDS is 1/60 given that the first child died of SIDS, far higher than 1 in 8,543. Treating the two deaths as independent was therefore a massive error. It also turned out that the prosecution had suppressed the pathology reports for the second baby, who had a very bad infection and might have died of that. It is also believed that male infants are more likely to suffer cot death.
The second error is an example of what is called the Prosecutor’s Fallacy. The prosecution looked at the probability of the evidence (two infant deaths) given that Sally Clark was innocent. They should instead have compared the probability of innocence given the evidence with the probability of murder given the evidence. These multiple errors ruined many lives. Though Sally Clark was eventually acquitted, helped by the Royal Statistical Society’s evidence, her life was shattered, and she died soon after being released. The moral of this story is to be very careful when multiplying probabilities. You must check for independence.
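The size of the first error can be seen with exact arithmetic. The sketch below uses only the two figures quoted in these notes (1 in 8,543 for a first SIDS death, and 1/60 for a second given a first); it illustrates the multiplication rule and is not a reconstruction of the evidence presented in court.

```python
from fractions import Fraction

p_first = Fraction(1, 8543)             # P(first child dies of SIDS), as quoted above
p_second_given_first = Fraction(1, 60)  # P(second dies | first died), as quoted above

# Meadow's calculation: wrongly treats the two deaths as independent.
p_if_independent = p_first * p_first

# Multiplication rule with the conditional probability.
p_with_dependence = p_first * p_second_given_first

print(p_if_independent)                      # 1/72982849, about 1 in 73 million
print(p_with_dependence)                     # 1/512580
print(p_with_dependence / p_if_independent)  # 8543/60, roughly 142 times larger
```

Even the corrected figure is only the probability of two SIDS deaths. On its own it says nothing about the probability of innocence, which is the point of the second error.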
Mutually exclusive vs independent events
Note that if two events \(A\) and \(B\), both with positive probability, are mutually exclusive, they cannot be independent. If \(P(A \cap B) = 0\), but neither \(P(A) =0\) nor \(P(B) = 0\), then \(P(A \cap B) = 0 \ne P(A)\times P(B)\).
Intuitively, they are opposite scenarios. Independence of two events means that knowing one of the events happens gives you no information about the probability of the other, and if two events are mutually exclusive, knowing one of the events happens tells you that the other cannot happen.
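A single roll of a fair die gives a quick illustration of the difference. The events below are examples chosen for this sketch and do not come from the notes above.

```python
from fractions import Fraction

die = set(range(1, 7))  # outcomes of one roll of a fair die

def prob(event):
    return Fraction(len(event & die), len(die))

def mutually_exclusive(a, b):
    return len(a & b) == 0

def independent(a, b):
    return prob(a & b) == prob(a) * prob(b)

low, middle, even = {1, 2}, {3, 4}, {2, 4, 6}  # illustrative events

# Mutually exclusive events with positive probability are not independent.
print(mutually_exclusive(low, middle), independent(low, middle))  # True False

# Independent events with positive probability must overlap.
print(mutually_exclusive(low, even), independent(low, even))      # False True
```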
Inclusion-exclusion (generalized addition rule)
Now that we know how to compute the probability of the intersection of two events, we can compute the probability of the union of two events:
\[P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
You can see that if we just add the probabilities of \(A\) and \(B\), we double count the overlap. By subtracting it once, we can get the correct probability, and we know how to compute the probability of \(A\cap B\). This is known as the inclusion-exclusion principle.
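The formula can be checked against a direct count. This sketch assumes the box from Example 1 with two draws without replacement, and takes \(A\) to be "red on the first draw" and \(B\) "red on the second draw"; the labels and names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

tickets = ["R1", "R2", "B1", "B2"]      # hypothetical labels
pairs = list(permutations(tickets, 2))  # 12 equally likely ordered pairs

def prob(event):
    return Fraction(sum(1 for pair in pairs if event(pair)), len(pairs))

def A(pair):
    return pair[0].startswith("R")  # red on the first draw

def B(pair):
    return pair[1].startswith("R")  # red on the second draw

# Direct count of "at least one red in two draws".
p_union_direct = prob(lambda pair: A(pair) or B(pair))

# Inclusion-exclusion: add the two probabilities, subtract the overlap once.
p_union_formula = prob(A) + prob(B) - prob(lambda pair: A(pair) and B(pair))

print(p_union_direct, p_union_formula)  # 5/6 5/6
```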
Summary
In this lecture, we took a deep dive into computing probabilities. It is well known that people are not good at estimating probabilities of events, and we saw the tragic example of Sally Clark (who, even more sadly, is not a unique case)4.
We defined conditional probability and independence, and the multiplication rule, considering draws at random with and without replacement.
We noted that independent events are very different from mutually exclusive events, and finally we learned how to compute probabilities of unions of events that may not be mutually exclusive.
Footnotes
1. https://www.theguardian.com/uk-news/2021/nov/20/sally-clark-cot-death-mothers-wrongly-jailed
2. From the archives of The Guardian newspaper: https://www.theguardian.com/uk/2001/jul/15/johnsweeney.theobserver
3. The thumbnail image of the headline from the Manchester Evening News and some details of the case are from http://www.inference.org.uk/sallyclark
4. https://www.theguardian.com/uk-news/2021/nov/20/sally-clark-cot-death-mothers-wrongly-jailed