$$ \newcommand{\pr}[1]{\mathbb{P}\left(#1\right)} \newcommand{\cpr}[2]{\mathbb{P}\left(#1\mid\,#2\right)} $$
3 Conditional probability and independence
3.1 Conditional probability
In this course, when \(\mathbb{P}(B)=0\), \(\mathbb{P}(A \mid B)\) is undefined. The usual interpretation is that \(\mathbb{P}(A \mid B)\) represents our probability for \(A\) after we have observed \(B\). Conditional probability is therefore very important for statistical reasoning, for example:
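For reference, the defining ratio (this is the standard definition, which the calculations in this section rely on) is, for \(\pr{B}>0\), \[\cpr{A}{B} = \frac{\pr{A \cap B}}{\pr{B}} .\]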
Legal trials. How can we use DNA (or other) evidence to determine the chance that an accused person is guilty?
Medical screening. How can we make best use of the information from large-scale cancer screening programs?
Unfortunately, conditional probability is not always well understood. There are several well-known legal cases that have involved a serious error in probabilistic reasoning: see e.g. Example 2.4.5 of (Anderson, Seppäläinen, and Valkó 2018).
For example, if we roll a fair six-sided die, the conditional probability that the score is odd, given that the score is at most 3, is \[\mathbb{P}(\text{odd} \mid \text{ at most 3}) = \frac{\mathbb{P}(\{1,3\})}{\mathbb{P}(\{1,2,3\})} = \frac{2/6}{3/6} = \frac{2}{3}. \]
3.2 Properties of conditional probability
In this section, we’ll meet five key properties of conditional probability.
For example, C6 for conditional probabilities says that, if \(\mathbb{P}(C)>0\), \[\mathbb{P}(A \cup B \mid C) = \mathbb{P}(A \mid C) +\mathbb{P}(B \mid C) - \mathbb{P}(A \cap B \mid C) .\]
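One way to check this (a short calculation from the definition, using the unconditional inclusion-exclusion rule for the middle step) is \[\cpr{A \cup B}{C} = \frac{\pr{(A \cup B) \cap C}}{\pr{C}} = \frac{\pr{A \cap C} + \pr{B \cap C} - \pr{A \cap B \cap C}}{\pr{C}} = \cpr{A}{C} + \cpr{B}{C} - \cpr{A \cap B}{C} .\]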
Some people refer to P2 as the multiplication rule for probabilities.
Both P1 and P2 can be deduced from the definition of conditional probability. For example, Equation 3.1 follows from the fact that \[\mathbb{P}(B \mid C) \, \mathbb{P}(A \mid B\cap C) = \frac{\mathbb{P}(B \cap C)}{\mathbb{P}(C)} \cdot \frac{\mathbb{P}(A \cap B \cap C)}{\mathbb{P}(B \cap C)} = \frac{\mathbb{P}(A \cap B \cap C)}{\mathbb{P}(C)} = \mathbb{P}(A \cap B \mid C) .\]
Our next property is a more general version of the multiplication rule.
When \(k=2\), we get P2; for \(k=3\), this becomes \[\mathbb{P}(A \cap B \cap C) = \mathbb{P}(A)\, \mathbb{P}(B \mid A) \, \mathbb{P}(C\mid A\cap B).\] We can prove this by repeatedly applying Equation 3.1 (in this case, we use it twice).
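As an illustration (an example added here, not taken from the text): if we draw three cards from a well-shuffled deck without replacement, and \(A_i\) is the event that the \(i\)th card is an ace, then \[\pr{A_1 \cap A_2 \cap A_3} = \pr{A_1}\, \cpr{A_2}{A_1}\, \cpr{A_3}{A_1 \cap A_2} = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} = \frac{1}{5525} .\]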
This result is often called the partition theorem, or the law of total probability. (If you’ve forgotten what a partition is, head back to Section 1.6.)
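In symbols (this appears to be the identity labelled Equation 3.2 in the proof below): if \(E_1, \dots, E_k\) is a partition of \(\Omega\) with \(\pr{E_i} > 0\) for each \(i\), then for any event \(A\), \[\pr{A} = \sum_{i=1}^k \pr{E_i}\, \cpr{A}{E_i} .\]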
To prove P4, we first apply P2 to each term on the right-hand side of Equation 3.2 to get \[\sum_{i=1}^k \pr{E_i} \cpr{A}{E_i} = \sum_{i=1}^k \pr{A \cap E_i} .\] Since the \(E_i\) form a partition, they are pairwise disjoint, and hence so are the events \(A \cap E_i\); so by C7, \[\sum_{i=1}^k \pr{A \cap E_i} = \pr{\cup_{i=1}^k (A \cap E_i ) } = \pr {A \cap (\cup_{i=1}^k E_i ) } .\] Finally, since the \(E_i\) form a partition, \(\cup_{i=1}^k E_i = \Omega\), so the last expression is \(\pr{A \cap \Omega} = \pr{A}\), giving the result. You should check that P4 remains true (with \(k=\infty\)) for infinite partitions.
The most important result in conditional probability is Bayes’ theorem. It allows us to express the conditional probability of an event \(A\) given \(B\) in terms of the “inverse” conditional probability of \(B\) given \(A\).
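In its simplest form, for events \(A\) and \(B\) with \(\pr{A} > 0\) and \(\pr{B} > 0\), the theorem states \[\cpr{A}{B} = \frac{\pr{A}\, \cpr{B}{A}}{\pr{B}} ,\] which follows by writing \(\pr{A \cap B}\) in two ways using the multiplication rule.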
We can also combine properties P4 and P5 to make a mega-property of conditional probability: Bayes’ theorem for partitions.
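Explicitly, if \(E_1, \dots, E_k\) is a partition of \(\Omega\) with \(\pr{E_i} > 0\) for each \(i\), and \(A\) is an event with \(\pr{A} > 0\), then \[\cpr{E_j}{A} = \frac{\pr{E_j}\, \cpr{A}{E_j}}{\sum_{i=1}^k \pr{E_i}\, \cpr{A}{E_i}} \quad \text{for each } j.\] As an illustration with made-up numbers (not data from the text), suppose a screening test for a disease \(D\) has \(\pr{D} = 0.01\), \(\cpr{+}{D} = 0.9\) and \(\cpr{+}{D^c} = 0.05\), where \(+\) is the event of a positive result and \(D^c\) is the complement of \(D\). Then \[\cpr{D}{+} = \frac{0.01 \times 0.9}{0.01 \times 0.9 + 0.99 \times 0.05} = \frac{0.009}{0.0585} \approx 0.154 ,\] so even after a positive test, the chance of having the disease is only about 15%.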
3.3 Independence of events
Tied to the idea of conditional probability is the idea of independence: the property that two events are unrelated, or have no bearing on each other’s likelihood.
For example, if we pick a card from a well-shuffled deck, the events “the card is red” (\(R\)) and “the card is an Ace” (\(A\)) are independent.
By counting, we have that \(\pr{R} = \frac{26}{52} = \frac{1}{2}\) and \(\mathbb{P}(A) = \frac{4}{52} = \frac{1}{13}\). Now, \(A \cap R = \{ A\diamondsuit, A\heartsuit\}\), so \(\pr{A \cap R } = \frac{2}{52} = \frac{1}{26}\). We check that \(\frac{1}{26} = \frac{1}{2} \cdot \frac{1}{13}\), so \(R\) and \(A\) are indeed independent.
Never confuse disjoint events with independent events! For independent events, we have that \(\pr{A\cap B} = \mathbb{P}(A)\mathbb{P}(B)\), but for disjoint events, \(\pr{A \cap B}=0\) because \(A\cap B=\emptyset\). In particular, two disjoint events with positive probabilities can never be independent.
Disjointness is a property of the sets only (it can be seen from the Venn diagram). Independence is a property of probabilities (it cannot be seen from the Venn diagram).
The next theorem explains why independence is called independence:
Consider any two events \(A\) and \(B\) with \(\mathbb{P}(A)>0\) and \(\mathbb{P}(B)>0\). The following statements are equivalent.
\(\pr{A\cap B} = \mathbb{P}(A)\mathbb{P}(B)\).
\(\mathbb{P}(A \mid B)=\mathbb{P}(A)\).
\(\mathbb{P}(B \mid A)=\mathbb{P}(B)\).
In other words, learning about \(B\) will not tell us anything new about \(A\), and similarly, learning about \(A\) will not tell us anything new about \(B\).
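One way to see this: if \(\pr{A \cap B} = \pr{A}\pr{B}\), then \[\cpr{A}{B} = \frac{\pr{A \cap B}}{\pr{B}} = \frac{\pr{A}\pr{B}}{\pr{B}} = \pr{A} ,\] and each step reverses, so the first statement holds exactly when the second does; swapping the roles of \(A\) and \(B\) gives the equivalence with the third.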
For conditional independence, we have a similar result.
Consider any three events \(A\), \(B\), and \(C\), with \(\pr{A \cap B \cap C}>0\). The following statements are equivalent.
\(\cpr{A\cap B}{C} = \mathbb{P}(A \mid C)\mathbb{P}(B \mid C)\).
\(\cpr{A}{B\cap C}=\mathbb{P}(A \mid C)\).
\(\cpr{B}{A\cap C}=\mathbb{P}(B \mid C)\).
In other words, if we know \(C\) then learning about \(B\) will not tell us anything new about \(A\), and similarly, if we know \(C\) then learning about \(A\) will not tell us anything new about \(B\).
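The calculation is the same as before, with everything conditioned on \(C\): if \(\cpr{A \cap B}{C} = \cpr{A}{C}\, \cpr{B}{C}\), then \[\cpr{A}{B \cap C} = \frac{\pr{A \cap B \cap C}}{\pr{B \cap C}} = \frac{\cpr{A \cap B}{C}\, \pr{C}}{\cpr{B}{C}\, \pr{C}} = \cpr{A}{C} ,\] and again the steps reverse.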
Consider the card-shuffling example again. The probability that our card is an Ace is \(\mathbb{P}(A) = 1/13\) and the probability that it is an Ace, given it is red, is \[\cpr{A}{R} = \frac{\pr{A\cap R}}{\pr{R}} = \mathbb{P}(A) ,\] by independence. The ‘reason’ for the independence is that the proportion of aces in the deck (\(4/52\)) is the same as that of aces among the red cards (\(2/26\)).
It can be extremely useful to recognize situations where (conditional) independence can be applied. Of course, it is equally important not to assume (conditional) independence where there really are dependencies.
The simplest case beyond two events is that of three events. We say that the events \(A\), \(B\), and \(C\) are mutually independent if all of the following equalities are satisfied: \[\begin{aligned} \pr{A \cap B \cap C} &=\mathbb{P}(A)\mathbb{P}(B)\mathbb{P}(C), \\ \pr{A\cap B}&=\mathbb{P}(A)\mathbb{P}(B),\\ \pr{B \cap C}&=\mathbb{P}(B)\mathbb{P}(C),\\ \pr{C \cap A}&=\mathbb{P}(C)\mathbb{P}(A). \end{aligned}\]
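One useful consequence (a short check, not stated explicitly above): if \(A\), \(B\), and \(C\) are mutually independent, then \(A\) is also independent of \(B \cap C\), since \[\pr{A \cap (B \cap C)} = \pr{A}\pr{B}\pr{C} = \pr{A}\, \pr{B \cap C} .\]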
Suppose we roll 4 dice and their values are independent.
To find the probability that we throw no sixes, let \(A_i\) be the event ‘the \(i\)th throw is not a 6’. By assumption, \(A_1\), …, \(A_4\) are independent, so \[\pr{\text{no sixes on 4 dice}} = \pr{\bigcap_{i=1}^4 A_i} = \prod_{i=1}^4 \pr{A_i} = \Bigl(\frac{5}{6} \Bigr)^4.\] The same result is obtained from the classical model, by selection with replacement.
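Indeed, counting equally likely outcomes for the four throws (selection with replacement), there are \(6^4\) outcomes in total and \(5^4\) with no six, so \[\pr{\text{no sixes on 4 dice}} = \frac{5^4}{6^4} = \Bigl(\frac{5}{6}\Bigr)^4 ,\] as before.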
It is possible for events to be pairwise independent without being mutually independent, as the next example demonstrates.
3.4 Historical context
Bayes’ theorem is named after the Reverend Thomas Bayes (1701–1761); it was published after his death, in 1763. In our modern approach to probability, the theorem is a very simple consequence of our definitions; however, the result may be interpreted more widely, and is one of the most important results regarding statistical reasoning.