2  Equally likely outcomes and counting principles

🥅 Goals
  1. Understand the equally likely outcomes model of classical probability.
  2. Know counting principles, and when and how to apply them on specific problems.

In Chapter 1 we saw the abstract formulation of probability theory; next we turn to the question of how the probabilities themselves may be assigned.

The most basic scenario occurs when our experiment has a finite number of possible outcomes which we deem to be equally likely.

Such situations rarely, if ever, occur exactly in practice, but they serve as good models in tightly controlled environments such as gambling and games of chance. Moreover, this setting (which will essentially come down to counting) gives us a good initial context in which to obtain some very useful insights into the nature and calculus of probability.

2.1 Classical probability

Suppose that we have a finite sample space \(\Omega\). Since \(\Omega\) is finite, we can list it as a collection of \(m = |\Omega|\) possible outcomes: \[\Omega = \{ \omega_1, \ldots, \omega_m \}.\] In the equally likely outcomes model (also sometimes known as classical probability) we suppose that each outcome has the same probability: \[ \mathbb{P}(\omega) = \frac{1}{|\Omega |} \text{ for each } \omega \in \Omega ,\] or, in the notation above, \(\mathbb{P}(\omega_i) = 1/m\) for each \(i\).

The axioms of probability then allow us to determine the probability of any event \(A \subseteq \Omega\): by C7, \[ \mathbb{P}(A) = \sum_{\omega \in A} \mathbb{P}(\omega) = \frac{|A|}{|\Omega|} ~\text{for any event } A \subseteq \Omega .\] This is a particular case of the discrete sample space discussed in Chapter 1.

Definition: Equally likely outcomes
Consider a scenario with \(m\) equally likely outcomes enumerated as \(\Omega = \{\omega_1,\dots,\omega_m\}\). In the equally likely outcomes model, the probability of an event \(A\subseteq\Omega\) is declared to be \[ \mathbb{P}(A):= \frac{|A|}{|\Omega|}. \]

With this definition, all of the axioms (A1–A4) are satisfied (checking each of them comes down to basic facts about counting). Remember that in the case of a finite sample space, we always have the option to take \(\mathcal{F} = 2^{\Omega}\) as our \(\sigma\)-algebra.

Examples
  1. Draw a card at random from a well-shuffled pack, so that each of the 52 cards is equally likely to be chosen. Typical events are that the card is a spade (a set of 13 outcomes), the card is a queen (a set of 4 outcomes), or the card is the queen of spades (a set with a single outcome). In the equally likely outcomes model, the probability of drawing the queen of spades (or any other specified card) is 1/52, the probability of drawing a spade is 13/52, and the probability of drawing a queen is 4/52.

  2. Flip a coin and see whether it falls heads or tails, each assumed equally likely; then ‘heads’ or ‘tails’ each has probability 1/2.

  3. Roll a fair cubic die to get a number from 1 to 6. Here the word ‘fair’ is used to mean each outcome is equally likely. Then \(\Omega = \{1,\ldots,6\}\) and \(\mathbb{P}(A) = |A|/6\). For example, if \(A_1 = \{ 2\}\) (the score is 2) we get \(\mathbb{P}(A_1) = 1/6\), while if \(A_2 = \{1,3,5\}\) (the score is odd) we get \(\mathbb{P}(A_2) = 3/6 = 1/2\).

  4. If we roll a pair of fair dice then outcomes are pairs \((i, j)\) so there are 36 possible outcomes. If we assume that the outcomes are equally likely, then the probability of getting a pair of 6’s is 1/36, for example.
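The examples above can be checked numerically. Here is a minimal Python sketch (our own illustration; the notes themselves prescribe no software) that enumerates the sample space for a pair of fair dice and applies \(\mathbb{P}(A) = |A|/|\Omega|\):

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely ordered outcomes (i, j) of rolling a pair of fair dice.
omega = list(product(range(1, 7), repeat=2))

# The event "a pair of 6's" contains a single outcome.
pair_of_sixes = [w for w in omega if w == (6, 6)]
print(len(omega))                                # 36
print(Fraction(len(pair_of_sixes), len(omega)))  # 1/36
```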

The classical interpretation of probability is the most straightforward approach we can take, just as counting can be seen as “basic” mathematics. It is a good place to start and there are many important situations where intuitively it seems reasonable to say that each outcome of a particular collection is equally likely.

To extend the theory, or to apply it in practice, we have to address situations where there is no natural candidate for equally likely outcomes, or where there are infinitely many possible outcomes, and we must work out how to find probabilities that feed into calculations giving useful predictions. We will come back to some of these issues later; but bear in mind that, however we come up with our probability model, the same system of axioms that we saw in Chapter 1 applies.

📖 Textbook references

If you want more help with this section, check out:

2.2 Counting principles

Given a finite sample space in which the outcomes are assumed equally likely, determining the probability of an event comes down to counting.

For example, in drawing a poker hand of five cards from a well-shuffled deck of 52 cards, the probability of having a ‘full house’ (meaning two cards of one denomination and three of another, e.g., two Kings and three 7s) is given by the number of hands that are full houses divided by the total number of hands (each hand being equally likely).

These counting problems need careful thought, and we will describe some counting principles for some of the most common situations. There is some common ground with the Discrete Maths course; here we have a slightly different emphasis.

2.2.1 The multiplication principle

Counting principle: Multiplication
Suppose that we must make \(k\) choices in succession where there are:

  • \(m_1\) possibilities for the first choice,

  • \(m_2\) possibilities for the second choice,

  • \(\quad\vdots\)

  • \(m_k\) possibilities for the \(k\)th choice,

and the number of possibilities at each stage does not depend on the outcomes of any previous choices. The total number of distinct possible selections is \[m_1 \times m_2 \times m_3 \times \cdots \times m_k = \prod_{i=1}^{k}m_i.\]

For instance, in a standard deck of playing cards, each card has a denomination and a suit. There are 13 possible denominations: A(ce), 2, 3, …, 10, J(ack), Q(ueen), K(ing). There are 4 possible suits: \(\heartsuit\) (heart), \(\diamondsuit\) (diamond), \(\clubsuit\) (club), \(\spadesuit\) (spade). Because all combinations of denomination and suit are allowed, the multiplication principle applies: there are \(13 \times 4 = 52\) cards in a standard deck.
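The deck example can be spelled out in a short Python sketch (an illustration of our own, not part of the notes): forming every allowed (denomination, suit) combination realises the multiplication principle directly.

```python
from itertools import product

# Every combination of denomination and suit is allowed, so the
# multiplication principle gives 13 * 4 = 52 cards.
denominations = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

deck = list(product(denominations, suits))
print(len(deck))  # 52
```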

We will see many applications of counting to dealing cards from a well-shuffled deck. Counting the possibilities is affected by (i) whether the order of dealing is important, and (ii) how we distinguish the cards: e.g. we may only be interested in their colour (so all red cards are the same) or their suit or their denomination.

Examples
  1. A hotel serves 3 choices of breakfast, 4 choices of lunch and 5 choices of dinner so a guest selects from \(3 \times 4 \times 5\) different combinations of the three meals (assuming we opt to have all three).

  2. A coffee bar has 5 different papers to choose from, 19 types of coffee and 7 different snacks. Counting ‘none’ as an extra option at each stage, there are \(6\times 20\times 8 = 960\) distinct selections of paper, coffee and snack. Of these, 5 involve a paper but no coffee or snack (which the staff may object to), plus one has no coffee, snack or paper at all!

  3. PINs are made up of 4 digits (0–9) with the exceptions that (i) they cannot be four repetitions of a single digit; (ii) they cannot form increasing or decreasing consecutive sequences, e.g. 3456 and 8765 are excluded. How many possible four-digit PINs are there?

    Ignoring restrictions there are \(10^4=10,000\) distinct PINs. There are \(10\) PINs with the same digit repeated, namely \(0000\), \(1111\), …, \(9999\). Increasing sequences start with \(0,1,\ldots,6\) and decreasing sequences start with \(9,8,\ldots,3\), so there are seven of each. Excluding these \(10 + 7 + 7 = 24\) PINs leaves \(10,000 - 24 = 9,976\) permitted PINs.
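The PIN count can be verified by brute force. A short Python sketch (our own check, assuming exactly the two restrictions stated above) enumerates all \(10^4\) candidate PINs:

```python
from itertools import product

def allowed(pin):
    # pin is a tuple of four digits 0-9.
    if len(set(pin)) == 1:  # four repetitions of a single digit, e.g. 7777
        return False
    diffs = [b - a for a, b in zip(pin, pin[1:])]
    if diffs == [1, 1, 1] or diffs == [-1, -1, -1]:  # e.g. 3456 or 8765
        return False
    return True

count = sum(allowed(pin) for pin in product(range(10), repeat=4))
print(count)  # 9976
```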

All of the following counting principles are effectively consequences of the multiplication principle.

2.2.2 Order matters; objects are distinct

First, we look at ordered choices of distinct objects. In this case, we distinguish between selection with replacement, where the same object can be selected multiple times, and selection without replacement, where each object can only be selected at most once.

Counting principle: Selection with replacement for ordered choices
Suppose that we have a collection of \(m\) distinct objects and we select \(r\) of them with replacement. The number of different ordered lists (ordered \(r\)-tuples) is \[\underbrace{m \times \cdots \times m}_{r \text{ times}} = m^r.\]

Counting principle: Selection without replacement for ordered choices
Suppose that we have a collection of \(m\) distinct objects and we select \(r \leq m\) of them without replacement. The number of different ordered lists (ordered \(r\)-tuples) is \[(m)_r := \underbrace{m \times (m-1) \times (m-2) \times \cdots \times (m-r+1)}_{r \text{ terms}} = \frac{m!}{(m-r)!} .\]

The falling factorial notation \((m)_r\) (sometimes also denoted \(m^{\underline{r}}\)) is simply a convenient way to write \(\frac{m!}{(m-r)!}\). In the special case where \(r = m\) we set \(0! = 1\) and then \((m)_m = m!\) is the number of permutations of the \(m\) objects. If \(m\) is large, and \(r\) is much smaller than \(m\), then \((m)_r \approx m^r\).
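In Python's standard library, the falling factorial is available as `math.perm`; here is a quick sketch (our own aside, not something the notes rely on) checking it against the factorial formula and the approximation \((m)_r \approx m^r\):

```python
import math

# math.perm(m, r) computes the falling factorial (m)_r = m! / (m - r)!.
m, r = 52, 4
print(math.perm(m, r))  # 52 * 51 * 50 * 49 = 6497400
assert math.perm(m, r) == math.factorial(m) // math.factorial(m - r)

# When r is much smaller than m, (m)_r is close to m**r:
print(math.perm(365, 5) / 365**5)  # a ratio close to 1
```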

Example
The number of ways we can deal out four cards in order from a pack of cards is \((52)_4\) and the number of ways we can arrange the four aces in order is \(4!\) so the probability of finding the four aces on top of a well-shuffled deck is \[\frac{4!}{(52)_4} =\frac{4 \times 3 \times 2 \times 1}{52 \times 51 \times 50 \times 49}.\] This probability is approximately \(3.7 \times 10^{-6}\) or about \(1\) in \(270,000\).
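The four-aces calculation takes one line to reproduce exactly; a Python sketch (illustrative only) using exact rational arithmetic:

```python
import math
from fractions import Fraction

# Probability that the top four cards of a well-shuffled deck
# are the four aces (in any order): 4! / (52)_4.
p = Fraction(math.factorial(4), math.perm(52, 4))
print(p)         # 1/270725
print(float(p))  # about 3.7e-06
```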

💪 Try it out

There are \(n < 365\) people in a room. Let \(B\) be the event that (at least) two of them have the same birthday. (We ignore leap years.)

What is \(\mathbb{P}(B)\)? How big must \(n\) be so that \(\mathbb{P}(B) > 1/2\)?

Answer:

Here the equally likely outcomes are the ordered length-\(n\) lists of possible birthdays: \[(\text{person 1's birthday}, \text{person 2's birthday}, \ldots, \text{person $n$'s birthday}) .\] The number of possible outcomes is \[365 \times 365 \times \cdots \times 365 = 365^n .\] This is the denominator in our probability.

For the numerator, we must work out how many outcomes are in \(B\). In fact, it is easier to count outcomes in \(B^\mathrm{c}\), where everyone has a different birthday. There are \[365 \times 364 \times \cdots \times (365-n+1) = (365)_n\] of these. So \[\mathbb{P}(B) = 1 -\mathbb{P}(B^\mathrm{c}) = 1 - \frac{(365)_n}{365^n} .\] It turns out that \(\mathbb{P}(B) \approx 1/2\) for \(n=23\).
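The claim that \(n = 23\) suffices is easy to confirm by computing \(\mathbb{P}(B)\) for each \(n\). A minimal Python sketch (our own check of the formula above):

```python
import math

def p_shared_birthday(n):
    # P(B) = 1 - (365)_n / 365**n
    return 1 - math.perm(365, n) / 365**n

# Smallest n for which P(B) > 1/2:
n = next(n for n in range(1, 366) if p_shared_birthday(n) > 0.5)
print(n)                               # 23
print(round(p_shared_birthday(n), 4))  # 0.5073
```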

Advanced content
Here’s one way to get this. Note that \[\frac{(365)_n}{365^n} = 1 \times \left(1 - \frac{1}{365}\right) \times \left(1 - \frac{2}{365}\right) \times \cdots \times \left(1 - \frac{n-1}{365}\right).\] Now \(1 - x \leq e^{-x}\) for all \(x\), and the inequality is very close to equality when \(x\) is close to zero, as is the case here. Writing \(x = 1/365\), \[\begin{aligned} \frac{(365)_n}{365^n} & \leq e^{-x}\,e^{-2x}\,e^{-3x}\cdots\, e^{-(n-1)x} \\ &=\exp\left\{ -(1+2 + \cdots + (n-1))x \right\} \\ &=\exp\left\{ -\frac{(n-1)n}{2\times 365} \right\}. \end{aligned}\]

2.2.3 Order doesn’t matter; objects are distinct

In this section, we move on to think about the scenario where the order in which objects are selected doesn’t matter. This can arise in situations such as dealing a hand of cards, or separating a class into two teams.

Counting principle: Selection without replacement for unordered choices
Suppose that we have a collection of \(m\) distinct objects and we select a subset of \(r \leq m\) of them without replacement. The number of distinct subsets of size \(r\) is \[\binom{m}{r} := \frac{(m)_r}{r!}=\frac{m!}{r!\,(m-r)!}.\]

To see this, first count the number of distinct ordered lists of \(r\) objects—this is \((m)_r\). Each unordered subset has been counted \((r)_r=r!\) times as this is the number of distinct ways of arranging \(r\) different objects. Therefore the \((m)_r\) ordered selections can be grouped into collections of size \(r!\), each representing a particular subset, and the result follows by dividing.

The expression \(\binom{m}{r}\) is the binomial coefficient for choosing \(r\) objects from \(m\) and is often called ‘\(m\)-choose-\(r\)’. Note that \[\binom{m}{r} = \binom{m}{m-r}\] as we can choose to take \(r\) objects from \(m\) in exactly the same number of ways that we can choose to leave behind \(r\) objects i.e., take \(m-r\) objects.
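Python's standard library provides the binomial coefficient as `math.comb`; a brief sketch (our own aside) checking both the falling-factorial formula and the symmetry identity:

```python
import math

m, r = 52, 4
print(math.comb(m, r))  # 270725

# (m)_r / r! agrees with math.comb, and the symmetry identity holds:
assert math.comb(m, r) == math.perm(m, r) // math.factorial(r)
assert math.comb(m, r) == math.comb(m, m - r)
```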

💪 Try it out
What is the probability of finding no aces in a four-card hand dealt from a well-shuffled deck?

Answer:

Let’s answer this by treating hands as unordered selections. Then there are \[\binom{52}{4} = \frac{52 \times 51 \times 50 \times 49}{4 \times 3 \times 2 \times 1} = 270,725\] distinct unordered hands of four cards. The number of these with no aces is \[\binom{48}{4} = \frac{ 48\times 47 \times 46 \times 45}{4 \times 3 \times 2 \times 1} = 194,580 ,\] and so the probability of finding no aces in a four-card hand is \[\binom{48}{4} \bigg/ \binom{52}{4} =\frac{48\times 47 \times 46 \times 45}{52 \times 51 \times 50 \times 49} \approx 0.7187.\]

Alternatively, we could answer this by treating the hands as ordered selections (the order corresponding to the order of the deal, say). Of course, this will give different numerator and denominator in our calculation, but the final answer must be the same! As ordered selections, there are \[(52)_4 = 52 \times 51 \times 50 \times 49\] distinct hands. The number of these with no ace is \[(48)_4 = 48\times 47 \times 46 \times 45.\] Our probability is then \((48)_4/(52)_4\) which is the same as before.

In this simple example, either method is relatively straightforward, but in many examples it is much more natural to treat the selections as one or the other. For hands of cards, treating them as unordered selections usually works best; for something like rolling dice, it usually makes sense to treat them as ordered selections.
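The agreement of the two methods is easy to verify with exact arithmetic; a short Python sketch (our own check):

```python
import math
from fractions import Fraction

# No aces in a four-card hand: unordered and ordered counting
# give the same probability.
unordered = Fraction(math.comb(48, 4), math.comb(52, 4))
ordered = Fraction(math.perm(48, 4), math.perm(52, 4))
print(unordered == ordered)        # True
print(round(float(unordered), 4))  # 0.7187
```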

💪 Try it out
You are dealt five cards from a well-shuffled deck. Let \(A\) be the event that exactly four cards are of the same suit. What is \(\mathbb{P}(A)\)?

Answer:

There are \(\binom{52}{5}\) different unordered selections for the hand, and all are equally likely. How many of these unordered selections are in \(A\)? We need to describe a subset of 5 elements such that exactly 4 have the same suit. We build this up sequentially:

  • We first choose the suit that we are going to use for the four cards: 4 possibilities.

  • Then we choose the four denominations (unordered) for those cards: \(\binom{13}{4}\) possibilities.

  • All that remains is to choose the last card, which must be of a different suit than the four already chosen: \(3 \times 13 = 39\) possibilities.

So the answer is \[\mathbb{P}(A) = \frac{4 \times \binom{13}{4}\times 39}{\binom{52}{5}} \approx 0.0429.\]
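The three-stage count above translates directly into code; a minimal Python sketch (our own check of the arithmetic):

```python
import math

# Exactly four cards of one suit in a five-card hand: choose the suit (4),
# the four denominations in that suit (C(13,4)), then the fifth card
# from the other three suits (39).
favourable = 4 * math.comb(13, 4) * 39
p = favourable / math.comb(52, 5)
print(favourable)   # 111540
print(round(p, 4))  # 0.0429
```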

💪 Try it out
In ‘Lotto Extra’ you have to select 6 numbers from 1 to 49. You win the big prize if 6 randomly drawn numbers match your selection. Let \(W\) be the event that you win. Let \(M_4\) be the event that you match exactly 4 out of 6 numbers. Find the probabilities of \(W\) and \(M_4\).

Answer:

We model the outcomes of the Lotto draw as unordered selections, so there are \(\binom{49}{6} = 13,983,816\) outcomes in total. The event \(W\) contains only one of them (your entry)! So \(\mathbb{P}(W)=1/13,983,816\).

Now \(M_4\) uses any 4 of your numbers plus any 2 of the remaining \(49-6 = 43\) numbers. So the number of outcomes in \(M_4\) is \[\binom{6}{4} \times \binom{43}{2} = \frac{6\times 5}{2\times 1}\times\frac{43\times 42}{2\times 1} =15\times 43\times 21 = 13,545.\] Then \(\mathbb{P}(M_4)=13,545/13,983,816\approx 0.001\).
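Both Lotto probabilities come straight from `math.comb`; here is a brief Python sketch (illustrative only) reproducing the numbers above:

```python
import math

total = math.comb(49, 6)
print(total)  # 13983816

# Exactly 4 of your 6 numbers, plus 2 of the other 43:
match4 = math.comb(6, 4) * math.comb(43, 2)
print(match4)                    # 13545
print(round(match4 / total, 6))  # 0.000969
```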

Advanced content
The same counting arguments can be used when we need to divide \(m\) objects into \(k > 2\) groups: arranging \(m\) distinguishable objects into \(k\) groups with sizes \(r_1\), \(\ldots\,\), \(r_k\) where \(r_1 + \cdots + r_k = m\) can be done in \[\binom{m}{r_1,\,\ldots\,, r_k}:=\frac{m!}{r_1! \,\cdots\, r_k!}\] ways. The expression \(\binom{m}{r_1,\,\ldots\,, r_k}\) is called the multinomial coefficient (Anderson, Seppäläinen, and Valkó 2018, Example 6.7).
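The multinomial coefficient can be built up as successive binomial choices: pick the \(r_1\) members of the first group from all \(m\) objects, then \(r_2\) of the remainder, and so on. A Python sketch (the function name `multinomial` is our own) checks this against the factorial formula:

```python
import math

def multinomial(*sizes):
    # m! / (r1! * ... * rk!), built up as successive binomial choices:
    # pick r1 of the m objects, then r2 of the remainder, and so on.
    m, coeff = sum(sizes), 1
    for r in sizes:
        coeff *= math.comb(m, r)
        m -= r
    return coeff

# Split 10 distinguishable objects into groups of sizes 5, 3 and 2:
print(multinomial(5, 3, 2))  # 2520
assert multinomial(5, 3, 2) == math.factorial(10) // (
    math.factorial(5) * math.factorial(3) * math.factorial(2))
```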

2.2.4 Separating objects into groups

In the final section of this chapter, we look at how we can group objects: either by combining different types of object into one big group, or by separating a big group into smaller ones.

Counting principle: Two types of object
Suppose that we have \(m\) objects, \(r\) of type 1 and \(m-r\) of type 2, where objects are indistinguishable from others of their type. The number of distinct, ordered choices of the \(m\) objects is \[\binom{m}{r}.\]

For example, suppose we have four red tokens, and three black ones. Then there are \(7!/(4!\,3!) = 35\) different ways to lay them out in a row. The probability that they will be alternately red and black is \(1/35\) as there is only one such ordering.

To see why, note that each distinct order for laying out all of the tokens in a row corresponds precisely to choosing 4 of the 7 positions for the red ones. In other words, it is an unordered choice of 4 positions from the 7 distinct positions.

💪 Try it out
A coin is tossed 7 times. Let \(E\) be the event that a total of 3 heads is obtained. What is \(\mathbb{P}(E)\)?

Answer:

Consider ordered sequences of H and T: then there are \(2^7 = 128\) possible sequences, e.g. HTHTHTT. How many of them are in \(E\)? We choose the 3 places where H occurs: \(\binom{7}{3} = 35\) ways to do this. The other places are taken by Ts. So the answer is \[\mathbb{P}(E) = \frac{35}{128} .\]
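The coin-toss answer can be checked by enumerating all sequences; a short Python sketch (our own verification):

```python
from itertools import product

# Enumerate all 2**7 equally likely H/T sequences and count those
# with exactly three heads.
seqs = list(product("HT", repeat=7))
favourable = sum(seq.count("H") == 3 for seq in seqs)
print(len(seqs), favourable)  # 128 35
```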

Advanced content
More generally, using the positions argument again, the multinomial coefficient is the number of ordered choices of objects with \(k\) types, \(r_i\) of type \(i\), which are indistinguishable within each type.

Counting principle: Separating into groups
The number of ways to divide \(m\) indistinguishable objects into \(k\) distinct groups is \[\binom{m+k-1}{m} = \binom{m+k-1}{k-1}.\]

This counting principle lets us work out how many different ways there are to divide one group into smaller groups. My favourite example is a packet of Skittles: if there are 16 or 17 of them in a bag, how many different combinations of the five different flavours could we have?

We can count the number of choices with the ‘sheep-and-fences’ method. Placing all the objects in a line, separated into their groups, there are \(k-1\) “fenceposts” between the \(k\) groups of sheep (or Skittles).

For example, with 6 objects in 4 groups, we could represent “three in group A, one in group B, none in group C and two in group D” with the drawing \(***\mid *\mid \mid **\).

We draw \(m+k-1\) ‘things’ in total (stars and fences). This means that the number of groupings of the objects is the same as the number of choices for the locations of the \(k-1\) fences among the \(m+k-1\) ‘things’, or \(\binom{m+k-1}{k-1} = \binom{m+k-1}{m}\).
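The fences picture translates directly into a computation. Here is a small Python sketch (an illustration of our own) that counts fence placements directly and compares with the binomial formula, including the Skittles example:

```python
import math
from itertools import combinations

def groupings(m, k):
    # Place k - 1 fences among the m + k - 1 'things' (stars and fences).
    return sum(1 for _ in combinations(range(m + k - 1), k - 1))

# 6 indistinguishable objects into 4 distinct groups:
print(groupings(6, 4))  # 84
assert groupings(6, 4) == math.comb(6 + 4 - 1, 4 - 1)

# 16 Skittles across 5 flavours:
print(math.comb(16 + 5 - 1, 5 - 1))  # 4845
```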

📖 Textbook references

If you want more help with this section, check out:

2.3 Historical context

Classical probability theory originated in the calculation of odds for games of chance; as well as contributions by Pierre de Fermat (1601–1665) and Blaise Pascal (1623–1662), comprehensive approaches were given by Abraham de Moivre (1667–1754) (Moivre 1756), Pierre-Simon Laplace (1749–1827) (Laplace 1825), and Siméon Poisson (1781–1840). A collection of these classical methods, made just before the advent of the modern axiomatic theory, can be found in (Whitworth 1901).