$$ \newcommand{\pr}[1]{\mathbb{P}\left(#1\right)} \newcommand{\cpr}[2]{\mathbb{P}\left(#1\mid\,#2\right)} $$
2 Equally likely outcomes and counting principles
In Chapter 1 we saw the abstract formulation of probability theory; next we turn to the question of how the probabilities themselves may be assigned.
The most basic scenario occurs when our experiment has a finite number of possible outcomes which we deem to be equally likely.
Such situations rarely (not to say never) occur in practice, but they serve as good models in extremely controlled environments such as gambling or games of chance. However, this situation (which will essentially come down to counting) gives us a good initial setting in which to obtain some very useful insights into the nature and calculus of probability.
2.1 Classical probability
Suppose that we have a finite sample space \(\Omega\). Since \(\Omega\) is finite, we can list it as a collection of \(m = |\Omega|\) possible outcomes: \[\Omega = \{ \omega_1, \ldots, \omega_m \}.\] In the equally likely outcomes model (also sometimes known as classical probability) we suppose that each outcome has the same probability: \[ \mathbb{P}(\omega) = \frac{1}{|\Omega |} \text{ for each } \omega \in \Omega ,\] or, in the notation above, \(\mathbb{P}(\omega_i) = 1/m\) for each \(i\).
The axioms of probability then allow us to determine the probability of any event \(A \subseteq \Omega\): by C7, \[ \mathbb{P}(A) = \sum_{\omega \in A} \mathbb{P}(\omega) = \frac{|A|}{|\Omega|} ~\text{for any event } A \subseteq \Omega .\] This is a particular case of the discrete sample space discussed in Chapter 1.
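As a tiny illustration (a sketch, not an example from the text: a single roll of a fair six-sided die), the formula \(\mathbb{P}(A) = |A|/|\Omega|\) can be evaluated directly:

```python
from fractions import Fraction

# Classical probability: P(A) = |A| / |Omega| when outcomes are equally likely.
omega = {1, 2, 3, 4, 5, 6}              # one roll of a fair die
A = {w for w in omega if w % 2 == 0}    # event: the roll shows an even number

print(Fraction(len(A), len(omega)))     # 1/2
```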
With this definition of \(\mathbb{P}\), all of the axioms (A1–A4) are satisfied (checking each of them comes down to what we know about counting). Remember that in the case of a finite sample space, we always have the option to take \(\mathcal{F} = 2^{\Omega}\) as our \(\sigma\)-algebra.
The classical interpretation of probability is the most straightforward approach we can take, just as counting can be seen as “basic” mathematics. It is a good place to start and there are many important situations where intuitively it seems reasonable to say that each outcome of a particular collection is equally likely.
To extend the theory, or to apply it in practice, we have to address situations where there are no natural candidates for equally likely outcomes, or where there are infinitely many possible outcomes, and we have to work out how to find probabilities that feed into calculations giving useful predictions. We will come back to some of these issues later; but bear in mind that however we come up with our probability model, the same system of axioms that we saw in Chapter 1 applies.
2.2 Counting principles
Given a finite sample space in which the outcomes are equally likely, determining the probability of an event comes down to counting.
For example, in drawing a poker hand of five cards from a well-shuffled deck of 52 cards, the probability of having a ‘full house’ (meaning two cards of one denomination and three of another, e.g., two Kings and three 7s) is given by the number of hands that are full houses divided by the total number of hands (each hand being equally likely).
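To make this concrete, here is a minimal computational sketch of the full house calculation, using one standard way of decomposing the count (choose the denomination and suits of the triple, then of the pair); the binomial coefficients \(\binom{m}{r}\) used here are introduced in Section 2.2.3 below.

```python
from math import comb  # Python 3.8+

# Total number of (unordered) five-card hands from a 52-card deck.
total_hands = comb(52, 5)                        # 2,598,960

# Full houses: pick the denomination of the triple (13 ways) and its
# three suits (C(4,3) ways), then the denomination of the pair from the
# 12 remaining denominations (12 ways) and its two suits (C(4,2) ways).
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)  # 3,744

print(full_houses / total_hands)                 # about 0.00144
```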
These counting problems need careful thought, and we will describe some counting principles for some of the most common situations. There is some common ground with the Discrete Maths course; here we have a slightly different emphasis.
2.2.1 The multiplication principle
The multiplication principle says that if a selection is made in stages, with \(n_1\) possibilities at the first stage, \(n_2\) at the second, and so on up to \(n_k\) at the last, and every combination of choices is allowed, then the total number of possible selections is the product \(n_1 n_2 \cdots n_k\). For instance, in a standard deck of playing cards, each card has a denomination and a suit. There are 13 possible denominations: A(ce), 2, 3, …, 10, J(ack), Q(ueen), K(ing). There are 4 possible suits: \(\heartsuit\) (heart), \(\diamondsuit\) (diamond), \(\clubsuit\) (club), \(\spadesuit\) (spade). Because all combinations of denomination and suit are allowed, the multiplication principle applies: there are \(13 \times 4 = 52\) cards in a standard deck.
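The same count can be sketched in code (the names denominations and suits are just illustrative):

```python
from itertools import product

# Multiplication principle: every combination of denomination and suit
# is allowed, so the deck has 13 * 4 = 52 cards.
denominations = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

deck = list(product(denominations, suits))
print(len(deck))  # 52
```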
We will see many applications of counting to dealing cards from a well-shuffled deck. Counting the possibilities is affected by (i) whether the order of dealing is important, and (ii) how we distinguish the cards: e.g. we may only be interested in their colour (so all red cards are the same) or their suit or their denomination.
All of the following counting principles are effectively consequences of the multiplication principle.
2.2.2 Order matters; objects are distinct
First, we look at ordered choices of distinct objects. In this case, we distinguish between selection with replacement, where the same object can be selected multiple times, and selection without replacement, where each object can be selected at most once. By the multiplication principle, there are \(m \times m \times \cdots \times m = m^r\) ordered selections of \(r\) objects from \(m\) with replacement; without replacement there are \(m(m-1)\cdots(m-r+1) = (m)_r\), since each successive choice has one fewer option.
The falling factorial notation \((m)_r\) (sometimes also denoted \(m^{\underline{r}}\)) is simply a convenient way to write \(\frac{m!}{(m-r)!}\). With the convention \(0! = 1\), the special case \(r = m\) gives \((m)_m = m!\), the number of permutations of the \(m\) objects. If \(m\) is large and \(r\) is much smaller than \(m\), then \((m)_r \approx m^r\).
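These quantities are easy to check numerically; a minimal sketch using Python's standard library (math.perm is available from Python 3.8):

```python
from math import perm, factorial

# Falling factorial (m)_r = m (m-1) ... (m-r+1) = m! / (m-r)!,
# the number of ordered selections of r objects from m without replacement.
m, r = 365, 5
print(perm(m, r) == factorial(m) // factorial(m - r))  # True

# With replacement there are m**r ordered selections; when r is much
# smaller than m the two counts are close.
print(perm(m, r) / m**r)  # roughly 0.97
```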
2.2.3 Order doesn’t matter; objects are distinct
In this section, we move on to think about the scenario where the order in which objects are selected doesn’t matter. This can arise in situations such as dealing a hand of cards, or separating a class into two teams.
The number of unordered selections of \(r\) objects from \(m\) distinct objects is the binomial coefficient \[\binom{m}{r} = \frac{(m)_r}{r!} = \frac{m!}{r!\,(m-r)!}.\] To see this, first count the number of distinct ordered lists of \(r\) objects: this is \((m)_r\). Each unordered subset has been counted \((r)_r = r!\) times, as this is the number of distinct ways of arranging \(r\) different objects. Therefore the \((m)_r\) ordered selections can be grouped into collections of size \(r!\), each representing a particular subset, and the result follows by dividing.
The expression \(\binom{m}{r}\) is the binomial coefficient for choosing \(r\) objects from \(m\) and is often called ‘\(m\)-choose-\(r\)’. Note that \[\binom{m}{r} = \binom{m}{m-r}\] as we can choose to take \(r\) objects from \(m\) in exactly the same number of ways that we can choose to leave behind \(r\) objects i.e., take \(m-r\) objects.
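Both the relationship to the falling factorial and the symmetry identity can be verified directly; a minimal sketch:

```python
from math import comb, perm, factorial

m, r = 52, 5

# Each subset of size r accounts for r! of the (m)_r ordered selections.
print(comb(m, r) == perm(m, r) // factorial(r))  # True

# Choosing r objects to take is the same as choosing m - r to leave behind.
print(comb(m, r) == comb(m, m - r))              # True
```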
In simple examples, either method is relatively straightforward, but in many problems it is much more natural to treat the selections as either ordered or unordered. For hands of cards, treating them as unordered selections usually works best; for something like rolling dice, it usually makes sense to treat them as ordered selections.
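For instance (an illustrative sketch, not an example from the text): treating two dice as an ordered pair keeps all 36 outcomes equally likely, which makes calculations direct.

```python
from itertools import product

# Two dice treated as an ordered pair: all 6 * 6 = 36 outcomes equally likely.
outcomes = list(product(range(1, 7), repeat=2))

# Event: both dice show the same value.
doubles = [o for o in outcomes if o[0] == o[1]]
print(len(doubles) / len(outcomes))  # 6/36 = 1/6
```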
2.2.4 Separating objects into groups
In the final section of this chapter, we look into how we can group objects: either by combining different types of object into one big group, or by separating a big group into smaller ones.
For example, suppose we have four red tokens, and three black ones. Then there are \(7!/(4!\,3!) = 35\) different ways to lay them out in a row. The probability that they will be alternately red and black is \(1/35\) as there is only one such ordering.
To see why, note that each distinct order for laying out all of the tokens in a row is precisely the same as choosing 4 of the 7 positions for the red ones. In other words, it is an unordered choice of 4 positions from the 7 distinct positions.
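A brute-force check of this count (a minimal sketch; feasible only because the numbers are small):

```python
from itertools import permutations
from math import factorial

tokens = "RRRRBBB"  # four red tokens, three black

# Distinct orderings of the multiset: 7! / (4! 3!) = 35.
orderings = set(permutations(tokens))
print(len(orderings) == factorial(7) // (factorial(4) * factorial(3)))  # True

# Exactly one ordering alternates colours (RBRBRBR), giving probability 1/35.
alternating = [o for o in orderings if all(a != b for a, b in zip(o, o[1:]))]
print(len(alternating))  # 1
```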
This counting principle lets us work out how many different ways there are to divide one group into smaller groups. My favourite example is a packet of Skittles: if there are 16 or 17 of them in a bag, how many different combinations of the five different flavours could we have?
We can count the number of choices with the ‘sheep-and-fences’ method. Suppose we separate \(m\) indistinguishable objects into \(k\) distinguishable groups: placing all the objects in a line, separated into their groups, there are \(k-1\) “fenceposts” between the \(k\) groups of sheep (or Skittles).
For example, with 6 objects in 4 groups, we could represent “three in group A, one in group B, none in group C and two in group D” with the drawing \(***\mid *\mid \mid **\).
We draw \(m+k-1\) ‘things’ in total (stars and fences). This means that the number of groupings of the objects is the same as the number of choices for the locations of the \(k-1\) fences among the \(m+k-1\) ‘things’, or \(\binom{m+k-1}{k-1} = \binom{m+k-1}{m}\).
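Applying this to the Skittles question above (a minimal sketch; the function name groupings is just illustrative):

```python
from math import comb

# Sheep-and-fences: the number of ways to separate m indistinguishable
# objects into k distinguishable groups is C(m + k - 1, k - 1).
def groupings(m, k):
    return comb(m + k - 1, k - 1)

# Skittles: 16 or 17 sweets, five flavours.
print(groupings(16, 5))  # 4845
print(groupings(17, 5))  # 5985
```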
2.3 Historical context
Classical probability theory originated in the calculation of odds for games of chance; as well as contributions by Pierre de Fermat (1601–1665) and Blaise Pascal (1623–1662), comprehensive approaches were given by Abraham de Moivre (1667–1754) (Moivre 1756), Pierre-Simon Laplace (1749–1827) (Laplace 1825), and Siméon Poisson (1781–1840). A collection of these classical methods, made just before the advent of the modern axiomatic theory, can be found in (Whitworth 1901).