1 Overview

In this lecture we will cover:

An introduction to modelling infectious diseases
How compartmental models are formed
Some practical examples and plots

Much of the material in this section of the course is built on (Bjørnstad 2018) and (Vynnycky and White 2010), which are good references if you’d like to know more (see the References for details).

2 Modelling infectious diseases

In this section:

What is a model?
Progress of a disease
Trajectory of a disease
Transmission

What is a model?

Before we think about modelling infectious diseases, we need to think about models.

We are familiar with many sorts of model:

a scale model of a building,
a fashion model
a mathematical model

All have a similar purpose: to display the key attributes of the thing they are modelling.

A scale model building gives us an idea of the layout, proportions and styling but will be lacking many attributes of the true thing.

For example:

Likely made of different materials from the real building
No wiring or plumbing.

Likewise, a mathematical model of an infectious disease will [attempt to] replicate some of the features of a disease, but not all of them.

Modelling infectious diseases

What do we want the model to tell us?

How quickly the disease spreads
How many people are infected at any one time
How many people are infected in total over the course of the disease

Features of the disease we need to consider:

How the disease progresses in an infected individual
How the disease spreads through the population

We are unlikely to be interested in things like symptoms.

2.1 Stages of an infectious disease

Important stages in an infection

The progress of an infectious disease in a patient. From @arzt2019quantitative

The progress of an infectious disease in a patient. From Arzt et al. (2019)

We distinguish between

whether or not the patient is showing symptoms (bottom - blue/purple arrows)
whether or not they are infectious (middle - green/red lines).

2.1.1 Secondary infections

During the infectious period, the patient can go on to infect one or more secondary cases.

The serial interval, \(T_S\) is the time interval from the onset of a primary case to the onset of a secondary case (one who was infected directly by the primary case).

The serial interval is:

Often estimated by the onset of clinical symptoms, because this is what drives reporting.
Only usually possible to estimate at the outset of an epidemic, when there are few cases and the relationship between successive cases is clear.

We are concerned only with whether or not the person is infectious, not with whether they have symptoms.

2.2 Trajectory

Most epidemiological models think of each person as belonging to a particular state (or compartment, as we will soon call it).

For example, a person can be susceptible, infectious, recovered and so on.

When modelling any disease, it is important to consider

What are the states a person can be in?
Which of these states are important for understanding how the disease spreads?
What are the possible pathways through those states?

The main states we will encounter in our models are:

Susceptible (S) - a person who is not infected, but could become infected
Exposed (E) - (or ‘Latent’, or ‘pre-infectious’). A person who has been infected, but is not yet infectious.
Infectious (I) - someone who can transmit the disease to others
Removed (R) - someone who is not infected/infectious and cannot be infected (is immune). Sometimes this implicitly includes people who have died of the disease.

A person must be in one (and only one) state at a time

2.2.1 Model diagrams

We use these states to label models.

In an SIR model a person starts as susceptible, can become infectious and subsequently recover with permanent immunity.

In an SEIRS model there is a latent period following infection (during which a person is in the ‘E’ state), and after recovering people can be reinfected

Important questions

When thinking about the structure of a model, it is important to ask questions like:

How long is the latent period (is it worth including)?
Having been infected once, can a person become infected again?
If so, is there a period for which they can’t (temporary immunity)?

2.3 Transmission

An important feature of any disease model is some measure of how easily the disease is transmitted from one person to another.

There are several measures of transmissibility:

Secondary attack rate
Basic reproduction number
Effective reproduction number

2.3.1 Secondary attack rate

This is perhaps the most accurate, but not the most widely used.

The secondary attack rate is the proportion of susceptibles in contact with a primary case (for example those in the same house or office) who are then infected.

2.3.1.1 Example

A nursery child is infectious with chickenpox.
There are 10 other children in her nursery group.
Four of them have already had chickenpox and are therefore immune
Therefore there are six susceptible children in contact with the infectious child.

Suppose two of these become infected, then we estimate the secondary attack rate to be \[\frac{2}{6} = \frac{1}{3}.\]

2.3.2 Reproduction number

The reproduction number, \(R\) is defined as the (average) number of successful direct transmissions per infectious person.

This is the number we came to know as “the R number” in Covid times.

There are two widely used versions of the reproduction number.

The basic reproduction number (\(R_0\))
The effective/net reproduction number (\(R_n\))

The basic reproduction number (\(R_0\))

\(R_0\) is the basic reproduction number.

This is the expected number of secondary cases arising from an infectious person in an otherwise susceptible population.

The basic reproduction number illustrated, with \(R_0 = 2\). From The Conversation.

The effective reproduction number (\(R_n\))

However: The proportion of susceptibles decreases throughout the epidemic (and may not be everyone to start with), so the number of secondary infections from each infectious person will be less than \(R_0\).

\(R_n\) is the effective reproduction number (or sometimes net reproduction number).

Sometimes denoted \(R_e\) (for ‘effective’) or \(R_t\) (to denote the dependence on time).

\(R_n\) is calculated as \[\begin{equation} R_n = R_0 \times{s}, \end{equation}\] where \(s\) is the proportion of the population that is susceptible.

\(R_n\) changes throughout the course of an epidemic
\(R_0\) only changes when there are fundamental changes to the transmission dynamics

The effective reproduction number (\(R_n\))

The effective reproduction number illustrated if \(R_0=2\) and \(s=0.5\), therefore \(R_n = 1\). Adapted from The Conversation.”

Reproduction number = 1

An outbreak will only occur if the number of infectious people grows from the small group that started it.

The reproduction number needs to be greater than one for the epidemic to continue growing

If \(R_0<1\) there will be no epidemic to begin with.
When \(R_n\) falls below one, the epidemic will eventually stop spreading.

2.3.3 Herd immunity threshold (HIT)

Idea: Use the \(R_n\) formula to find the proportion of the population that needs to be immune (ie. not susceptible) to achieve \(R_n<1\).

We have \(R_n = R_0\times{s}.\) So, for \(R_n=1\) we need \[s = \frac{1}{R_0}\] Then, the proportion who are immune, and therefore the herd-immunity threshold (HIT) is \[HIT = 1-s = 1-\frac{1}{R_0} = \frac{R_0-1}{R_0}.\]

The Herd Immunity Threshold can be used

as a (crude) target for vaccination programmes
to estimate the proportion of the population who will have succumbed to the disease before the epidemic dies out (if it is perfectly immunising).

For example (from Vynnycky and White 2010)

for influenza \(R_0\approx{2}\)

for Measles \(R_0\approx{13}\).

This means that when \(s<0.5\) for Influenza, or \(s<0.08\) for Measles, \(R_n\approx{1}\) and the infection should start to die out.

2.4 General modelling issues

2.4.1 Modelling vocabulary quiz

Words	Definitions
1. Rate	A. A description of a system using mathematical rules.
2. Risk	B. The whole number of individuals in a mathematical model.
3. Population	C. The measure of frequency of some event for a given number of individuals.
4. Model	D. The chance/probability of a particular event happening.

2.5 Question: which features are necessary?

Modern computers allow for complex models. But it’s harder to learn from more complex models.

Distinguish between age groups?
Extent of suffering with disease (mild vs. hospitalised vs. ICU)
Time period to model (a few weeks or several years?)
Effect of new strains and variants
Seasonal changes in contact or transmission

We will look into some of these this week.

3 The SIR model

In this section:

Compartmental models
The SIR model
Difference equations
An example

3.1 Compartmental models

Compartmental models are a widely used class of computer model (not just for epidemiology)

Modelling a population of something (people, animals, a chemical)
The population is divided into compartments representing possible states
Equations describe movement from one compartment to another
The model updates the number in each compartment at each time step using the equations.

In epidemiology we often name a model by the trajectory of states, eg. ‘SIR model’, ‘SEIRS model’.

We will look at the SIR model to understand how this works.

3.2 The SIR model

The SIR model was developed in the early 20th Century to model how a disease would spread through a population of susceptible people.

The three compartments are:

S - Susceptible - People who are susceptible to the disease
I - Infectious - People who are infected with the disease (and are able to pass it on)
R - Removed - People who have recovered from the disease (or have died).

The total number of people remains constant at \[N= S + I + R,\] and therefore the model is said to be closed.

The model starts with everyone Susceptible apart from a small number (often one) who are Infectious.

3.3 Difference equations

Key Question: Given the number of people in each compartment today, can we calculate how many will be in each compartment tomorrow?

Difference equations relate the number of people in a compartment at time \(t\) to the number in each compartment at time \(t-1\) (the previous time).

Notation

\(CAPITAL\; LETTERS\) for compartments
\(\cdot_{(subscripts)}\) for time

So for example

\(S_{\left(t-1\right)}\) is the number of Susceptibles at time \(t-1\)
\(R_t\) is the number of Removed at time \(t\)

Possible transfers between compartments

We have: \[\begin{align*} S_{t} & = S_{\left(t-1\right)} - \left(\text{# newly infected at time }\left(t-1\right)\right)\\ I_{t} & = I_{\left(t-1\right)} + \left(\text{# newly infected at time }\left(t-1\right)\right) - \left(\text{# newly removed at time }\left(t-1\right)\right)\\ R_{t} & = R_{\left(t-1\right)} + \left(\text{# newly removed at time }\left(t-1\right)\right) \end{align*}\]

The population of each compartment at time \(t\) is equal to the population at time \(\left(t-1\right)\)…

…plus those who have newly joined at time \(\left(t-1\right)\)…

…minus those who left at time \(\left(t-1\right)\).

In a closed epidemic (ie. one with no births, deaths or migrations) we should have \[ S_{\left(t-1\right)} + I_{\left(t-1\right)} + R_{\left(t-1\right)} = S_t + I_t + R_t = N\] for all \(t\), and it is clear here that this is the case.

Parameters for difference equations

We have: \[\begin{align*} S_{t} & = S_{\left(t-1\right)} - \left(\text{# newly infected at time }\left(t-1\right)\right)\\ I_{t} & = I_{\left(t-1\right)} + \left(\text{# newly infected at time }\left(t-1\right)\right) - \left(\text{# newly removed at time }\left(t-1\right)\right)\\ R_{t} & = R_{\left(t-1\right)} + \left(\text{# newly removed at time }\left(t-1\right)\right) \end{align*}\]

In this model there are therefore two quantities to think about:

the number newly infected at each time step
the number newly removed at each time step.

In this course we will follow the conventions of Kermack and McKendrick (1927), which are still commonly in use.

3.3.1 Newly infected

The number of people newly infected at each time step will depend on

How many infectious people are in the population \((I)\);
How many susceptible people are in the population \((S)\);
The level of contact between people;
How transmissible the disease is.

Model parameter \(\beta\) (Beta)

\(\beta\) is the per capita transmission rate. It is often broken down as

\[ \beta = \text{contact rate} \times \text{transmission probability}. \] If an infectious person has contact with 6 [susceptible] people per day, and the probability of transmitting the disease [to a susceptible person] is 0.4, they will infect 2.4 people per day.

We could think of \(\beta\) in this case as \[ \beta = 6 \times{0.4} = 2.4.\]

Note: We’ve written ‘day’, but really what we mean is ‘time unit’

Effective contact

Contact here is used to mean effective contact - a level of contact which would be sufficient to lead to infection.

The actual level of contact this means depends on the mode of transmission (for example respiratory / vector-borne / faecal-oral / sexual etc.).

\(\beta\) is likely to be much higher for diseases that spread through casual contact (for example measles)
\(\beta\) will be higher in settings where there is a high level of close contact (for example schools and nurseries)

The number of newly infected people at each time step

We can use \(\beta\) to model this quantity

Those people moving from \(S\) (susceptible) to \(I\) (infectious) can be broken down as \[I \times{\beta} \times{\frac{S}{N}}= \frac{\beta I S}{N}. \]

Each infectious person potentially infects \(\beta\) people per day

They can only infect the susceptible proportion \(\frac{S}{N}\) of those they have effective contact with

The Mass Action principle

An important point here is the mass action principle.

In a compartmental model, we assume

Everyone is mixing randomly
Any chosen pair of people is equally likely to be in effective contact

There is no real concept of ‘individuals’, or of some people being more proximal to one another.

This is an assumption we will return to later, when we look at Agent Based Models.

3.3.2 Newly recovered

The number of people recovering at each time step depends on

How many people are infectious \((I)\)
The rate at which infectious people recover

Model parameter \(\gamma\) (‘Gamma’), the recovery rate

This is the proportion of infectious people who will recover at each time step.

The number of newly recovered people at each time step

At each time-step, \(\gamma \times{I}\) people will move move from \(I\) (infectious) to \(R\) (removed)

The period of time for which a person is infectious is therefore approximately \(\frac{1}{\gamma}\).

3.3.3 Difference equations specified

Our difference equations therefore become

\[\begin{align*} S_{t} & = S_{\left(t-1\right)} - \frac{\beta I_{\left(t-1\right)} S_{\left(t-1\right)}}{N}\\ I_{t} & = I_{\left(t-1\right)} + \frac{\beta I_{\left(t-1\right)}S_{\left(t-1\right)}}{N} - \gamma I_{\left(t-1\right)}\\ R_{t} & = R_{\left(t-1\right)} + \gamma I_{\left(t-1\right)} \end{align*}\]

Given

values for parameters \(\beta\) and \(\gamma\)
starting populations for each compartment,

we can compute the trajectory of an epidemic (according to this model).

Exercise

Talk about this with the people around you

Think of the interventions used during the Covid-19 pandemic.
Which model parameters would they affect, and how?

Reminder:

\(\beta = \text{contact rate }\times\text{ transmission probability}\) is the per-capita transmission rate
\(\gamma\) is the recovery rate (\(\frac{1}{\gamma}\) is the expected duration of inectiousness)

3.3.4 A note on risks and rates

When we think about events happening, we are really thinking about risk.

There is uncertainty about if/when people will get infected or recover.

However, in compartmental models we work in terms of rates.

We assume that individuals become infectious at a constant rate, and that once infected they recover at a constant rate.

Working with rates

Convenient:

\[ \scriptstyle \text{Rate of occurence} = \frac{1}{\text{Average time to event}}. \]

Unrealistic:

Implies that the event duration is exponentially distributed:

Memoryless property
Unrealistic shape

Distribution of the time between becoming infectious and recovery, assuming that infectious individuals recover at a rate of 0.3 per day.

3.3.5 Reproduction numbers revisited

From our SIR model we can calculate a theoretical value for the basic reproduction number \(R_0\).

An epidemic grows if the number of infected individuals is increasing, that is if \[ \frac{\beta I_{\left(t-1\right)}S_{\left(t-1\right)}}{N} - \gamma I_{\left(t-1\right)} = \frac{\beta S_{\left(t-1\right)}}{N} - \gamma > 0\] assuming \(I_{\left(t-1\right)}>0\).

We can rearrange this as \[ \frac{\beta S_{\left(t-1\right)}}{N} > \gamma \] or \[ \frac{\beta S_{\left(t-1\right)}}{\gamma N} > 1. \]

Reproduction numbers revisited

At the start, we have \(S_{\left(t-1\right)}\approx{N}\) because almost everyone is still susceptible, so \[\frac{S_{\left(t-1\right)}}{N}\approx{1}\] and therefore an epidemic occurs if \[\frac{\beta}{\gamma}>1 \] and we have (for this model set-up) \(R_0=\frac{\beta}{\gamma}\).

Example: SIR

Suppose we have a population of \(N=1000\) people, of which one is infectious with some disease, and none have immunity. This means we have \[ \begin{align*} S_1 & = 999 \\ I_1 &= 1 \\ R_1 &= 0 \end{align*}\] Assume that each person is in effective contact with 5 people per day, and that the probability of an infectious person passing the disease on to a susceptible person when in effective contact is 0.25.

With this disease, an infectious person is infectious for 2 days (on average). This means that \[ \begin{align*} \beta &= 5 \times{0.25} = 1.25 \\ \gamma &= \frac{1}{2} \end{align*}\]

Example continued

To calculate the compartment populations at \(t=2\) therefore, we have

\[ \begin{align*} S_{2} &= S_1 -\frac{\beta I_1 S_1}{N} & =& 999 -\frac{1.25 \times{1}\times{999}}{1000} & = 997.8 \text{ (3 s.f.)}\\ I_2 &=I_1 + \frac{\beta I_1 S_1}{N} - \gamma I_1 & = &1+ \frac{1.25 \times{1}\times{999}}{1000} - \frac{1}{2}\times{1} &= 1.75\text{ (3 s.f.)} \\ R_2 &=R_1+ \gamma I_1 & =&0+ \frac{1}{2}\times{1} & = 0.5 \text{ (3 s.f.)} \end{align*} \]

Exercise

Calculate the populations of each compartment for the following day, \(t=3\)
Remember
- \(\beta = 1.25,\,\gamma=0.5\).
- \(S_2 = 997.8,\,I_2 = 1.75,\,R_2 = 0.5\)
Calculate \(R_0\) for this model.

Programming SIR models

We can program these equations in R (or another language) and plot the progress of an epidemic

The example above continued for 60 days. Points show the calculated values, the lines interpolate these points to make the paths clearer.

3.3.6 Final epidemic size

The total number of people infected over the course of the epidemic is known as the final epidemic size.

A closed system (where the total number of people, \(N\), stays the same) will tend towards \(\lbrace S^*, I^*, R^*\rbrace\), where

\(I^*=0\)
the epidemic has extinguished itself.

The final epidemic size is \(R^*\), the total number of people who contracted the disease, and \(S^*\) is the number who escaped it.

In the example, we can see that \(S^*=70, R^*=930\).

3.3.7 Example: Reproduction numbers

Having calculated \(R_0 = 2.5\), we can now calculate \(R_n\) using the proportion of susceptibles at each time point: \[R_n\left(t\right) = \frac{S_t}{N}R_0\]

The values of \(R_n\) and \(R_0\) for the model described above.

Notice that the point at which \(R_n\approx{1}\) corresponds to the point at which the number of infectious people begins to decrease, around day 13.

Instability!

Unfortunately, this discrete time-step approach is sensitive and can be unstable, especially for

large \(\beta\)
small \(\gamma\)
long time-steps

Changing time-step in our example

We will re-run our example using 6-hourly and 3-daily time steps.

This means adjusting our parameters to have the same overall meaning:

\[\begin{align*} \beta_{6\text{hourly}} &= 0.25\times{\beta_{\text{daily}}}\\ \gamma_{6\text{hourly}} &=0.25\times\gamma_{\text{daily}} \end{align*}\]

and similarly \[\begin{align*} \beta_{3\text{daily}} &= 3\times{\beta_{\text{daily}}}\\ \gamma_{3\text{daily}} &=3\times\gamma_{\text{daily}} \end{align*}\]

Example: changing time-step

Our SIR model with three different timesteps.

The shorter the timestep, the smoother the curves and the less erratic the behaviour.
In particular, with a 3 day timestep we have some negative \(I\) values and some \(R\) values greater than \(N\).

4 From difference equations to differential equations

Because the disease is progressing continuously (rather than in discrete steps) our model needs to work continuously.

In this section

Making our SIR model continuous
Solving the SIR model in R

4.1 Thinking continuously

For a continuous model, it is sometimes impossible to write down a formula for the populations of the compartments at a particular time point

We therefore think in terms of the rate of change in each compartment over time.

Think first of the change in compartment populations from time \(t\) to time \(t+1\)

For example, \[ \frac{I_{\left(t+1\right)} - I_{t}}{\left(t+1\right) - t} = I_{\left(t+1\right)} - I_{t}, \] This is basically a gradient - how the population of \(I\) changes from one time step to the next.

Thinking continuously

We can do this for some amount of time \(\delta t\) \[ \frac{I_{\left(t+\delta t\right)} - I_{t}}{\left(t+\delta t\right) - t}=\frac{I_{\left(t+\delta t\right)} - I_{t}}{\delta t}, \] where \(\delta t\) represents some very small (but non-zero) time interval.

This doesn’t quite make sense, since we are no longer thinking in discrete time steps, so we change to continuous function notation:

\[ \frac{I\left(t+\delta t\right) - I\left(t\right)}{\left(t+\delta t\right) - t} = \frac{I\left(t+\delta t\right) - I\left(t\right)}{\delta t}. \]

Shrinking the time-step

The smaller the time-step \(\delta t\), the more accurate our gradient \[\frac{I\left(t+\delta t\right) - I\left(t\right)}{\delta t}\] becomes.

Thinking continuously

We would like the gradient to change continuously as the function \(I\left(t\right)\) changes, so we think about the limit as \(\delta t\) “tends to zero”: \[ \frac{dI\left(t\right)}{dt} = \lim\limits_{\delta t \rightarrow{0}}{\frac{I\left(t+\delta t\right) - I\left(t\right)}{\delta t}}. \] We call this ‘the derivative of \(I\left(t\right)\) with respect to \(t\)’, and often write it \[\frac{dI}{dt}.\]

4.1.1 Continuous SIR model

We can now specify our SIR model as a system of ordinary differential equations

These equate the rates of change of the compartments to the populations of the compartments as follows:

\[ \begin{align*} \frac{dS}{dt} &= -\frac{\beta I S}{N}\\ \frac{dI}{dt} &= \frac{\beta I S}{N} - \gamma I\\ \frac{dR}{dt} &= \gamma I \end{align*}. \]

Notice that we have dropped the \(\left(t\right)\) notation, but \(S,\,I\) and \(R\) are each still functions of \(t\).

4.2 Solving the ODEs in R

The differential equations need to be solved for us to calculate \(S\left(t\right),\,I\left(t\right)\) and \(R\left(t\right)\).

To set up an SIR model in R, there are five steps:

Define an R function containing the general set of equations, those that govern the flow of people from one compartment to another.
Specify the time points for which we want R to output a solution.
Supply values for the parameters (for example \(\beta\) and \(\gamma\)).
Supply initial values for the populations of each compartment.
Call the function that will put all these together.

There are several ways to solve systems of ordinary differential equations in R, but we will mainly use the package deSolve.

4.2.1 Defining the set of equations

# Lines like this one, with a hash at the start, are comments

library(deSolve)
sir_model = function (t, y, parms)
{
  # label state variables from within the y vector of initial values
  S = y[1]        # susceptibles
  I = y[2]        # infectious
  R = y[3]        # recovered
  N = S + I + R
  
  # Pull parameter values from the parms vector
  beta = parms["beta"]
  gamma = parms["gamma"]

  # Define equations (dS etc. are simplified notation for dS/dt)
  dS = -(beta*S*I)/N
  dI = (beta*S*I)/N - gamma*I
  dR = gamma * I

  # return list of gradients   
  res = c(dS, dI, dR)
  list(res)
}

4.2.2 Specifying time points

We must also specify the time points for which we would like the model to return output, for example:

timepoints = seq (0, 50, by=0.5)
timepoints

##   [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
##  [16]  7.5  8.0  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5
##  [31] 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0
##  [46] 22.5 23.0 23.5 24.0 24.5 25.0 25.5 26.0 26.5 27.0 27.5 28.0 28.5 29.0 29.5
##  [61] 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0
##  [76] 37.5 38.0 38.5 39.0 39.5 40.0 40.5 41.0 41.5 42.0 42.5 43.0 43.5 44.0 44.5
##  [91] 45.0 45.5 46.0 46.5 47.0 47.5 48.0 48.5 49.0 49.5 50.0

Whatever time unit we think in terms of while defining our parameters, that is our time unit.

Note: Unlike in the discrete case, this has nothing to do with the resolution at which R calculates the output.

4.2.3 Parameter Values

We must supply values for parms

For example, to recreate our example we’d put

params_vec = c(
  beta = 1.25, 
  gamma = 1/2
  )
params_vec

##  beta gamma 
##  1.25  0.50

Because we supply a named vector (with the names matching the parameters listed in sir_model) the function knows how to use each value.

4.2.4 Initial values

We supply starting values for each compartment, for example:

initial_values = c (S = 999, I = 1, R = 0)

This is to be used as the y argument to the function sir_model.

Again, the elements of the vector are named with the same names as those used in sir_model, so that their intended use is clear.

4.2.5 Putting it all together

Finally, we supply all this information to the function ode:

output1 = ode (y=initial_values, times = timepoints, func = sir_model, parms= params_vec)
output1 = as.data.frame(output1)
head(output1)

##   time        S        I         R
## 1  0.0 999.0000 1.000000 0.0000000
## 2  0.5 998.2430 1.453761 0.3032041
## 3  1.0 997.1439 2.112208 0.7438629
## 4  1.5 995.5498 3.066331 1.3838414
## 5  2.0 993.2416 4.446083 2.3123499
## 6  2.5 989.9071 6.435446 3.6574847

The object output1 is then a matrix (which we have now turned into a data.frame) with four columns: time, S, I and R, showing the populations in each compartment at each time step in timepoints.

Reproduction numbers

Using the parameters we have specified we can also calculate \(R_0\), and then \(R_n\) at each time point:

# calculate R0:
R0 = params_vec['beta']/params_vec['gamma']

# calculate total number of people from initial values
pop_total = sum(initial_values)

# Calculate Rn and add as a column of output1
output1$Rn = R0*(1/pop_total)*output1$S

head(output1)

##   time        S        I         R       Rn
## 1  0.0 999.0000 1.000000 0.0000000 2.497500
## 2  0.5 998.2430 1.453761 0.3032041 2.495608
## 3  1.0 997.1439 2.112208 0.7438629 2.492860
## 4  1.5 995.5498 3.066331 1.3838414 2.488875
## 5  2.0 993.2416 4.446083 2.3123499 2.483104
## 6  2.5 989.9071 6.435446 3.6574847 2.474768

Changing to long format

It is often simpler to work with the data in ‘long’ format (more rows, fewer columns), so we write a plotting function which pivots the data frame to be ‘long’ before plotting the data using the package ggplot2 (Wickham 2016):

plot_sir = function(
  sir_df, 
  Compartments = c("S", "I", "R") # default
){
# Collect the values in columns S, I and R (or those given as "Compartments")
#  into one column named 'value', and create an additional column named 
# "Compartment".
    output_tidy = pivot_longer(sir_df, cols = all_of(Compartments), names_to = "Compartment")
  # Plot the values, with a different coloured line for each compartment
    ggplot(data=output_tidy, aes(x=time, y=value, col=Compartment)) + 
      geom_line() + ylab("People")  
  }

There is a new ‘Compartment’ column, and the populations are in the ‘value’ column.

This is much simpler to plot using the package ggplot2 (Wickham 2016):

The continuous model

plot_sir(output1, Compartments= c("S", "I", "R"))

4.3 Assumptions

You may have been thinking that this model wouldn’t accurately represent many real-life infectious diseases, and you would be correct.

Some of the assumptions / simplifications that we will go on to investigate are:

This model is entirely deterministic - given the same parameter values and initial compartment populations, exactly the same thing will always happen;
Not all infections are fully immunising - for many infectious diseases, you can become infected more than once;
Sometimes we are interested in different levels of infection;
The real world population isn’t closed;
People don’t affect the rest of the population equally (mass action).

5 Summary

We’ve covered a lot of ground in these first two lectures.

The main points to take away are:

We can measure transmission of an infectious disease in terms of the secondary attack rate, the basic reproduction number and the effective reproduction number
In Compartmental models for infectious diseases, the population is divided into a number of compartments, representing different disease states people can be in. In the SIR model we studied, these are Susceptible, Infectious and Removed.
The model uses a system of ordinary differential equations to specify the rate at which the population of each compartment changes with time.
These differential equations use model parameters (\(\beta\) and \(\gamma\) in our SIR model), which reflect the behaviour of a particular population and infectious disease.
We can work with these models in R.

6 Some further reading

There are many, many articles about SIR models (especially since Covid-19!) so here are some that may be of interest:

Lazzari et al. (2020) uses an SIR model and historical data from Venice to learn about the Plague in 1630-31.
Brauer (2017) gives a review of mathematical epidemiology.

7 References

Arzt, Jonathan, Matthew A Branan, Amy H Delgado, Shankar Yadav, Karla I Moreno-Torres, Michael J Tildesley, and Carolina Stenfeldt. 2019. “Quantitative Impacts of Incubation Phase Transmission of Foot-and-Mouth Disease Virus.” Scientific Reports 9 (1): 1–13.

Bjørnstad, O. N. 2018. Epidemics: Models and Data Using r. Use r! Springer International Publishing. https://books.google.co.uk/books?id=4sJ1DwAAQBAJ.

Brauer, Fred. 2017. “Mathematical Epidemiology: Past, Present, and Future.” Infectious Disease Modelling 2 (2): 113–27.

Kermack, William Ogilvy, and Anderson G McKendrick. 1927. “A Contribution to the Mathematical Theory of Epidemics.” Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character 115 (772): 700–721.

Lazzari, Gianrocco, Giovanni Colavizza, Fabio Bortoluzzi, Davide Drago, Andrea Erboso, Francesca Zugno, Frédéric Kaplan, and Marcel Salathé. 2020. “A Digital Reconstruction of the 1630–1631 Large Plague Outbreak in Venice.” Scientific Reports 10 (1): 1–7.

Vynnycky, E., and R. White. 2010. An Introduction to Infectious Disease Modelling. OUP Oxford. https://books.google.co.uk/books?id=8iVz3NHLnYUC.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

Models and Methods in Health Data Science

Lectures 1 & 2: Introduction to compartmental infectious disease models

1 Overview

2 Modelling infectious diseases

What is a model?

Modelling infectious diseases

2.1 Stages of an infectious disease

2.1.1 Secondary infections

2.2 Trajectory

2.2.1 Model diagrams

Important questions

2.3 Transmission

2.3.1 Secondary attack rate

2.3.1.1 Example

2.3.2 Reproduction number

The basic reproduction number (\(R_0\))

The effective reproduction number (\(R_n\))

The effective reproduction number (\(R_n\))

Reproduction number = 1

2.3.3 Herd immunity threshold (HIT)

2.4 General modelling issues

2.4.1 Modelling vocabulary quiz

2.5 Question: which features are necessary?

3 The SIR model

3.1 Compartmental models

3.2 The SIR model

3.3 Difference equations

Possible transfers between compartments

Parameters for difference equations

3.3.1 Newly infected

Model parameter \(\beta\) (Beta)

Effective contact

The number of newly infected people at each time step

The Mass Action principle

3.3.2 Newly recovered

Model parameter \(\gamma\) (‘Gamma’), the recovery rate

3.3.3 Difference equations specified

Exercise

3.3.4 A note on risks and rates

Working with rates

3.3.5 Reproduction numbers revisited

Reproduction numbers revisited

Example: SIR

Example continued

Exercise

Programming SIR models

3.3.6 Final epidemic size

3.3.7 Example: Reproduction numbers

Instability!

Changing time-step in our example

Example: changing time-step

4 From difference equations to differential equations

4.1 Thinking continuously

Thinking continuously

Shrinking the time-step

4.1.1 Continuous SIR model

4.2 Solving the ODEs in R

4.2.1 Defining the set of equations

4.2.2 Specifying time points

4.2.3 Parameter Values

4.2.4 Initial values

4.2.5 Putting it all together

Reproduction numbers

Changing to long format

The continuous model

4.3 Assumptions

5 Summary

6 Some further reading

7 References