Rachel Oughton
13/02/2023
In this lecture we will cover:
Much of the material in this section of the course is built on (Bjørnstad 2018) and (Vynnycky and White 2010), which are good references if you’d like to know more (see the References for details).
In this section:
Before we think about modelling infectious diseases, we need to think about models.
We are familiar with many sorts of model:
All have a similar purpose: to display the key attributes of the thing they are modelling.
A scale model building gives us an idea of the layout, proportions and styling but will be lacking many attributes of the true thing.
For example:
Likewise, a mathematical model of an infectious disease will [attempt to] replicate some of the features of a disease, but not all of them.
What do we want the model to tell us?
Features of the disease we need to consider:
We are unlikely to be interested in things like symptoms.
Important stages in an infection
The progress of an infectious disease in a patient. From Arzt et al. (2019)
We distinguish between
During the infectious period, the patient can go on to infect one or more secondary cases.
The serial interval, \(T_S\) is the time interval from the onset of a primary case to the onset of a secondary case (one who was infected directly by the primary case).
The serial interval is:
We are concerned only with whether or not the person is infectious, not with whether they have symptoms.
Most epidemiological models think of each person as belonging to a particular state (or compartment, as we will soon call it).
For example, a person can be susceptible,
infectious, recovered and so on.
When modelling any disease, it is important to consider
The main states we will encounter in our models are:
A person must be in one (and only one) state at a time
We use these states to label models.
In an SIR
model a person starts as susceptible, can
become infectious and subsequently
recover with permanent immunity.
In an SEIRS model there is a latent
period following infection (during which a person is in the ‘E’
state), and after recovering people can be reinfected
When thinking about the structure of a model, it is important to ask questions like:
An important feature of any disease model is some measure of how easily the disease is transmitted from one person to another.
There are several measures of transmissibility:
This is perhaps the most accurate, but not the most widely used.
The secondary attack rate is the proportion of susceptibles in contact with a primary case (for example those in the same house or office) who are then infected.
Suppose two of these become infected, then we estimate the secondary attack rate to be \[\frac{2}{6} = \frac{1}{3}.\]
The reproduction number, \(R\) is defined as the (average) number of successful direct transmissions per infectious person.
This is the number we came to know as “the R number” in Covid times.
There are two widely used versions of the reproduction number.
\(R_0\) is the basic reproduction number.
This is the expected number of secondary cases arising from an infectious person in an otherwise susceptible population.
The basic reproduction number illustrated, with \(R_0 = 2\). From The
Conversation.
However: The proportion of susceptibles decreases throughout the epidemic (and may not be everyone to start with), so the number of secondary infections from each infectious person will be less than \(R_0\).
\(R_n\) is the effective reproduction number (or sometimes net reproduction number).
- Sometimes denoted \(R_e\) (for ‘effective’) or \(R_t\) (to denote the dependence on time).
\(R_n\) is calculated as \[\begin{equation} R_n = R_0 \times{s}, \end{equation}\] where \(s\) is the proportion of the population that is susceptible.
The effective reproduction number illustrated if \(R_0=2\) and \(s=0.5\), therefore \(R_n = 1\). Adapted from The
Conversation.”
An outbreak will only occur if the number of infectious people grows from the small group that started it.
The reproduction number needs to be greater than one for the epidemic to continue growing
Idea: Use the \(R_n\) formula to find the proportion of
the population that needs to be immune (ie. not susceptible) to
achieve \(R_n<1\).
We have \(R_n = R_0\times{s}.\) So, for \(R_n=1\) we need \[s = \frac{1}{R_0}\] Then, the proportion who are immune, and therefore the herd-immunity threshold (HIT) is \[HIT = 1-s = 1-\frac{1}{R_0} = \frac{R_0-1}{R_0}.\]
The Herd Immunity Threshold can be used
For example (from Vynnycky and White 2010)
- for influenza \(R_0\approx{2}\)
- for Measles \(R_0\approx{13}\).
This means that when \(s<0.5\) for Influenza, or \(s<0.08\) for Measles, \(R_n\approx{1}\) and the infection should start to die out.
Words | Definitions |
---|---|
1. Rate | A. A description of a system using mathematical rules. |
2. Risk | B. The whole number of individuals in a mathematical model. |
3. Population | C. The measure of frequency of some event for a given number of individuals. |
4. Model | D. The chance/probability of a particular event happening. |
Modern computers allow for complex models. But it’s harder to learn from more complex models.
We will look into some of these this week.
In this section:
Compartmental models are a widely used class of computer model (not just for epidemiology)
In epidemiology we often name a model by the trajectory of states,
eg. ‘SIR model’, ‘SEIRS model’.
We will look at the SIR model to understand how this works.
The SIR model was developed in the early 20th Century to model how a disease would spread through a population of susceptible people.
The three compartments are:
The total number of people remains constant at \[N= S + I + R,\] and therefore the model is
said to be closed.
The model starts with everyone Susceptible apart from a small number (often one) who are Infectious.
Key Question: Given the number of people in each compartment today, can we calculate how many will be in each compartment tomorrow?
Difference equations relate the number of people in
a compartment at time \(t\) to the
number in each compartment at time \(t-1\) (the previous time).
Notation
So for example
We have: \[\begin{align*} S_{t} & = S_{\left(t-1\right)} - \left(\text{# newly infected at time }\left(t-1\right)\right)\\ I_{t} & = I_{\left(t-1\right)} + \left(\text{# newly infected at time }\left(t-1\right)\right) - \left(\text{# newly removed at time }\left(t-1\right)\right)\\ R_{t} & = R_{\left(t-1\right)} + \left(\text{# newly removed at time }\left(t-1\right)\right) \end{align*}\]
The population of each compartment at time \(t\) is equal to the population at time \(\left(t-1\right)\)…
- …plus those who have newly joined at time \(\left(t-1\right)\)…
- …minus those who left at time \(\left(t-1\right)\).
In a closed epidemic (ie. one with no births, deaths or migrations) we should have \[ S_{\left(t-1\right)} + I_{\left(t-1\right)} + R_{\left(t-1\right)} = S_t + I_t + R_t = N\] for all \(t\), and it is clear here that this is the case.
We have: \[\begin{align*} S_{t} & = S_{\left(t-1\right)} - \left(\text{# newly infected at time }\left(t-1\right)\right)\\ I_{t} & = I_{\left(t-1\right)} + \left(\text{# newly infected at time }\left(t-1\right)\right) - \left(\text{# newly removed at time }\left(t-1\right)\right)\\ R_{t} & = R_{\left(t-1\right)} + \left(\text{# newly removed at time }\left(t-1\right)\right) \end{align*}\]
In this model there are therefore two quantities to think about:
In this course we will follow the conventions of Kermack and McKendrick (1927), which are still commonly in use.
The number of people newly infected at each time step will depend on
\(\beta\) is the per capita transmission rate. It is often broken down as
\[ \beta = \text{contact rate} \times \text{transmission probability}. \] If an infectious person has contact with 6 [susceptible] people per day, and the probability of transmitting the disease [to a susceptible person] is 0.4, they will infect 2.4 people per day.
We could think of \(\beta\) in this case as \[ \beta = 6 \times{0.4} = 2.4.\]
Note: We’ve written ‘day’, but really what we mean is ‘time unit’
Contact here is used to mean effective contact - a level of contact which would be sufficient to lead to infection.
The actual level of contact this means depends on the mode of transmission (for example respiratory / vector-borne / faecal-oral / sexual etc.).
We can use \(\beta\) to model this quantity
Those people moving from \(S\) (susceptible) to \(I\) (infectious) can be broken down as \[I \times{\beta} \times{\frac{S}{N}}= \frac{\beta I S}{N}. \]
- Each infectious person potentially infects \(\beta\) people per day
- They can only infect the susceptible proportion \(\frac{S}{N}\) of those they have effective contact with
An important point here is the mass action principle.
In a compartmental model, we assume
There is no real concept of ‘individuals’, or of some people being more proximal to one another.
This is an assumption we will return to later, when we look at Agent Based Models.
The number of people recovering at each time step depends on
This is the proportion of infectious people who will recover at each time step.
The number of newly recovered people at each time step
At each time-step, \(\gamma \times{I}\) people will move move from \(I\) (infectious) to \(R\) (removed)
The period of time for which a person is infectious is therefore approximately \(\frac{1}{\gamma}\).
Our difference equations therefore become
\[\begin{align*} S_{t} & = S_{\left(t-1\right)} - \frac{\beta I_{\left(t-1\right)} S_{\left(t-1\right)}}{N}\\ I_{t} & = I_{\left(t-1\right)} + \frac{\beta I_{\left(t-1\right)}S_{\left(t-1\right)}}{N} - \gamma I_{\left(t-1\right)}\\ R_{t} & = R_{\left(t-1\right)} + \gamma I_{\left(t-1\right)} \end{align*}\]
Given
we can compute the trajectory of an epidemic (according to this model).
Talk about this with the people around you
Think of the interventions used during the Covid-19 pandemic.
Which model parameters would they affect, and how?
Reminder:
When we think about events happening, we are really thinking about risk.
There is uncertainty about if/when people will get infected or recover.
However, in compartmental models we work in terms of rates.
We assume that individuals become infectious at a constant rate, and that once infected they recover at a constant rate.
Convenient:
\[ \scriptstyle \text{Rate of occurence} = \frac{1}{\text{Average time to event}}. \]
Unrealistic:
Implies that the event duration is exponentially distributed:
Distribution of the time between becoming infectious and recovery, assuming that infectious individuals recover at a rate of 0.3 per day.
From our SIR model we can calculate a theoretical value for the basic reproduction number \(R_0\).
An epidemic grows if the number of infected individuals is increasing, that is if \[ \frac{\beta I_{\left(t-1\right)}S_{\left(t-1\right)}}{N} - \gamma I_{\left(t-1\right)} = \frac{\beta S_{\left(t-1\right)}}{N} - \gamma > 0\] assuming \(I_{\left(t-1\right)}>0\).
We can rearrange this as \[ \frac{\beta S_{\left(t-1\right)}}{N} > \gamma \] or \[ \frac{\beta S_{\left(t-1\right)}}{\gamma N} > 1. \]
At the start, we have \(S_{\left(t-1\right)}\approx{N}\) because almost everyone is still susceptible, so \[\frac{S_{\left(t-1\right)}}{N}\approx{1}\] and therefore an epidemic occurs if \[\frac{\beta}{\gamma}>1 \] and we have (for this model set-up) \(R_0=\frac{\beta}{\gamma}\).
Suppose we have a population of \(N=1000\) people, of which one is infectious with some disease, and none have immunity. This means we have \[ \begin{align*} S_1 & = 999 \\ I_1 &= 1 \\ R_1 &= 0 \end{align*}\] Assume that each person is in effective contact with 5 people per day, and that the probability of an infectious person passing the disease on to a susceptible person when in effective contact is 0.25.
With this disease, an infectious person is infectious for 2 days (on average). This means that \[ \begin{align*} \beta &= 5 \times{0.25} = 1.25 \\ \gamma &= \frac{1}{2} \end{align*}\]
To calculate the compartment populations at \(t=2\) therefore, we have
\[ \begin{align*} S_{2} &= S_1 -\frac{\beta I_1 S_1}{N} & =& 999 -\frac{1.25 \times{1}\times{999}}{1000} & = 997.8 \text{ (3 s.f.)}\\ I_2 &=I_1 + \frac{\beta I_1 S_1}{N} - \gamma I_1 & = &1+ \frac{1.25 \times{1}\times{999}}{1000} - \frac{1}{2}\times{1} &= 1.75\text{ (3 s.f.)} \\ R_2 &=R_1+ \gamma I_1 & =&0+ \frac{1}{2}\times{1} & = 0.5 \text{ (3 s.f.)} \end{align*} \]
We can program these equations in R (or another language) and plot the progress of an epidemic
The example above continued for 60 days. Points show the calculated values, the lines interpolate these points to make the paths clearer.
The total number of people infected over the course of the epidemic is known as the final epidemic size.
A closed system (where the total number of people, \(N\), stays the same) will tend towards \(\lbrace S^*, I^*, R^*\rbrace\), where
The final epidemic size is \(R^*\), the total number of people who contracted the disease, and \(S^*\) is the number who escaped it.
In the example, we can see that \(S^*=70, R^*=930\).
Having calculated \(R_0 = 2.5\), we can now calculate \(R_n\) using the proportion of susceptibles at each time point: \[R_n\left(t\right) = \frac{S_t}{N}R_0\]
The values of \(R_n\) and \(R_0\) for the model described above.
Notice that the point at which \(R_n\approx{1}\) corresponds to the point at
which the number of infectious people begins to decrease, around day
13.
Unfortunately, this discrete time-step approach is sensitive and can be unstable, especially for
We will re-run our example using 6-hourly and 3-daily time steps.
This means adjusting our parameters to have the same overall meaning:
\[\begin{align*} \beta_{6\text{hourly}} &= 0.25\times{\beta_{\text{daily}}}\\ \gamma_{6\text{hourly}} &=0.25\times\gamma_{\text{daily}} \end{align*}\]
and similarly \[\begin{align*} \beta_{3\text{daily}} &= 3\times{\beta_{\text{daily}}}\\ \gamma_{3\text{daily}} &=3\times\gamma_{\text{daily}} \end{align*}\]
Our SIR model with three different timesteps.
The shorter the timestep, the smoother the curves and the
less erratic the behaviour.
In particular, with a 3 day
timestep we have some negative \(I\)
values and some \(R\) values greater
than \(N\).
Because the disease is progressing continuously (rather than in discrete steps) our model needs to work continuously.
In this section
For a continuous model, it is sometimes impossible to write down a formula for the populations of the compartments at a particular time point
We therefore think in terms of the rate of change in each compartment over time.
Think first of the change in compartment populations from time \(t\) to time \(t+1\)
For example, \[ \frac{I_{\left(t+1\right)} - I_{t}}{\left(t+1\right) - t} = I_{\left(t+1\right)} - I_{t}, \] This is basically a gradient - how the population of \(I\) changes from one time step to the next.
We can do this for some amount of time \(\delta t\) \[ \frac{I_{\left(t+\delta t\right)} - I_{t}}{\left(t+\delta t\right) - t}=\frac{I_{\left(t+\delta t\right)} - I_{t}}{\delta t}, \] where \(\delta t\) represents some very small (but non-zero) time interval.
This doesn’t quite make sense, since we are no longer thinking in discrete time steps, so we change to continuous function notation:
\[ \frac{I\left(t+\delta t\right) - I\left(t\right)}{\left(t+\delta t\right) - t} = \frac{I\left(t+\delta t\right) - I\left(t\right)}{\delta t}. \]
The smaller the time-step \(\delta t\), the more accurate our gradient \[\frac{I\left(t+\delta t\right) - I\left(t\right)}{\delta t}\] becomes.
Thinking continuously
We would like the gradient to change continuously as the function \(I\left(t\right)\) changes, so we think about the limit as \(\delta t\) “tends to zero”: \[ \frac{dI\left(t\right)}{dt} = \lim\limits_{\delta t \rightarrow{0}}{\frac{I\left(t+\delta t\right) - I\left(t\right)}{\delta t}}. \] We call this ‘the derivative of \(I\left(t\right)\) with respect to \(t\)’, and often write it \[\frac{dI}{dt}.\]
We can now specify our SIR model as a system of ordinary differential equations
These equate the rates of change of the compartments to the populations of the compartments as follows:
\[ \begin{align*} \frac{dS}{dt} &= -\frac{\beta I S}{N}\\ \frac{dI}{dt} &= \frac{\beta I S}{N} - \gamma I\\ \frac{dR}{dt} &= \gamma I \end{align*}. \]
Notice that we have dropped the \(\left(t\right)\) notation, but \(S,\,I\) and \(R\) are each still functions of \(t\).
The differential equations need to be solved for us to calculate \(S\left(t\right),\,I\left(t\right)\) and \(R\left(t\right)\).
To set up an SIR model in R, there are five steps:
There are several ways to solve systems of ordinary differential
equations in R, but we will mainly use the package
deSolve
.
# Lines like this one, with a hash at the start, are comments
library(deSolve)
sir_model = function (t, y, parms)
{
# label state variables from within the y vector of initial values
S = y[1] # susceptibles
I = y[2] # infectious
R = y[3] # recovered
N = S + I + R
# Pull parameter values from the parms vector
beta = parms["beta"]
gamma = parms["gamma"]
# Define equations (dS etc. are simplified notation for dS/dt)
dS = -(beta*S*I)/N
dI = (beta*S*I)/N - gamma*I
dR = gamma * I
# return list of gradients
res = c(dS, dI, dR)
list(res)
}
We must also specify the time points for which we would like the model to return output, for example:
## [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
## [16] 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5
## [31] 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0
## [46] 22.5 23.0 23.5 24.0 24.5 25.0 25.5 26.0 26.5 27.0 27.5 28.0 28.5 29.0 29.5
## [61] 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0
## [76] 37.5 38.0 38.5 39.0 39.5 40.0 40.5 41.0 41.5 42.0 42.5 43.0 43.5 44.0 44.5
## [91] 45.0 45.5 46.0 46.5 47.0 47.5 48.0 48.5 49.0 49.5 50.0
Whatever time unit we think in terms of while defining our parameters, that is our time unit.
Note: Unlike in the discrete case, this has nothing to do with the resolution at which R calculates the output.
We must supply values for parms
For example, to recreate our example we’d put
## beta gamma
## 1.25 0.50
Because we supply a named vector (with the names matching the
parameters listed in sir_model
) the function knows how to
use each value.
We supply starting values for each compartment, for example:
This is to be used as the y
argument to the function
sir_model
.
Again, the elements of the vector are named with the same names as
those used in sir_model
, so that their intended use is
clear.
Finally, we supply all this information to the function
ode
:
output1 = ode (y=initial_values, times = timepoints, func = sir_model, parms= params_vec)
output1 = as.data.frame(output1)
head(output1)
## time S I R
## 1 0.0 999.0000 1.000000 0.0000000
## 2 0.5 998.2430 1.453761 0.3032041
## 3 1.0 997.1439 2.112208 0.7438629
## 4 1.5 995.5498 3.066331 1.3838414
## 5 2.0 993.2416 4.446083 2.3123499
## 6 2.5 989.9071 6.435446 3.6574847
The object output1
is then a matrix (which we have now
turned into a data.frame) with four columns: time, S, I and R, showing
the populations in each compartment at each time step in
timepoints
.
Using the parameters we have specified we can also calculate \(R_0\), and then \(R_n\) at each time point:
# calculate R0:
R0 = params_vec['beta']/params_vec['gamma']
# calculate total number of people from initial values
pop_total = sum(initial_values)
# Calculate Rn and add as a column of output1
output1$Rn = R0*(1/pop_total)*output1$S
head(output1)
## time S I R Rn
## 1 0.0 999.0000 1.000000 0.0000000 2.497500
## 2 0.5 998.2430 1.453761 0.3032041 2.495608
## 3 1.0 997.1439 2.112208 0.7438629 2.492860
## 4 1.5 995.5498 3.066331 1.3838414 2.488875
## 5 2.0 993.2416 4.446083 2.3123499 2.483104
## 6 2.5 989.9071 6.435446 3.6574847 2.474768
It is often simpler to work with the data in ‘long’ format (more
rows, fewer columns), so we write a plotting function which pivots the
data frame to be ‘long’ before plotting the data using the package
ggplot2
(Wickham 2016):
plot_sir = function(
sir_df,
Compartments = c("S", "I", "R") # default
){
# Collect the values in columns S, I and R (or those given as "Compartments")
# into one column named 'value', and create an additional column named
# "Compartment".
output_tidy = pivot_longer(sir_df, cols = all_of(Compartments), names_to = "Compartment")
# Plot the values, with a different coloured line for each compartment
ggplot(data=output_tidy, aes(x=time, y=value, col=Compartment)) +
geom_line() + ylab("People")
}
There is a new ‘Compartment’ column, and the populations are in the ‘value’ column.
This is much simpler to plot using the package ggplot2
(Wickham 2016):
You may have been thinking that this model wouldn’t accurately represent many real-life infectious diseases, and you would be correct.
Some of the assumptions / simplifications that we will go on to investigate are:
We’ve covered a lot of ground in these first two lectures.
The main points to take away are:
There are many, many articles about SIR models (especially since Covid-19!) so here are some that may be of interest: