In this lecture we will cover:
Much of the material in this section of the course is built on (Bjørnstad 2018) and (Vynnycky and White 2010), which are good references if you’d like to know more (see the References for details).
Before we think about modelling infectious diseases, we need to think about models and infectious diseases seperately.
We are familiar with many sorts of model - a scale model of a building, a fashion model, and of course a mathematical model. Although in many ways these are very different, they all have a similar purpose: to display the key attributes of the thing they are modelling. For example, the scale model gives an idea of the layout, proportions and styling of the building. However, the model will be lacking many attributes of the true thing. For example, the scale model will very likely be made of entirely different materials from the real building and have no wiring or plumbing. Likewise, a mathematical model of an infectious disease will [attempt to] replicate some of the features of a disease, but not all of them.
The features of an infectious disease that we are concerned with for modelling (in this course at least) primarily involve how the disease spreads and progresses through a population. These are different features of diseases than those one would usually think of, but are crucial if we are to build suitable models. For example, in the relatively simple models we study and implement, there will be little to no mention of symptoms.
When understanding the spread of disease, it’s important to think about the progress of the disease in each individual. Figure 2.1 shows several important phases within an infection and the transmission from one patient to another.
Figure 2.1: The progress of an infectious disease in a patient. From Arzt et al. (2019)
First of all, there is the distinction between whether or not the patient is showing symptoms (bottom - blue/purple arrows) and whether or not they are infectious (middle - green/red lines). From the moment the patient is infected, there is an incubation period (during which they have no symptoms) and a latent period (during which they are not infectious). These do not necessarily end at the same time. For example, in Figure 2.1, the person will be infectious before they begin to exhibit symptoms. During this patient’s infectious period, they can go on to infect one or more secondary cases, and this sequence of states is replicated.
Definition 2.1 The serial interval, \(T_S\), is the time interval from the onset of a primary case to the onset of a secondary case (one who was infected directly by the primary case). Even though infectiousness is our main concern, this is often estimated by the onset of clinical symptoms, because this is what drives reporting. It is only usually possible to estimate at the outset of an epidemic, when there are few cases and the relationship between successive cases is clear.
In disease modelling (at least in the relatively simple models that we will study) we are concerned only with whether or not the person is infectious, not with whether they have symptoms.
Most epidemiological models think of each person as belonging to a particular state (or compartment, as we will call it for SIR models). For example, a person can be susceptible, infectious, recovered and so on. When modelling any disease, it is important to consider
The main states we will encounter in our models are:
These sequences of states are used to name a widely used class of model called compartmental models. For example, an SIR model (as in Figure 2.2) is one in which a person starts as susceptible, can become infectious and subsequently recover with permanent immunity.
Figure 2.2: The trajectory in an SIR model.
In an SEIRS model (as in Figure 2.3, there is a latent period following infection (during which a person is in the ‘E’ state), and after recovering people can be reinfected (hence the arrow from ‘R’ back to ‘S’).
Figure 2.3: The trajectory in an SEIRS model.
Important considerations here are ones like:
An important feature of any disease model is some measure of how easily the disease is transmitted from one person to another. There are several measures of transmissibility:
This is perhaps the most accurate, but not the most widely used.
Definition 2.2 The secondary attack rate is the proportion of susceptibles in contact with a primary case (for example those in the same house or office) who are then infected.
Imagine that a nursery child is infectious with chickenpox. There are 10 other children in her nursery group. Four of them have already had chickenpox and are therefore immune, leaving six susceptible children in contact with the infectious child. Suppose two of these become infected, then we estimate the secondary attack rate to be \(\frac{2}{6} = \frac{1}{3}\).
The reproduction number, \(R\) is defined as the (average) number of successful direct transmissions (ie. susceptible people who are infected directly by that person) per infectious person. This is the number we have come to know as “the R number” in Covid times. There are two widely used versions of the reproduction number.
Definition 2.3 \(R_0\) is the basic reproduction number. This is the expected number of secondary cases arising from an infectious person in an otherwise susceptible population.
If people become immune following infection (or are already immune, for example through vaccination), then the proportion of susceptibles decreases throughout the epidemic, and therefore the number of secondary infections from each infectious person will be lower.
Definition 2.4 \(R_n\) is the effective reproduction number (or sometimes net reproduction number). \(R_n\) is calculated as \[\begin{equation} R_n = R_0 \times{s}, \end{equation}\] where \(s\) is the susceptible proportion of the population. This is also sometimes denoted \(R_e\) (for ‘effective’) or \(R_t\) (to denote the dependence on time).
Note that while \(R_n\) changes throughout the course of an epidemic, \(R_0\) does not, unless there are fundamental changes to the transmission dynamics, for example vastly reduced social contact.
The success of an epidemic relies on the number of infectious people growing from the small introductory group. Therefore the reproduction number needs to be greater than one for the disease to “succeed”. If the basic reproduction number is less than one (\(R_0<1\)) then there will be no epidemic to begin with. At the point when the effective reproduction number falls below one, the epidemic will eventually stop spreading.
We can use the effective reproduction number equation (from definition 2.4) to find the proportion of the population that needs to be immune (this will be \(1-s\)) to the disease to achieve \(R_n<1\). If \[s = \frac{1}{R_0}\] then we will have \[R_n=R_0 \times{\frac{1}{R_0}}=1,\] which is the critical case. In this case, the proportion who are immune, and therefore the herd-immunity threshold (HIT) is \[HIT = 1-s = 1-\frac{1}{R_0} = \frac{R_0-1}{R_0}.\] This number can be used as a (crude) target for vaccination programmes, for example, or to estimate the proportion of the population who will have succumbed to the disease before the epidemic dies out (if it is perfectly immunising).
For example (from Vynnycky and White 2010), for influenza \(R_0\approx{2}\), whereas for Measles \(R_0\approx{13}\). This means that when the proportion of susceptible people gets below \(0.5\) for Influenza, or \(0.08\) for Measles, \(R_n\approx{1}\) and the infection should start to die out.
With the extent of modern computing power, models can be very complex, and can incorporate many features of the disease. The challenge is to decide which features are important, since the more features a model has the harder it is to learn accurately. Is it important to be able to model different age groups? Is it important to understand the extents to which people are suffering from the disease (ie. mildly or to the point of hospitalisation)? We will look at these sorts of extensions later in the course.
The point of a model is that it replicates the important aspects of the true system in such a way that we can learn about it. It wouldn’t be ethical or practical to conduct experiments to monitor the spread of disease through populations of people or animals (although this has been done), and so people now build mathematical models.
It is important to consider the time period we intend to model. For example, if we are concerned with a period of a few months, a simple SIR or SEIR model may well suffice. If we wish to model 10 years, it may be important to include population dynamics (births, deaths and migrations) in our model. For a model covering a longer period, it may be important to factor in changes in the disease itself, for example new strains and variants. This might lead us to change to an SIRS or SEIRS model, where reinfection becomes possible. If we wish to make accurate weekly predictions, it may be important to include changes in the amount of contact made, for example due to the school holidays. We will explore some of these issues in the next workshop.
In this course we will come across many words with very specific meanings. Some of these will be fairly ‘every day’ words - these can be the trickiest as we are used to using them fairly casually.
We will now have a short quiz, to match up some such words to their definitions:
The link is at www.wooclap.com/EPIVOCAB but the quiz will only be active during the lecture.
Although everything discussed so far applies to several classes of epidemiological model, when people talk about ‘SIR models’, they are generally referring to compartmental models of infectious diseases. Confusingly, the term ‘SIR model’ can be used not just to refer to models with the SIR trajectory as in Figure 2.2, but this general class. In this course, we will be careful to name these compartmental models according to their trajectory.
Compartmental models are widely used to model infectious diseases, as well as many other phenomena. In a compartmental model, the population (we will talk about people as the members of the population, but in general it need not be people) is divided up into a number of compartments, each representing a different state in terms of the disease. Every member of the population can belong to one, and only one, compartment at any one time. People move from compartment to compartment at different rates. At each time step, the number of people in each compartment is updated according to a set of rules we specify as part of the model.
When developing a compartmental model, the important questions to ask are:
To learn how these models work, we will look at the basic SIR model first.
The SIR model was developed in the early 20th Century, by Kermack and McKendrick (1927), to model how a disease would spread through a population of susceptible people. The three compartments are:
In this simple form of the model, the total number of people remains constant at \(N= S + I + R\), and the model is closed. We will consider models where this is not the case later on in the course.
In this version of the model, people start in the ‘Susceptible’ compartment, apart from a small number (often one) of infectious individuals who introduce the disease. These infectious people transmit the disease to some of the susceptible people, who then become infectious themselves (and therefore move to the ‘Infectious’ compartment), and begin to spread the disease to other susceptible people. After some time, each infectious person recovers, and is immune to the disease.
To specify an SIR model of a particular disease in a particular setting and population, a key question is:
Given the number of people in each compartment today, can we calculate how many will be in each compartment tomorrow?
This question leads to the development of some difference equations and model parameters.
We can think of the model specified above through a set of difference equations, relating the number of people in each compartment at time \(t\) to the number in each compartment at time \(\left(t-1\right)\). We will use these subscripts to denote time, so for example, \(S_{\left(t-1\right)}\) is the number of susceptibles (people in the \(S\) compartment) at time \(\left(t-1\right)\). We will generally think of each time step as a day (so that if time \(t\) is today, time \(\left(t-1\right)\) is yesterday), but one could specify a model in terms of hours or weeks or any other time period.
Thinking about the possible transfers from each compartment to each other (shown by the arrows in Figure 2.2) we have
\[\begin{align*} S_{t} & = S_{\left(t-1\right)} - \left(\text{# newly infected at time }\left(t-1\right)\right)\\ I_{t} & = I_{\left(t-1\right)} + \left(\text{# newly infected at time }\left(t-1\right)\right) - \left(\text{# newly recovered at time }\left(t-1\right)\right)\\ R_{t} & = R_{\left(t-1\right)} + \left(\text{# newly recovered at time }\left(t-1\right)\right) \end{align*}\] That is, the population of each compartment at time \(t\) is equal to the population at time \(\left(t-1\right)\), plus those who have newly joined at time \(\left(t-1\right)\), minus those who left at time \(\left(t-1\right)\).
In a closed epidemic (ie. one with no births, deaths or migrations) we should have \[ S_{\left(t-1\right)} + I_{\left(t-1\right)} + R_{\left(t-1\right)} = S_t + I_t + R_t = N\] for all \(t\), and it is clear here that this is the case.
In this model there are therefore two quantities to think about: the number newly infected at each time step, and the number newly recovered at each time step. In this course we will follow the conventions of Kermack and McKendrick (1927), which are still commonly in use.
As we hinted when discussing the reproduction number, the number of people newly infected at each time step will depend on
We introduce the model parameter \(\beta\), the per capita transmission rate. It is often broken down as \[ \beta = \text{contact rate} \times \text{transmission probability}. \] If an infected person has contact with 6 [susceptible] people per day, and the probability of transmitting the disease [to a susceptible person] is 0.4, they will infect 2.4 people per day. We could think of \(\beta\) in this case as \[ \beta = 6 \times{0.4} = 2.4.\]
Contact here is used to mean effective contact - a level of contact which would be sufficient to lead to infection. The actual level of contact this means depends on the mode of transmission (for example respiratory / vector-borne / faecal-oral / sexual etc.). This means that \(\beta\) is likely to be much higher for diseases such as measles, which can be spread through relatively casual contact and in settings where there is a high level of close contact, for example schools and nurseries, or places where people live very close together.
The number of newly infected people at each time step (ie. those people moving from \(S\) (susceptible) to \(I\) (infected)) can be broken down as \[I \times{\beta} \times{\frac{S}{N}}= \frac{\beta I S}{N}. \] Think of this as each infectious person potentially infecting \(\beta\) people per day. The fraction \(\frac{S}{N}\) is there because not all of the people the infectious mix with will be susceptible to infection - they are only able to pass the disease on to the susceptible proportion of those with whom they have effective contact.
An important point here is the mass action principle. In this model, the entire population is assumed to be mixing randomly, with any chosen pair of people being equally likely to be in effective contact. In a compartmental model there is no real concept of ‘individuals’, or of some people being more proximal to one another. This is an assumption we will return to later, when we look at Agent Based Models.
The number of people recovering at each time step depends on
To model the number newly recovered at each time step we therefore introduce a second model parameter \(\gamma\), the recovery rate. This is the proportion of infectious people who will recover (or more precisely enter the ‘removed’ compartment) at each time step, and so the number of people newly recovered in our model will be \(\gamma I\).
The period of time for which a person is infectious is therefore approximately \(\frac{1}{\gamma}\) (‘approximately’ because some people will ‘overtake’ others in their recovery).
Our difference equations therefore become
\[\begin{align*} S_{t} & = S_{\left(t-1\right)} - \frac{\beta I_{\left(t-1\right)} S_{\left(t-1\right)}}{N}\\ I_{t} & = I_{\left(t-1\right)} + \frac{\beta I_{\left(t-1\right)}S_{\left(t-1\right)}}{N} - \gamma I_{\left(t-1\right)}\\ R_{t} & = R_{\left(t-1\right)} + \gamma I_{\left(t-1\right)} \end{align*}\]
and given values for \(\beta\) and \(\gamma\), and starting populations for each compartment, we can compute the trajectory of an epidemic (according to this model).
Exercise 3.1 Think of the interventions used during the Covid-19 pandemic. Which parts of this model would they affect, and how?
When we think about people getting infected, for example, we are really thinking about risk. There is uncertainty about whether any particular person will get infected, given their proximity to infectious people and other factors. However, in compartmental models we work in terms of rates. We assume that individuals become infectious at a constant rate, and that once infected they recover at a constant rate.
This is a convenient assumption, since we can make use of the fact that
\[ \text{Rate of occurence} = \frac{1}{\text{Average time to event}}. \]
For example, if we assume that, on average, someone infectious with chicken pox is infectious for 10 days, then the rate of recovery is \(\frac{1}{10}\).
These model assumptions are not realistic, since they imply that the duration of the event (eg. the duration for which someone is infectious) is exponentially distributed. In particular, because of the ‘memoryless’ property of the exponential distribution, it implies that the likelihood of someone recovering (if we pretend that this model involves chance) is not related to how long ago they were infected. Figure 3.1 shows the shape of the exponential distribution.
Figure 3.1: Distribution of the time between becoming infectious and recovery, assuming that infectious individuals recover at a rate of 0.3 per day.
Given this set of difference equations we can also calculate a theoretical value for the basic reproduction number \(R_0\) as follows. An epidemic grows if the number of infected individuals is increasing, that is if \[ \frac{\beta I_{\left(t-1\right)}S_{\left(t-1\right)}}{N} - \gamma I_{\left(t-1\right)} = \frac{\beta S_{\left(t-1\right)}}{N} - \gamma > 0\] assuming \(I_{\left(t-1\right)}>0\). We can rearrange this as \[ \frac{\beta S_{\left(t-1\right)}}{N} > \gamma \] or \[ \frac{\beta S_{\left(t-1\right)}}{\gamma N} > 1. \] At the outset of the epidemic (if it is to occur), when we introduce a single infectious person into a population of susceptibles, we have \(S_{\left(t-1\right)}\approx{N}\) because almost everyone is still in the susceptible compartment, and so \[\frac{S_{\left(t-1\right)}}{N}\approx{1}\] and therefore an epidemic occurs if \[\frac{\beta}{\gamma}>1 \] and we have (for this model set-up) \(R_0=\frac{\beta}{\gamma}\).
Perhaps a more intuitive way to think about this is that if an infectious person infects \(\beta\) people per day and they are infectious for approximately \(\frac{1}{\gamma}\) days, we would expect them to infect around \(\beta \times{\frac{1}{\gamma}} = \frac{\beta}{\gamma}\) people over the course of their infectious period, in an otherwise susceptible population.
Example 3.1 Suppose we have a population of \(N=1000\) people, of which one is infectious with some disease, and none have immunity. This means we have \[ \begin{align*} S_1 & = 999 \\ I_1 &= 1 \\ R_1 &= 0 \end{align*}\] We assume that each person is in effective contact with 5 people per day, and that the probability of an infectious person passing the disease on to a susceptible person when in effective contact is 0.25. We also assume that with this disease, an infectious person is infectious for 2 days (on average). This means that \[ \begin{align*} \beta &= 5 \times{0.25} = 1.25 \\ \gamma &= \frac{1}{2} \end{align*}\]
To calculate the second time step therefore, we have
\[ \begin{align*} S_{2} &= S_1 -\frac{\beta I_1 S_1}{N} & = 999 -\frac{1.25 \times{1}\times{999}}{1000} & = 997.8 \text{ (3 s.f.)}\\ I_2 &=I_1 + \frac{\beta I_1 S_1}{N} - \gamma I_1 & = 1+ \frac{1.25 \times{1}\times{999}}{1000} - \frac{1}{2}\times{1} &= 1.75\text{ (3 s.f.)} \\ R_2 &=R_1+ \gamma I_1 & =0+ \frac{1}{2}\times{1} & = 0.5 \text{ (3 s.f.)} \end{align*} \]
Exercise 3.2 Calculate the populations of each compartment for the following day, \(t=3\).
Exercise 3.3 Calculate \(R_0\) for this model.
Figure 3.2: The example above continued for 60 days. Points show the calculated values, the lines interpolate these points to make the paths clearer.
The total number of people infected over the course of the epidemic is known as the final epidemic size.
The populations in a closed system (where the total number of people, \(N\), stays the same) will tend towards \(\lbrace S^*, I^*, R^*\rbrace\), where \(I^*=0\) and the epidemic has extinguished itself. The final epidemic size is \(R^*\), the total number of people who contracted the disease, and \(S^*\) is the number who escaped it. In Figure 3.2 we can see that some susceptibles did not become infected, and so for the model set up in Example 3.1 we have \(S^*=70, R^*=930\).
Having calculated \(R_0 = 2.5\) in Exercise 3.3, we can now calculate \(R_n\) using the proportion of susceptibles at each time point.
Figure 3.3: The values of \(R_n\) and \(R_0\) for the model described above.
Notice that the point at which \(R_n\approx{1}\) corresponds to the point at which the number of infectious people begins to decrease, around day 13.
So far we have been working in discrete time steps, using the difference equations to move from one time to the next. However, this can often lead to numerically unstable results, particularly for large \(\beta\), small \(\gamma\) or long time steps, and the accuracy of the calculated populations depends on the time interval used.
In Figure 4.1, the same fundamental model from Example 3.1 is run with 6-hourly, daily and 3-day time steps. The parameters \(\beta\) and \(\gamma\) have to be adjusted to have the same overall meaning. We will briefly refer to the parameters for the 1 day timestep as \(\beta_{\text{daily}}\) and \(\gamma_{\text{daily}}\).
With 6-hourly timesteps, \(\beta_{6\text{hourly}} = 0.25\times{\beta_{\text{daily}}}\), since one would expect an infectious person to infect only a quarter as many people in a quarter of the time. Similarly, \(\gamma_{6\text{hourly}}=0.25\times\gamma_{\text{daily}}\), since a quarter of the number of people will recover in a quarter of the time (or thinking of \(\frac{1}{\gamma}\), the expected duration of infection will be four times longer). We can apply similar logic to find the 3-day values for \(\beta\) and \(\gamma\).
Figure 4.1: Our SIR model with three different timesteps.
Notice in Figure 4.1 that the shorter the timestep, the smoother the curves and the less erratic the behaviour. In particular, with a 3 day timestep we have some negative \(I\) values and some \(R\) values greater than \(N\).
To address this problem, we move from thinking in discrete time steps to thinking of time continuously. Effectively we are shrinking out timestep to be infinitesimally short. Our compartment populations will now be \(S\left(t\right),\;I\left(t\right)\) and \(R\left(t\right)\), functions of \(t\) (time) that can be evaluated for any value of \(t>0\) (not just whole numbers). We do still need to think in terms of a specific time interval, because our parameters \(\beta\) and \(\gamma\) relate to changes over a particular time interval (for example an hour, a day, a week), but the compartment populations will now change continuously with time, so that even if our model parameters were specified in terms of weekly behaviour, we could make hourly predictions (for example).
When thinking continuously, rather than in discrete time steps as before, it is sometimes impossible to write down a formula for the populations of the compartments at a particular time step, for example the number of Infectious individuals at time \(t\). We therefore think in terms of the rate of change in each compartment over time.
Returning to our difference equation for the Infectious compartment the first step is to think of the change in compartment populations over a change in time, for example, \[ \frac{I_{\left(t+1\right)} - I_{t}}{\left(t+1\right) - t} = I_{\left(t+1\right)} - I_{t}, \] where the denominator on the left hand side is there to emphasise that this is the difference from one time step to the next. Essentially we are calculating a gradient. This represents the change in the population of \(I\) from one time step to the next. We can push this a little further, by thinking of \[ \frac{I_{\left(t+\delta t\right)} - I_{t}}{\left(t+\delta t\right) - t}=\frac{I_{\left(t+\delta t\right)} - I_{t}}{\delta t}, \] where \(\delta t\) represents some very small (but non-zero) time interval. This doesn’t quite make sense, since we are now no longer thinking in discrete time steps, so we will change to continuous function notation:
\[ \frac{I\left(t+\delta t\right) - I\left(t\right)}{\left(t+\delta t\right) - t} = \frac{I\left(t+\delta t\right) - I\left(t\right)}{\delta t}. \] This is a gradient: the change in the function \(I\left(t\right)\) over some time difference \(\delta t\), divided by \(\delta t\). The smaller \(\delta t\) is, the more accurate the gradient, since it is being approximated over a smaller interval. What we would like is to have this gradient continually change as the function changes, and so we think about what happens ‘in the limit’ as \(\delta t\) becomes infinitesimally small, or ‘tends to zero’: \[ \frac{dI\left(t\right)}{dt} = \lim\limits_{\delta t \rightarrow{0}}{\frac{I\left(t+\delta t\right) - I\left(t\right)}{\delta t}}. \] We call this ‘the derivative of of \(I\left(t\right)\) with respect to \(t\)’.
The changes of the number of people in each compartment are therefore now specified in a series of ordinary differential equations. These equate the rates of change of the compartments to the populations of the compartments as follows (these equations are for our closed SIR model as in Figure 2.2):
\[ \begin{align*} \frac{dS}{dt} &= -\frac{\beta I S}{N}\\ \frac{dI}{dt} &= \frac{\beta I S}{N} - \gamma I\\ \frac{dR}{dt} &= \gamma I \end{align*}. \]
Notice that we have dropped the \(\left(t\right)\) notation, in the interest of parsimony, but \(S,\,I\) and \(R\) are each still functions of \(t\).
Whereas in the difference equations in Section 3.3.3 we could directly compute the populations of each compartment (given the parameter values and initial population values), the differential equations need to be solved. This can sometimes be done analytically, but we will use the R package deSolve
(Soetaert, Petzoldt, and Setzer 2010), which we will explore in the workshop.
Example 4.1 If we set \(\beta=1.25,\;\gamma=0.5\), we can recreate the model set up in Example 3.1 to find the differential equation solution shown by the solid lines.
Figure 4.2: Our SIR model with the differential equation version also shown.
deSolve
To set up an SIR model in R, there are five steps:
There are several ways to solve systems of ordinary differential equations in R, but we will mainly use the package deSolve
.
In general ODE terms, we need to supply a function that computes the values of the derivatives in the ODE system (in our case the system of SIR model equations) at time \(t\) . We set up the SIR model in R in the following way:
# Lines like this one, with a hash at the start, are comments
# R will not parse them, they are there to help the human readers
library(deSolve)
= function (t, y, parms)
sir_model
{# label state variables from within the y vector of initial values
= y[1] # susceptibles
S = y[2] # infectious
I = y[3] # recovered
R = S + I + R
N
# Pull parameter values from the parms vector
= parms["beta"]
beta = parms["gamma"]
gamma
# Define equations (dS etc. are simplified notation for dS/dt)
= -(beta*S*I)/N
dS = (beta*S*I)/N - gamma*I
dI = gamma * I
dR
# return list of gradients
= c(dS, dI, dR)
res list(res)
}
This is step 1. To be used as the func
argument in the function ode
(which we will see shortly), the R function defining the mathematical model (in this case sir_model
) must have the following arguments:
t
- the current time pointy
- the current estimate of the variables in the ODE systemparms
- a vector or list of parametersThe function first tells R that the initial values of the three compartments S
, I
and R
are in the vector y
. When we run the model, we will supply these initial values (ie. the values at time \(t=0\)).
We then name our parameters, beta
\(\left(\beta\right)\) and gamma
\(\left(\gamma\right)\), to be used in the model. These are the elements of the vector parms
, which we will supply when we run the model.
Finally, we define the system equations, to match the set of equations we first defined in the lecture, and return the values at this time point. Note that here, dS
, dI
and dR
are being used as short-hand for \(\frac{dS}{dt}\) etc, .
We must also specify the time points for which we would like the model to return output, for example:
= seq (0, 50, by=0.2)
timepoints timepoints
## [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8
## [16] 3.0 3.2 3.4 3.6 3.8 4.0 4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8
## [31] 6.0 6.2 6.4 6.6 6.8 7.0 7.2 7.4 7.6 7.8 8.0 8.2 8.4 8.6 8.8
## [46] 9.0 9.2 9.4 9.6 9.8 10.0 10.2 10.4 10.6 10.8 11.0 11.2 11.4 11.6 11.8
## [61] 12.0 12.2 12.4 12.6 12.8 13.0 13.2 13.4 13.6 13.8 14.0 14.2 14.4 14.6 14.8
## [76] 15.0 15.2 15.4 15.6 15.8 16.0 16.2 16.4 16.6 16.8 17.0 17.2 17.4 17.6 17.8
## [91] 18.0 18.2 18.4 18.6 18.8 19.0 19.2 19.4 19.6 19.8 20.0 20.2 20.4 20.6 20.8
## [106] 21.0 21.2 21.4 21.6 21.8 22.0 22.2 22.4 22.6 22.8 23.0 23.2 23.4 23.6 23.8
## [121] 24.0 24.2 24.4 24.6 24.8 25.0 25.2 25.4 25.6 25.8 26.0 26.2 26.4 26.6 26.8
## [136] 27.0 27.2 27.4 27.6 27.8 28.0 28.2 28.4 28.6 28.8 29.0 29.2 29.4 29.6 29.8
## [151] 30.0 30.2 30.4 30.6 30.8 31.0 31.2 31.4 31.6 31.8 32.0 32.2 32.4 32.6 32.8
## [166] 33.0 33.2 33.4 33.6 33.8 34.0 34.2 34.4 34.6 34.8 35.0 35.2 35.4 35.6 35.8
## [181] 36.0 36.2 36.4 36.6 36.8 37.0 37.2 37.4 37.6 37.8 38.0 38.2 38.4 38.6 38.8
## [196] 39.0 39.2 39.4 39.6 39.8 40.0 40.2 40.4 40.6 40.8 41.0 41.2 41.4 41.6 41.8
## [211] 42.0 42.2 42.4 42.6 42.8 43.0 43.2 43.4 43.6 43.8 44.0 44.2 44.4 44.6 44.8
## [226] 45.0 45.2 45.4 45.6 45.8 46.0 46.2 46.4 46.6 46.8 47.0 47.2 47.4 47.6 47.8
## [241] 48.0 48.2 48.4 48.6 48.8 49.0 49.2 49.4 49.6 49.8 50.0
This way of modelling does not use a specific unit of time, it simply solves the differential equations it has been given and outputs the solution at every point in timepoints
. So, whatever time steps we think in terms of while defining our parameters (often days, but not necessarily), that is the time step of the model, even though the output may be given more frequently. Since we have specified our rate parameters in terms of days, the line above means that the model will output the progresssion of the disease over 50 days, in increments of 0.2 days. The first of the output times must be the initial time.
Note that the values in timepoints
are not the only times at which the model is solved. They are not used for computation, but purely for us to specify the time points for which we would like output.
The function sir_model
works with a general parameters argument called parms
, but in order to run a model, we must supply values, for example to recreate our example from the lecture we would put:
= c(
params_vec beta = 1.25,
gamma = 1/2
) params_vec
## beta gamma
## 1.25 0.50
Because we supply a named vector (with the names matching the parameters listed in sir_model
) the function knows how to use each value.
In order to run the SIR model, we need to supply starting values for each compartment, for example:
= c (S = 999, I = 1, R = 0) initial_values
This is to be used as the y
argument to the function sir_model
. Again, the elements of the vector are named with the same names as those used in sir_model
, so that their intended use is clear.
Finally, we supply all this information to the function ode
, which then outputs the populations at each element of timepoints
:
= ode (y=initial_values, times = timepoints, func = sir_model, parms= params_vec)
output1 = as.data.frame(output1)
output1 head(output1)
## time S I R
## 1 0.0 999.0000 1.000000 0.0000000
## 2 0.2 998.7306 1.161506 0.1078745
## 3 0.4 998.4178 1.348999 0.2331675
## 4 0.6 998.0547 1.566623 0.3786783
## 5 0.8 997.6332 1.819178 0.5476556
## 6 1.0 997.1439 2.112208 0.7438627
The object output1
is then a matrix (which we have now turned into a data.frame) with four columns: time, S, I and R, showing the populations in each compartment at each time step.
Using the parameters we have specified we can also calculate \(R_0\), and then \(R_n\) at each time point:
# calculate R0:
= params_vec[1]/params_vec[2]
R0 # calculate total number of people from initial values
= sum(initial_values)
pop_total $Rn = R0*(1/pop_total)*output1$S
output1
head(output1)
## time S I R Rn
## 1 0.0 999.0000 1.000000 0.0000000 2.497500
## 2 0.2 998.7306 1.161506 0.1078745 2.496827
## 3 0.4 998.4178 1.348999 0.2331675 2.496045
## 4 0.6 998.0547 1.566623 0.3786783 2.495137
## 5 0.8 997.6332 1.819178 0.5476556 2.494083
## 6 1.0 997.1439 2.112208 0.7438627 2.492860
Although this ‘wide’ format of table is easier to look at, it can often be simpler to work with the data in ‘long’ format (with more rows and fewer columns), so we write a plotting function which pivots the data frame to be ‘long’ before plotting the data using the package ggplot2
(Wickham 2016):
= function(
plot_sir
sir_df, Compartments = c("S", "I", "R")
){# Collect the values in columns S, I and R (or those given as "Compartments")
# into one column named 'value', and create an additional column named
# "Compartment".
= pivot_longer(sir_df, cols = all_of(Compartments), names_to = "Compartment")
output_tidy # Plot the values, with a different coloured line for each compartment
ggplot(data=output_tidy, aes(x=time, y=value, col=Compartment)) +
geom_line() + ylab("People")
}
This has rearranged the table so that rather than have a column for each compartment, there is a new ‘Compartment’ column, and the populations are in the ‘value’ column. There is now a row showing the population of each compartment at each time step. One advantage is that this is much simpler to plot using the package ggplot2
(Wickham 2016):
plot_sir(output1, Compartments= c("Rn")) + ylab(expression(R[n]))
plot_sir(output1, Compartments= c("S", "I", "R"))
You will have plenty of opportunities to experiment with compartmental models in R in the coming workshops!
You may have been thinking that this model wouldn’t accurately represent many real-life infectious diseases, and you would be correct. Some of the assumptions / simplifications that we will go on to investigate this week are:
We’ve covered a lot of ground in these first two lectures. The main points to take away are:
There are many, many articles about SIR models (especially since the advent of Covid-19!) so here are some that may be of interest: