In our lectures so far we have learned about the theory of compartmental models, and specifically SIR models. In this workshop, we will:
You can find the code for this workshop in the file MMHDS_workshop1.R
on Ultra or the module webpage.
Please do ask questions whenever you have any!
Don’t spend more than 30 minutes on this section!
In the first part of Lectures 1 and 2 we built an SIR model using difference equations. Despite the shortcomings we saw with this method, these are often used (for example people sometimes program them in Excel), and so we will briefly explore the model.
# Lines like this one, with a hash at the start, are comments
# R will not parse them, they are there to help the human readers
= function(
sir_diff ## Initial #susceptibles
S1, ## Initial #infectious
I1, ## Initial #removed
R1, ## beta parameter value
beta, ## gamma parameter value
gamma, ## vector of times to run model (first element taken as initial)
times
){=S1+I1+R1
N= length(times)
nt # create a data frame to fill in with updated compartment populations
= data.frame(
out_df Time = times, S = rep(NA, nt), I = rep(NA, nt), R=rep(NA, nt)
)# Put the initial populations in the first row of out_df
$S[1] = S1
out_df$I[1] = I1
out_df$R[1] = R1
out_df
# Loop through the times vector, using the previous row of out_df
# to calculate the next
for (i in 2:nt){
# find \left(t-1\right) populations
= out_df$S[i-1]
S_tminus1 = out_df$I[i-1]
I_tminus1 = out_df$R[i-1]
R_tminus1 # calculate t populations using difference equations
= S_tminus1 - (beta*I_tminus1*S_tminus1)/N
S_t = I_tminus1 + (beta*I_tminus1*S_tminus1)/N - gamma*I_tminus1
I_t = R_tminus1 + gamma*I_tminus1
R_t # Add newly calculated values to out_df
$S[i] = S_t
out_df$I[i] = I_t
out_df$R[i] = R_t
out_df
}
# return the fully populated data frame
return(out_df)
}
# This function rearranges the output from sir_diff to be `long` rather than `wide`
# This makes it easier to plot with ggplot2, as we can colour the lines by
# Compartment, rather than plotting each compartment separately.
= function(sir_df){
plot_sirdiff # convert the output to 'long' format
= pivot_longer(sir_df, cols = c("S", "I", "R"), names_to = "Compartment")
sirdf_tidy # Plot the data, as points (to show the actual points of computation)
# and lines
ggplot(data=sirdf_tidy) +
geom_point(aes(x=Time, y=value, col=Compartment))+
geom_line(aes(x=Time, y=value, col=Compartment), linewidth=0.5)+
ylab("Population")
}
# Run this model for the parameters and initial values given in lectures
= sir_diff(999, 1, 0, 1.25, 1/2, 1:60)
sir_eg_1000 # plot the output above
plot_sirdiff(sir_eg_1000)
Figure 2.1: The SIR model from the first lecture, with difference equations.
Explore what happens with this model when you alter the parameter values.
This section is more or less repeated from the lecture - if you are happy with the R code, feel free to skip to the exercises.
As we saw in Lectures, working with ODEs is a much more stable and flexible approach, and from now on that is how we will solve our compartmental models. There are several ways to do this in R, but we will mainly use the package deSolve
.
To set up an SIR model in R, there are five steps:
deSolve
We set up the SIR model in R in the following way:
library(deSolve)
= function (t, y, parms)
sir_model
{# label state variables from within the y vector of initial values
= y[1] # susceptibles
S = y[2] # infectious
I = y[3] # recovered
R = S + I + R
N
# Pull parameter values from the parms vector
= parms["beta"]
beta = parms["gamma"]
gamma
# Define equations (dS etc. are simplified notation for dS/dt)
= -(beta*S*I)/N
dS = (beta*S*I)/N - gamma*I
dI = gamma * I
dR
# return list of gradients
= c(dS, dI, dR)
res list(res)
}
This is step 1. The function first tells R that the initial values of the three compartments S
, I
and R
are in the vector y
.
We then name our parameters, beta
\(\left(\beta\right)\) and gamma
\(\left(\gamma\right)\), to be used in the model. These are the elements of the vector parms
, which we will supply when we run the model.
Finally, we define the system equations, to match the set of equations we first defined in the lecture. Note that here, dS
, dI
and dR
are being used as short-hand for \(\frac{dS}{dt}\) etc.
To be used as the func
argument in the function ode
(which we will see shortly), the R function defining the mathematical model (in this case sir_model
) must have the following arguments:
t
- the current time pointy
- the current estimate of the variabes in the ODE systemparms
- a vector or list of parametersWe must also specify the time points for which we would like the model to return output, for example:
= seq (0, 50, by=0.2)
timepoints timepoints
## [1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8
## [16] 3.0 3.2 3.4 3.6 3.8 4.0 4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8
## [31] 6.0 6.2 6.4 6.6 6.8 7.0 7.2 7.4 7.6 7.8 8.0 8.2 8.4 8.6 8.8
## [46] 9.0 9.2 9.4 9.6 9.8 10.0 10.2 10.4 10.6 10.8 11.0 11.2 11.4 11.6 11.8
## [61] 12.0 12.2 12.4 12.6 12.8 13.0 13.2 13.4 13.6 13.8 14.0 14.2 14.4 14.6 14.8
## [76] 15.0 15.2 15.4 15.6 15.8 16.0 16.2 16.4 16.6 16.8 17.0 17.2 17.4 17.6 17.8
## [91] 18.0 18.2 18.4 18.6 18.8 19.0 19.2 19.4 19.6 19.8 20.0 20.2 20.4 20.6 20.8
## [106] 21.0 21.2 21.4 21.6 21.8 22.0 22.2 22.4 22.6 22.8 23.0 23.2 23.4 23.6 23.8
## [121] 24.0 24.2 24.4 24.6 24.8 25.0 25.2 25.4 25.6 25.8 26.0 26.2 26.4 26.6 26.8
## [136] 27.0 27.2 27.4 27.6 27.8 28.0 28.2 28.4 28.6 28.8 29.0 29.2 29.4 29.6 29.8
## [151] 30.0 30.2 30.4 30.6 30.8 31.0 31.2 31.4 31.6 31.8 32.0 32.2 32.4 32.6 32.8
## [166] 33.0 33.2 33.4 33.6 33.8 34.0 34.2 34.4 34.6 34.8 35.0 35.2 35.4 35.6 35.8
## [181] 36.0 36.2 36.4 36.6 36.8 37.0 37.2 37.4 37.6 37.8 38.0 38.2 38.4 38.6 38.8
## [196] 39.0 39.2 39.4 39.6 39.8 40.0 40.2 40.4 40.6 40.8 41.0 41.2 41.4 41.6 41.8
## [211] 42.0 42.2 42.4 42.6 42.8 43.0 43.2 43.4 43.6 43.8 44.0 44.2 44.4 44.6 44.8
## [226] 45.0 45.2 45.4 45.6 45.8 46.0 46.2 46.4 46.6 46.8 47.0 47.2 47.4 47.6 47.8
## [241] 48.0 48.2 48.4 48.6 48.8 49.0 49.2 49.4 49.6 49.8 50.0
This way of modelling does not use a specific unit of time, it simply solves the differential equations it has been given and outputs the solution at every point in timepoints
. So, whatever time steps we think in terms of while defining our parameters (often days, but not necessarily), that is the time step of the model, even though the output may be given more frequently. Since we have specified our rate parameters in terms of days, the line above means that the model will output the progresssion of the disease over 50 days, in increments of 0.2 days. The first of the output times must be the initial time.
The function sir_model
works with a general parameters argument called parms
, but in order to run a model, we must supply values, for example to recreate our example from the lecture we would put:
= c(
params_vec beta = 1.25,
gamma = 1/2
) params_vec
## beta gamma
## 1.25 0.50
Because we supply a named vector (with the names matching the parameters listed in sir_model
) the function knows how to use each value.
In order to run the SIR model, we need to supply starting values for each compartment, for example:
= c (S = 999, I = 1, R = 0) initial_values
This is to be used as the y
argument to the function sir_model
. Again, the elements of the vector are named with the same names as those used in sir_model
, so that their intended use is clear.
Finally, we supply all this information to the function ode
, which then outputs the populations at each element of timepoints
:
= ode (y=initial_values, times = timepoints, func = sir_model, parms= params_vec)
output1 = as.data.frame(output1)
output1 head(output1)
## time S I R
## 1 0.0 999.0000 1.000000 0.0000000
## 2 0.2 998.7306 1.161506 0.1078745
## 3 0.4 998.4178 1.348999 0.2331675
## 4 0.6 998.0547 1.566623 0.3786783
## 5 0.8 997.6332 1.819178 0.5476556
## 6 1.0 997.1439 2.112208 0.7438627
The object output1
is then a matrix (which we have now turned into a data.frame) with four columns: time, S, I and R, showing the populations in each compartment at each time step.
Using the parameters we have specified we can also calculate \(R_0\), and then \(R_n\) at each time point:
# calculate R0:
= params_vec[1]/params_vec[2]
R0 # calculate total number of people from initial values
= sum(initial_values)
pop_total $Rn = R0*(1/pop_total)*output1$S
output1
head(output1)
## time S I R Rn
## 1 0.0 999.0000 1.000000 0.0000000 2.497500
## 2 0.2 998.7306 1.161506 0.1078745 2.496827
## 3 0.4 998.4178 1.348999 0.2331675 2.496045
## 4 0.6 998.0547 1.566623 0.3786783 2.495137
## 5 0.8 997.6332 1.819178 0.5476556 2.494083
## 6 1.0 997.1439 2.112208 0.7438627 2.492860
Although this ‘wide’ format of table is easier to look at, it can often be simpler to work with the data in ‘long’ format (with more rows and fewer columns), so we write a plotting function which pivots the data frame to be ‘long’ before plotting the data using the package ggplot2
(Wickham 2016):
= function(
plot_sir
sir_df, Compartments = c("S", "I", "R")
){# Collect the values in columns S, I and R (or those given as "Compartments")
# into one column named 'value', and create an additional column named
# "Compartment".
= pivot_longer(sir_df, cols = all_of(Compartments), names_to = "Compartment")
output_tidy # Plot the values, with a different coloured line for each compartment
ggplot(data=output_tidy, aes(x=time, y=value, col=Compartment)) +
geom_line() + ylab("People")
}
This has rearranged the table so that rather than have a column for each compartment, there is a new ‘Compartment’ column, and the populations are in the ‘value’ column. There is now a row showing the population of each compartment at each time step. One advantage is that this is much simpler to plot using the package ggplot2
(Wickham 2016):
plot_sir(output1, Compartments= c("Rn")) + ylab(expression(R[n]))
plot_sir(output1, Compartments= c("S", "I", "R"))
Execute each line of Section 3 of “MMHDS_workshop1.R” in R. You may have to install some of the libraries used (these are the ones called by library
at the top of the file).
Remember to give your objects (parameter vectors, output data frames etc.) unique names so that you can recreate your results.
Let’s now investigate the impact of adding a pre-infectious (or exposed) \(E\) compartment, to create an SEIR model.
Figure 4.1: The trajectory in an SEIRS model.
This is an important step towards more realistic epidemic models, since it allows the model to have an incubation period.
This change will require us to re-think our set of differential equations slightly, since rather than going straight from \(S\) to \(I\), people now spend some time in \(E\). We will revisit the differential equations one-by-one before building a new model in R.
Before reading ahead, spend some time thinking about the change to the equations brought about by the introduction of the \(E\) compartment.
When people leave the \(S\) compartment, they are now going to the \(E\) compartment. However, they are still being infected by people in the \(I\) compartment - those in \(E\) are infected but not infectious. So the equation for \(S\) remains the same:
\[\frac{dS}{dt} = -\frac{\beta I S}{N}. \]
The people leaving \(S\) above are joining \(E\), so the first term will be \(\frac{\beta I S}{N}\). As people’s latent period comes to an end, they move to the \(I\) compartment. Much as with recovery, we will model this with a rate. We will use \(\sigma\) for this rate parameter, and it will be the proportion of pre-infectious (exposed) people who become infectious at each time step. Again, as with recovery we can think in terms of the expected length of the latent period (which is \(\frac{1}{\sigma}\) the inverse of the rate). Therefore we have:
\[\frac{dE}{dt} = \frac{\beta I S}{N} - \sigma E.\]
People are now joining the \(I\) compartment from \(E\) with rate \(\sigma\), which gives a \(\sigma E\) term. We also still have people leaving \(I\) with recovery rate \(\gamma\), and so we have:
\[\frac{dI}{dt} = \sigma E - \gamma I. \]
The rate of change in \(R\) is unchanged, since there is no direct connection to the \(E\) compartment, and so we still have \[\frac{dR}{dt} = \gamma I.\]
If anything is unclear about the changes to the differential equations made above, please speak to the tutor before completing this exercise.
Adapt the SIR model code (create a new version in a new R file) to include this \(E\) compartment. You will need to:
sir_model
(call it seir_model
) which includes an \(E\) compartment with the equations and parameters above. Call \(\sigma\) sigma
.params_vec
) and initial values (analogous to initial_values
) to run the new model.ode
to incorporate these new functions and vectors.Be sure to save your R code (in a .R file), as you’ll need it later in the course!
There are many, many variations on the SIR model. Some are listed on this Wikipedia page.
Two such variations are:
What do these models have in common? When might one opt to use the more detailed SIRS model? What complications does the more complex model bring? Can you code up the two models in R and compare their behaviour?