1 Overview

In our lectures so far we have learned about the theory of compartmental models, and specifically SIR models. In this workshop, we will:

  • Implement the SIR model we derived in lectures using difference equations …
  • … and the ordinary differential equations (ODEs)
  • Add an exposed (E) compartment to the model
  • Explore some other simple models

You can find the code for this workshop in the file MMHDS_workshop1.R on Ultra or the module webpage.

Please do ask questions whenever you have any!

2 SIR models with difference equations

Don’t spend more than 30 minutes on this section!

In the first part of Lectures 1 and 2 we built an SIR model using difference equations. Despite the shortcomings we saw with this method, these are often used (for example people sometimes program them in Excel), and so we will briefly explore the model.

# Lines like this one, with a hash at the start, are comments
# R will not parse them, they are there to help the human readers

sir_diff = function(
  S1,           ## Initial #susceptibles
  I1,           ## Initial #infectious
  R1,           ## Initial #removed
  beta,         ## beta parameter value
  gamma,        ## gamma parameter value
  times         ## vector of times to run model (first element taken as initial)
){
  N=S1+I1+R1
  nt = length(times)
  # create a data frame to fill in with updated compartment populations
  out_df = data.frame(
    Time = times, S = rep(NA, nt), I = rep(NA, nt), R=rep(NA, nt)
  )
  # Put the initial populations in the first row of out_df
  out_df$S[1] = S1
  out_df$I[1] = I1
  out_df$R[1] = R1

  # Loop through the times vector, using the previous row of out_df
  # to calculate the next
  for (i in 2:nt){
    # find \left(t-1\right) populations
    S_tminus1 = out_df$S[i-1]
    I_tminus1 = out_df$I[i-1]
    R_tminus1 = out_df$R[i-1]
    # calculate t populations using difference equations
    S_t = S_tminus1 - (beta*I_tminus1*S_tminus1)/N
    I_t = I_tminus1 + (beta*I_tminus1*S_tminus1)/N - gamma*I_tminus1
    R_t = R_tminus1 + gamma*I_tminus1
    # Add newly calculated values to out_df
    out_df$S[i] = S_t
    out_df$I[i] = I_t
    out_df$R[i] = R_t 
  }
  
  # return the fully populated data frame
  return(out_df)
}

# This function rearranges the output from sir_diff to be `long` rather than `wide`
# This makes it easier to plot with ggplot2, as we can colour the lines by 
# Compartment, rather than plotting each compartment separately.

plot_sirdiff = function(sir_df){
  # convert the output to 'long' format
  sirdf_tidy = pivot_longer(sir_df, cols = c("S", "I", "R"), names_to = "Compartment")
  # Plot the data, as points (to show the actual points of computation)
  # and lines
  ggplot(data=sirdf_tidy) + 
    geom_point(aes(x=Time, y=value, col=Compartment))+
    geom_line(aes(x=Time, y=value, col=Compartment), linewidth=0.5)+
    ylab("Population")
  
}

# Run this model for the parameters and initial values given in lectures
sir_eg_1000 = sir_diff(999, 1, 0, 1.25, 1/2, 1:60)
# plot the output above
plot_sirdiff(sir_eg_1000)
The SIR model from the first lecture, with difference equations.

Figure 2.1: The SIR model from the first lecture, with difference equations.

Explore what happens with this model when you alter the parameter values.

  • What is the effect of increasing/decreasing \(\beta\) (or \(\gamma\))?
  • Experiment with different time-steps, changing the parameters accordingly. Can you use time-steps of 5 or 10 days? Can you find parameter values for which this output remains realistic?

3 SIR models with ODEs

This section is more or less repeated from the lecture - if you are happy with the R code, feel free to skip to the exercises.

As we saw in Lectures, working with ODEs is a much more stable and flexible approach, and from now on that is how we will solve our compartmental models. There are several ways to do this in R, but we will mainly use the package deSolve.

To set up an SIR model in R, there are five steps:

  1. Define an R function containing the general set of equations, those that govern the flow of people from one compartment to another.
  2. Specify the time points for which we want R to output a solution.
  3. Supply values for the parameters (for example \(\beta\) and \(\gamma\)).
  4. Supply initial values for the populations of each compartment.
  5. Call the function that will put all these together.

3.1 Solving the SIR model using deSolve

3.1.1 Defining the set of equations

We set up the SIR model in R in the following way:

library(deSolve)
sir_model = function (t, y, parms)
{
  # label state variables from within the y vector of initial values
  S = y[1]        # susceptibles
  I = y[2]        # infectious
  R = y[3]        # recovered
  N = S + I + R
  
  # Pull parameter values from the parms vector
  beta = parms["beta"]
  gamma = parms["gamma"]

  # Define equations (dS etc. are simplified notation for dS/dt)
  dS = -(beta*S*I)/N
  dI = (beta*S*I)/N - gamma*I
  dR = gamma * I

  # return list of gradients   
  res = c(dS, dI, dR)
  list(res)
}

This is step 1. The function first tells R that the initial values of the three compartments S, I and R are in the vector y.

We then name our parameters, beta \(\left(\beta\right)\) and gamma \(\left(\gamma\right)\), to be used in the model. These are the elements of the vector parms, which we will supply when we run the model.

Finally, we define the system equations, to match the set of equations we first defined in the lecture. Note that here, dS, dI and dR are being used as short-hand for \(\frac{dS}{dt}\) etc.

To be used as the func argument in the function ode (which we will see shortly), the R function defining the mathematical model (in this case sir_model) must have the following arguments:

  • t - the current time point
  • y - the current estimate of the variabes in the ODE system
  • parms - a vector or list of parameters

3.1.2 Specifying time points

We must also specify the time points for which we would like the model to return output, for example:

timepoints = seq (0, 50, by=0.2)
timepoints
##   [1]  0.0  0.2  0.4  0.6  0.8  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8
##  [16]  3.0  3.2  3.4  3.6  3.8  4.0  4.2  4.4  4.6  4.8  5.0  5.2  5.4  5.6  5.8
##  [31]  6.0  6.2  6.4  6.6  6.8  7.0  7.2  7.4  7.6  7.8  8.0  8.2  8.4  8.6  8.8
##  [46]  9.0  9.2  9.4  9.6  9.8 10.0 10.2 10.4 10.6 10.8 11.0 11.2 11.4 11.6 11.8
##  [61] 12.0 12.2 12.4 12.6 12.8 13.0 13.2 13.4 13.6 13.8 14.0 14.2 14.4 14.6 14.8
##  [76] 15.0 15.2 15.4 15.6 15.8 16.0 16.2 16.4 16.6 16.8 17.0 17.2 17.4 17.6 17.8
##  [91] 18.0 18.2 18.4 18.6 18.8 19.0 19.2 19.4 19.6 19.8 20.0 20.2 20.4 20.6 20.8
## [106] 21.0 21.2 21.4 21.6 21.8 22.0 22.2 22.4 22.6 22.8 23.0 23.2 23.4 23.6 23.8
## [121] 24.0 24.2 24.4 24.6 24.8 25.0 25.2 25.4 25.6 25.8 26.0 26.2 26.4 26.6 26.8
## [136] 27.0 27.2 27.4 27.6 27.8 28.0 28.2 28.4 28.6 28.8 29.0 29.2 29.4 29.6 29.8
## [151] 30.0 30.2 30.4 30.6 30.8 31.0 31.2 31.4 31.6 31.8 32.0 32.2 32.4 32.6 32.8
## [166] 33.0 33.2 33.4 33.6 33.8 34.0 34.2 34.4 34.6 34.8 35.0 35.2 35.4 35.6 35.8
## [181] 36.0 36.2 36.4 36.6 36.8 37.0 37.2 37.4 37.6 37.8 38.0 38.2 38.4 38.6 38.8
## [196] 39.0 39.2 39.4 39.6 39.8 40.0 40.2 40.4 40.6 40.8 41.0 41.2 41.4 41.6 41.8
## [211] 42.0 42.2 42.4 42.6 42.8 43.0 43.2 43.4 43.6 43.8 44.0 44.2 44.4 44.6 44.8
## [226] 45.0 45.2 45.4 45.6 45.8 46.0 46.2 46.4 46.6 46.8 47.0 47.2 47.4 47.6 47.8
## [241] 48.0 48.2 48.4 48.6 48.8 49.0 49.2 49.4 49.6 49.8 50.0

This way of modelling does not use a specific unit of time, it simply solves the differential equations it has been given and outputs the solution at every point in timepoints. So, whatever time steps we think in terms of while defining our parameters (often days, but not necessarily), that is the time step of the model, even though the output may be given more frequently. Since we have specified our rate parameters in terms of days, the line above means that the model will output the progresssion of the disease over 50 days, in increments of 0.2 days. The first of the output times must be the initial time.

3.1.3 Parameter values

The function sir_model works with a general parameters argument called parms, but in order to run a model, we must supply values, for example to recreate our example from the lecture we would put:

params_vec = c(
  beta = 1.25, 
  gamma = 1/2
  )
params_vec
##  beta gamma 
##  1.25  0.50

Because we supply a named vector (with the names matching the parameters listed in sir_model) the function knows how to use each value.

3.1.4 Initial values

In order to run the SIR model, we need to supply starting values for each compartment, for example:

initial_values = c (S = 999, I = 1, R = 0)

This is to be used as the y argument to the function sir_model. Again, the elements of the vector are named with the same names as those used in sir_model, so that their intended use is clear.

3.1.5 Putting it all together

Finally, we supply all this information to the function ode, which then outputs the populations at each element of timepoints:

output1 = ode (y=initial_values, times = timepoints, func = sir_model, parms= params_vec)
output1 = as.data.frame(output1)
head(output1)
##   time        S        I         R
## 1  0.0 999.0000 1.000000 0.0000000
## 2  0.2 998.7306 1.161506 0.1078745
## 3  0.4 998.4178 1.348999 0.2331675
## 4  0.6 998.0547 1.566623 0.3786783
## 5  0.8 997.6332 1.819178 0.5476556
## 6  1.0 997.1439 2.112208 0.7438627

The object output1 is then a matrix (which we have now turned into a data.frame) with four columns: time, S, I and R, showing the populations in each compartment at each time step.

Using the parameters we have specified we can also calculate \(R_0\), and then \(R_n\) at each time point:

# calculate R0:
R0 = params_vec[1]/params_vec[2]
# calculate total number of people from initial values
pop_total = sum(initial_values)
output1$Rn = R0*(1/pop_total)*output1$S

head(output1)
##   time        S        I         R       Rn
## 1  0.0 999.0000 1.000000 0.0000000 2.497500
## 2  0.2 998.7306 1.161506 0.1078745 2.496827
## 3  0.4 998.4178 1.348999 0.2331675 2.496045
## 4  0.6 998.0547 1.566623 0.3786783 2.495137
## 5  0.8 997.6332 1.819178 0.5476556 2.494083
## 6  1.0 997.1439 2.112208 0.7438627 2.492860

Although this ‘wide’ format of table is easier to look at, it can often be simpler to work with the data in ‘long’ format (with more rows and fewer columns), so we write a plotting function which pivots the data frame to be ‘long’ before plotting the data using the package ggplot2 (Wickham 2016):

plot_sir = function(
  sir_df, 
  Compartments = c("S", "I", "R")
){
# Collect the values in columns S, I and R (or those given as "Compartments")
#  into one column named 'value', and create an additional column named 
# "Compartment".
    output_tidy = pivot_longer(sir_df, cols = all_of(Compartments), names_to = "Compartment")
  # Plot the values, with a different coloured line for each compartment
    ggplot(data=output_tidy, aes(x=time, y=value, col=Compartment)) + 
      geom_line() + ylab("People")  
  }

This has rearranged the table so that rather than have a column for each compartment, there is a new ‘Compartment’ column, and the populations are in the ‘value’ column. There is now a row showing the population of each compartment at each time step. One advantage is that this is much simpler to plot using the package ggplot2 (Wickham 2016):

plot_sir(output1, Compartments= c("Rn")) + ylab(expression(R[n]))

plot_sir(output1, Compartments= c("S", "I", "R"))

Execute each line of Section 3 of “MMHDS_workshop1.R” in R. You may have to install some of the libraries used (these are the ones called by library at the top of the file).

  • Change some of the parameter values and see how the plot alters. What sorts of changes make the peak in infectious cases later / earlier, sharper / shallower? How do they affect the final epidemic size?
  • What happens if you choose your parameter values to ensure \(R_0<1\)?
  • In theory there are infinitely many pairs \(\left(\beta,\gamma\right)\) that would give any valid value of \(R_0\). For example, \(\beta=1, \gamma=0.5\) gives \(R_0=2\), as does \(\beta=0.5, \gamma = 0.25\). Does the model produce similar results for similar \(R_0\)? If so, how? How does the output change with the magnitude of the parameters?

Remember to give your objects (parameter vectors, output data frames etc.) unique names so that you can recreate your results.

4 Adding an \(E\) compartment

Let’s now investigate the impact of adding a pre-infectious (or exposed) \(E\) compartment, to create an SEIR model.

The trajectory in an SEIRS model.

Figure 4.1: The trajectory in an SEIRS model.

This is an important step towards more realistic epidemic models, since it allows the model to have an incubation period.

4.1 Adapting the model

This change will require us to re-think our set of differential equations slightly, since rather than going straight from \(S\) to \(I\), people now spend some time in \(E\). We will revisit the differential equations one-by-one before building a new model in R.

Before reading ahead, spend some time thinking about the change to the equations brought about by the introduction of the \(E\) compartment.

  • Which equations will change?
  • Will we need any new parameters? If so, what do you think they will represent?

4.1.1 Rate of change in susceptibles (\(S\))

When people leave the \(S\) compartment, they are now going to the \(E\) compartment. However, they are still being infected by people in the \(I\) compartment - those in \(E\) are infected but not infectious. So the equation for \(S\) remains the same:

\[\frac{dS}{dt} = -\frac{\beta I S}{N}. \]

4.1.2 Rate of change in pre-infectious (\(E\))

The people leaving \(S\) above are joining \(E\), so the first term will be \(\frac{\beta I S}{N}\). As people’s latent period comes to an end, they move to the \(I\) compartment. Much as with recovery, we will model this with a rate. We will use \(\sigma\) for this rate parameter, and it will be the proportion of pre-infectious (exposed) people who become infectious at each time step. Again, as with recovery we can think in terms of the expected length of the latent period (which is \(\frac{1}{\sigma}\) the inverse of the rate). Therefore we have:

\[\frac{dE}{dt} = \frac{\beta I S}{N} - \sigma E.\]

4.1.3 Rate of change in infectious (\(I\))

People are now joining the \(I\) compartment from \(E\) with rate \(\sigma\), which gives a \(\sigma E\) term. We also still have people leaving \(I\) with recovery rate \(\gamma\), and so we have:

\[\frac{dI}{dt} = \sigma E - \gamma I. \]

4.1.4 Rate of change in recovered/removed (\(R\))

The rate of change in \(R\) is unchanged, since there is no direct connection to the \(E\) compartment, and so we still have \[\frac{dR}{dt} = \gamma I.\]

If anything is unclear about the changes to the differential equations made above, please speak to the tutor before completing this exercise.

Adapt the SIR model code (create a new version in a new R file) to include this \(E\) compartment. You will need to:

  1. Write a new function analagous to sir_model (call it seir_model) which includes an \(E\) compartment with the equations and parameters above. Call \(\sigma\) sigma.
  2. Adapt the vectors of parameter values (analogous to params_vec) and initial values (analogous to initial_values) to run the new model.
  3. Modify the inputs to ode to incorporate these new functions and vectors.
  4. Run the SEIR model at for some chosen parameter values. What happens if you keep \(\beta\) and \(\gamma\) the same but vary \(\sigma\)? Is this what you’d expect?

Be sure to save your R code (in a .R file), as you’ll need it later in the course!

5 Further variations

There are many, many variations on the SIR model. Some are listed on this Wikipedia page.

Two such variations are:

What do these models have in common? When might one opt to use the more detailed SIRS model? What complications does the more complex model bring? Can you code up the two models in R and compare their behaviour?

References

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.