In this workshop, we will revisit the topics covered in Lectures 3 and 4:

  • Estimating the growth rate \(\Lambda\) and basic reproduction number \(R_0\) from early data
  • Calibrating the parameters of a compartmental model
  • Extending the SIR-type model to cover more sophisticated behaviour

Much of the code used in this workshop will be quite complex. I have tried to make the exercises simpler by combining new things with things you’ve done before, or building up the complexity such that only a few things need to be changed. This is very often how ‘real’ coding is done anyway! If there are any parts you’d like to understand better, please ask.

You can find the code for this workshop in the file MMHDS_workshop2.R on Ultra or the module webpage.

1 Estimating \(\Lambda\) and \(R_0\)

To look into estimating \(R_0\) from early epidemic data, we will study data from a SARS outbreak in Canada in 2003. You can load this by

library(outbreaks)
data("sars_canada_2003")

## to display the first few lines
head(sars_canada_2003)
##         date cases_travel cases_household cases_healthcare cases_other
## 1 2003-02-23            1               0                0           0
## 2 2003-02-24            0               0                0           0
## 3 2003-02-25            0               0                0           0
## 4 2003-02-26            0               1                0           0
## 5 2003-02-27            0               0                0           0
## 6 2003-02-28            1               0                0           0

This dataset has five columns: one for the date, and four for different sorts of new cases.

Use rowSums to create a new ‘Total’ column in <dataframe>, which sums the columns selected by <columnVector>:

<dataframe>$Total = rowSums(<dataframe>[ ,<columnVector>])

For example, to create a column HHtot summing travel and household cases for sars_canada_2003, we write

sars_canada_2003$HHtot = rowSums(sars_canada_2003[ ,c(2,3)])


To create a cumulative total from a column, we use cumsum. Following on from the technique above, to create a column HHtot_cumsum, containing the cumulative total of cases from travel and household each day, we write

sars_canada_2003$HHtot_cumsum = cumsum(sars_canada_2003$HHtot)


Using the methods from Lecture 3, estimate \(R_0\) for this outbreak of SARS. According to the center for disease control and prevention SARS is not known to have a latent period, and Lipsitch et al. (2003) estimate the serial interval \(T_S\) to be around 7 days.

Some hints:

  • Remember this technique uses only the early data - how does your choice of cut-off point affect your answer?
  • Use lm to find the gradient (you will have done this in ISDS).

2 Calibrating a compartmental model

2.1 Continuing with our Mumps example

Now we will further develop the example we began in lecture 3, calibrating the parameters of an SIR model to data from a mumps outbreak in Savoonga. In order to do this, we need the sir_model function we have been using throughout, as well as the Mumps data for Savoonga.

The data frame is stored in the RData object “mumps_sav.Rdata”, which you can find in the Resources section on Ultra, or linked from the module webpage.

2.2 Setting up optim_nm

The function optim_nm in the optimization library. It requires two main arguments:

  • fun - the function we wish to minimise
  • k - the number of parameters to optimise
  • start - starting values for the parameters. The initial simplex is constructed around this start vector.

So, in our case:

  • For fun we will define a function fn_mumps_rmse that returns the RMSE of the SIR model at any \(\beta,\,\gamma\) pair when compared to the mumps data.
  • For our SIR model we will set k=2
  • For start we will give starting values for \(\beta\) and \(\gamma\)
## 'opt_fun_sav' is our 'fun'. It takes in parameter values, 
## runs the SIR model with those values,
## compares the output to the mumps data and returns the RMSE
library(optimization)
## Loading required package: Rcpp
## To run the ode function within opt_fun_sav we require
## initial values for the compartment populations 
init_sav = c(S=258, I=1, R=0)

opt_fun_sav = function(par){

  
## The following lines prevent 'optim_nm' from wandering into negative parameter space
  if (par[1]<0){
    par[1] = -par[1]
  } 
  if(par[2]<0){
    par[2] = -par[2]
  }

  ## Run the SIR model with the current parameter values 
  ## We want the ode output times to match the data
  ## so we set the time vector to be the same as in
  ## mumps_sav
  output_sav_try = ode(
    y=init_sav, 
    times = mumps_sav$time_days, 
    func = sir_model, 
    parms= c(
      beta = par[1], 
      gamma = par[2]
    )
  )

  output_savtry_df = as.data.frame(output_sav_try) # coerce to a data.frame object
  ## Compute a cumulative I column from the model output
  output_savtry_df$Icum = 259 - output_savtry_df$S
  
  n_obs = nrow(mumps_sav)
  
  ## Create a data frame with observed and model prediction data for Savoonga
  sav_rmse_df = data.frame(
    time = mumps_sav$time_days,
    obs = mumps_sav$cum_cases,
    pred = output_savtry_df$Icum
  )
  # Calculate the RMSE for this prediction
  rmse_try = sqrt((1/n_obs)*(sum((sav_rmse_df$pred - sav_rmse_df$obs)^2)))  

  return(rmse_try)
  
}
# Run Nelder-Mead optimisation to find the point in parameter space leading to the smallest RMSE
optnm_result=optim_nm(opt_fun_sav,start=c(0.6,0.6),k=2, trace=T)
optnm_result
## $par
##                     
## 0.2485377 0.1546450 
## 
## $function_value
## [1] 5.082235
## 
## $trace
##       iteration function_value       x_1       x_2
##  [1,]         1     111.627858 0.6000000 0.6000000
##  [2,]         2     111.627858 0.6000000 0.6000000
##  [3,]         3     111.627858 0.6000000 0.6000000
##  [4,]         4     111.627858 0.6000000 0.6000000
##  [5,]         5      88.641677 1.5659258 0.8588190
##  [6,]         6      88.641677 1.5659258 0.8588190
##  [7,]         7      87.476116 1.0269269 0.9385386
##  [8,]         8      78.973814 1.9928528 1.1973576
##  [9,]         9      76.907072 1.4538539 1.2770772
## [10,]        10      73.249184 2.4197797 1.5358962
## [11,]        11      72.099122 1.8807808 1.6156158
## [12,]        12      69.580908 2.8467067 1.8744348
## [13,]        13      69.569266 2.3077078 1.9541544
## [14,]        14      67.089512 3.2736336 2.2129734
## [15,]        15      67.089512 3.2736336 2.2129734
## [16,]        16      65.335769 3.7005606 2.5515120
## [17,]        17      59.332216 3.1108659 2.3374679
## [18,]        18      59.332216 3.1108659 2.3374679
## [19,]        19      59.332216 3.1108659 2.3374679
## [20,]        20      59.332216 3.1108659 2.3374679
## [21,]        21      58.726597 2.4361205 1.8975040
## [22,]        22      56.075560 1.8599365 1.3613710
## [23,]        23      53.400447 1.1851911 0.9214071
## [24,]        24      49.187801 0.6090071 0.3852741
## [25,]        25      49.187801 0.6090071 0.3852741
## [26,]        26      49.187801 0.6090071 0.3852741
## [27,]        27      30.159984 0.3199661 0.1867793
## [28,]        28      30.159984 0.3199661 0.1867793
## [29,]        29      30.159984 0.3199661 0.1867793
## [30,]        30      30.159984 0.3199661 0.1867793
## [31,]        31       9.346497 0.2174392 0.1280744
## [32,]        32       9.346497 0.2174392 0.1280744
## [33,]        33       9.346497 0.2174392 0.1280744
## [34,]        34       9.346497 0.2174392 0.1280744
## [35,]        35       8.634029 0.2765861 0.1799851
## [36,]        36       8.235032 0.2811213 0.1803908
## [37,]        37       5.091256 0.2481465 0.1541312
## [38,]        38       5.091256 0.2481465 0.1541312
## [39,]        39       5.091256 0.2481465 0.1541312
## [40,]        40       5.091256 0.2481465 0.1541312
## [41,]        41       5.091256 0.2481465 0.1541312
## [42,]        42       5.091256 0.2481465 0.1541312
## [43,]        43       5.091256 0.2481465 0.1541312
## [44,]        44       5.091256 0.2481465 0.1541312
## [45,]        45       5.091256 0.2481465 0.1541312
## [46,]        46       5.090967 0.2477647 0.1542165
## [47,]        47       5.083664 0.2489521 0.1550327
## [48,]        48       5.082970 0.2482524 0.1543779
## [49,]        49       5.082970 0.2482524 0.1543779
## [50,]        50       5.082752 0.2483928 0.1545831
## [51,]        51       5.082445 0.2486374 0.1547566
## [52,]        52       5.082321 0.2483838 0.1545239
## [53,]        53       5.082321 0.2483838 0.1545239
## [54,]        54       5.082251 0.2485717 0.1546836
## [55,]        55       5.082251 0.2485717 0.1546836
## [56,]        56       5.082243 0.2484730 0.1545955
## [57,]        57       5.082235 0.2485377 0.1546450
## 
## $fun
## function(par){
## 
##   
## ## The following lines prevent 'optim_nm' from wandering into negative parameter space
##   if (par[1]<0){
##     par[1] = -par[1]
##   } 
##   if(par[2]<0){
##     par[2] = -par[2]
##   }
## 
##   ## Run the SIR model with the current parameter values 
##   ## We want the ode output times to match the data
##   ## so we set the time vector to be the same as in
##   ## mumps_sav
##   output_sav_try = ode(
##     y=init_sav, 
##     times = mumps_sav$time_days, 
##     func = sir_model, 
##     parms= c(
##       beta = par[1], 
##       gamma = par[2]
##     )
##   )
## 
##   output_savtry_df = as.data.frame(output_sav_try) # coerce to a data.frame object
##   ## Compute a cumulative I column from the model output
##   output_savtry_df$Icum = 259 - output_savtry_df$S
##   
##   n_obs = nrow(mumps_sav)
##   
##   ## Create a data frame with observed and model prediction data for Savoonga
##   sav_rmse_df = data.frame(
##     time = mumps_sav$time_days,
##     obs = mumps_sav$cum_cases,
##     pred = output_savtry_df$Icum
##   )
##   # Calculate the RMSE for this prediction
##   rmse_try = sqrt((1/n_obs)*(sum((sav_rmse_df$pred - sav_rmse_df$obs)^2)))  
## 
##   return(rmse_try)
##   
## }
## <bytecode: 0x0000027a1662eb00>
## 
## $start
## [1] 0.6 0.6
## 
## $upper
## [1] NA
## 
## $lower
## [1] NA
## 
## $control
## $control$k
## [1] 2
## 
## $control$iterations
## [1] 57
## 
## 
## attr(,"class")
## [1] "optim_nmsa"

In the above, par gives the parameter values found to minimise fun, function_value gives the minimised value of fun (ie. the RMSE of the best model fit, in this case). control tells us how many times fun was evaluated (a lot fewer than our 1000 points to get a better answer!).

2.3 Plotting the model fit

The SIR model fit found by Nelder-Mead, shown with the actual data.

Figure 2.1: The SIR model fit found by Nelder-Mead, shown with the actual data.

Now that we have this optimised value:

  • Using the code from the previous section, plot the optimised SIR model against the mumps data.
  • Calculate \(R_0\) for the point you have found.
  • Try different starting values for \(\beta\) and \(\gamma\). Does this change the outcome?

In reality, optim_nm may not find the global minimum RMSE, but instead get stuck somewhere where it can’t easily improve. It is therefore important to try several starting points, and also to check that the point found is biologically plausible. There is also uncertainty associated with the optimal point found, mainly because

  • There is uncertainty in the data
  • The model is not a perfect representation of the system

2.3.1 Developing the model further

Because we no longer have to ‘fill’ the parameter space (as we do with the Monte Carlo approach), it is quite simple to apply this method to a model with more parameters.

It turns out that Mumps has a pre-infectious period of around 10-20 days. This means that an SEIR model will be more suitable.

  • Adapt the optim code to find the optimal parameter values for an SEIR model. This will involve
    • Creating a new version of opt_fun_sav to use your SEIR code from the previous workshop
    • Creating a new version of par_init to include the the third parameter
  • Compare your result to the one from the SIR model. Is the fit better? Is \(R_0\) similar? (For an SEIR model without vital dynamics, \(R_0=\frac{\beta}{\gamma}\)). What are the implications of this change of model?


The R library outbreaks contains data for a number of infectious disease outbreaks. One of these is influenza_england_1978_school, which records an outbreak of Flu in an English boarding school. The in_bed column can be used as a proxy for the number infectious.

  • Estimate \(R_0\) using the data from the first few days.
  • Fit a model to the full dataset
  • Compare the results - do the \(R_0\) values agree? How closely can you fit the data?

Some hints:

  • Once the library is loaded, you can load a dataset by `data()
  • You can find out details (eg. \(N\), column descriptions etc.) about the dataset by doing `?
  • It’s sensible to rename the dataset with a much shorter name, eg. flu78 = influenza_england_1978_school (technially this doesn’t rename the original object but instead creates a copy with a different name, but with small datasets this doesn’t really matter).
  • To use with SIR model output, you’ll need to create a column with time in days numbered from the starting date. If the dates are in Date format, you can do this by eg. flu78$time = as.numeric(flu78$date - min(flu78$date)) + 1

3 Developing more detailed models

As we saw in Lecture 4, for many infectious diseases, a simple SIR or SEIR model is not sufficient. This may be because the dynamics of the disease itself are more complicated, or because the output would not contain the information needed by public health decision makers, or both.

In our lecture we studied RSV as an example, where reinfection is possible, as well as waning immunity. The code for this example is below, with the seasonality function included.

# Define Q function for seasonality
# In this function t is in years
Qfun = function(t, a, b, phi, tau, mu){
  top = b*(a*cos(2*pi*(t-phi))+1)
  bottom = tau + mu
  return(top/bottom)
}


sir_rsv_seas = function (t, y, params)
{
  # label state variables from within the y vector
  S = y[1]        # susceptibles
  IS = y[2]        # Infectious (from S)
  IR = y[3]        # infectious (from R)
  R = y[4]        # recovered (but still infectious, remember)
  # Create variable N for total number
  N = S + IR + IS + R
  # Pull parameter values from the params vector
  a = params["a"]
  phi = params["phi"]
  mu = params["mu"]
  alpha = params["alpha"]
  tau = params["tau"]
  rho = params["rho"]
  sigma=params["sigma"]
  b = params["b"]
  eta = params["eta"]
  
  # Calculate Lambda function, for some value of t and parameter
  # values as specified
  # Divide t by 365 to go from years to days
    lambda = Qfun(t/365, a, b, phi, tau, mu)*(IS + eta*IR)*(tau+mu)
    
    # Rates of change for each compartment

    dS = mu*N - lambda*S/N + (alpha*tau/rho)*(IS+IR+R) - mu*S
    dIS = lambda*S/N - (tau+mu)*IS
    dIR = sigma*lambda*R/N - (tau/rho + mu)*IR
    dR = (1-alpha/rho)*tau*IS + (1-alpha)*(tau/rho)*IR - 
      R*(sigma*lambda/N + alpha*tau/rho + mu)
    
    
    # return list of gradients   
    res = c(dS, dIS, dIR, dR)
    return(list(res))
  
  }
# Set parameter values
params_rsv_seas = c(
  a= 0.5,
  phi = 0,
  mu=1/(365*70),
  alpha = 0.16,
  tau = 1/6,
  rho = 0.8,
  sigma = 0.3,
  b = 0.5,
  eta = 0.8
)

# Initial values for differential equations
# We will run the model for 2.5 years, to show the seasonal pattern
initial_vals_rsv = c (S = 0.999, IS=0.001, IR = 0, R = 0)
timepoints_rsv_seas = seq (0, 365*2.5, by=0.1)

# Run model and format output
output_rsv_seas = ode (y=initial_vals_rsv, times = timepoints_rsv_seas, 
                  func = sir_rsv_seas, parms= params_rsv_seas)

out_rsv_seas_df = as.data.frame(output_rsv_seas) # coerce to a data.frame object
out_rsv_seas_tidy = pivot_longer(out_rsv_seas_df, cols = c("S", "IS", "IR", "R"), 
                            names_to = "Compartment")

# Create date variable to show progression through year
out_rsv_seas_tidy$Date = as.Date("31-12-2019", format = "%d-%m-%Y") + out_rsv_seas_tidy$time

# Plot compartment populations
ggplot(data=out_rsv_seas_tidy, aes(x=Date, y=value, col=Compartment)) + 
  geom_path() + xlab("Time (days)") + ylab("Proportion of population") +
   scale_x_date(date_labels = "%B")
Compartment populations in the RSV model, with seasonality. For this plot, we set a=0.5, phi=0.

Figure 3.1: Compartment populations in the RSV model, with seasonality. For this plot, we set a=0.5, phi=0.

Legrand et al. (2007) present a compartmental model for Ebola, containing several compartments and parameters we haven’t yet seen. You can find the article by copying the title into Google Scholar.

  • Read through the paper. The details of the model are given in Table 2, so read at least up to here. Make sure you understand why each compartment has been included, and how the flow from one to another is modelled.
  • If you’re feeling brave, have a go at building this model in R, and see how changes to the parameters affect the progression of the epidemic.

References

Legrand, Judith, Rebecca Freeman Grais, Pierre-Yves Boelle, Alain-Jacques Valleron, and Antoine Flahault. 2007. “Understanding the Dynamics of Ebola Epidemics.” Epidemiology & Infection 135 (4): 610–21.
Lipsitch, Marc, Ted Cohen, Ben Cooper, James M Robins, Stefan Ma, Lyn James, Gowri Gopalakrishna, et al. 2003. “Transmission Dynamics and Control of Severe Acute Respiratory Syndrome.” Science 300 (5627): 1966–70.