In this workshop, we will revisit the topics covered in Lectures 3 and 4:
Much of the code used in this workshop will be quite complex. I have tried to make the exercises simpler by combining new things with things you’ve done before, or building up the complexity such that only a few things need to be changed. This is very often how ‘real’ coding is done anyway! If there are any parts you’d like to understand better, please ask.
You can find the code for this workshop in the file MMHDS_workshop2.R
on Ultra or the module webpage.
To look into estimating \(R_0\) from early epidemic data, we will study data from a SARS outbreak in Canada in 2003. You can load this by
library(outbreaks)
data("sars_canada_2003")
## to display the first few lines
head(sars_canada_2003)
## date cases_travel cases_household cases_healthcare cases_other
## 1 2003-02-23 1 0 0 0
## 2 2003-02-24 0 0 0 0
## 3 2003-02-25 0 0 0 0
## 4 2003-02-26 0 1 0 0
## 5 2003-02-27 0 0 0 0
## 6 2003-02-28 1 0 0 0
This dataset has five columns: one for the date, and four for different sorts of new cases.
Use rowSums
to create a new ‘Total’ column in <dataframe>
, which sums the columns selected by <columnVector>
:
<dataframe>$Total = rowSums(<dataframe>[ ,<columnVector>])
For example, to create a column HHtot
summing travel and household cases for sars_canada_2003
, we write
$HHtot = rowSums(sars_canada_2003[ ,c(2,3)]) sars_canada_2003
To create a cumulative total from a column, we use cumsum
. Following on from the technique above, to create a column HHtot_cumsum
, containing the cumulative total of cases from travel and household each day, we write
$HHtot_cumsum = cumsum(sars_canada_2003$HHtot) sars_canada_2003
Using the methods from Lecture 3, estimate \(R_0\) for this outbreak of SARS. According to the center for disease control and prevention SARS is not known to have a latent period, and Lipsitch et al. (2003) estimate the serial interval \(T_S\) to be around 7 days.
Some hints:
lm
to find the gradient (you will have done this in ISDS).Now we will further develop the example we began in lecture 3, calibrating the parameters of an SIR model to data from a mumps outbreak in Savoonga. In order to do this, we need the sir_model
function we have been using throughout, as well as the Mumps data for Savoonga.
The data frame is stored in the RData object “mumps_sav.Rdata”, which you can find in the Resources section on Ultra, or linked from the module webpage.
optim_nm
The function optim_nm
in the optimization
library. It requires two main arguments:
fun
- the function we wish to minimisek
- the number of parameters to optimisestart
- starting values for the parameters. The initial simplex is constructed around this start vector.So, in our case:
fun
we will define a function fn_mumps_rmse
that returns the RMSE of the SIR model at any \(\beta,\,\gamma\) pair when compared to the mumps data.k=2
start
we will give starting values for \(\beta\) and \(\gamma\)## 'opt_fun_sav' is our 'fun'. It takes in parameter values,
## runs the SIR model with those values,
## compares the output to the mumps data and returns the RMSE
library(optimization)
## Loading required package: Rcpp
## To run the ode function within opt_fun_sav we require
## initial values for the compartment populations
= c(S=258, I=1, R=0)
init_sav
= function(par){
opt_fun_sav
## The following lines prevent 'optim_nm' from wandering into negative parameter space
if (par[1]<0){
1] = -par[1]
par[
} if(par[2]<0){
2] = -par[2]
par[
}
## Run the SIR model with the current parameter values
## We want the ode output times to match the data
## so we set the time vector to be the same as in
## mumps_sav
= ode(
output_sav_try y=init_sav,
times = mumps_sav$time_days,
func = sir_model,
parms= c(
beta = par[1],
gamma = par[2]
)
)
= as.data.frame(output_sav_try) # coerce to a data.frame object
output_savtry_df ## Compute a cumulative I column from the model output
$Icum = 259 - output_savtry_df$S
output_savtry_df
= nrow(mumps_sav)
n_obs
## Create a data frame with observed and model prediction data for Savoonga
= data.frame(
sav_rmse_df time = mumps_sav$time_days,
obs = mumps_sav$cum_cases,
pred = output_savtry_df$Icum
)# Calculate the RMSE for this prediction
= sqrt((1/n_obs)*(sum((sav_rmse_df$pred - sav_rmse_df$obs)^2)))
rmse_try
return(rmse_try)
}# Run Nelder-Mead optimisation to find the point in parameter space leading to the smallest RMSE
=optim_nm(opt_fun_sav,start=c(0.6,0.6),k=2, trace=T)
optnm_result optnm_result
## $par
##
## 0.2485377 0.1546450
##
## $function_value
## [1] 5.082235
##
## $trace
## iteration function_value x_1 x_2
## [1,] 1 111.627858 0.6000000 0.6000000
## [2,] 2 111.627858 0.6000000 0.6000000
## [3,] 3 111.627858 0.6000000 0.6000000
## [4,] 4 111.627858 0.6000000 0.6000000
## [5,] 5 88.641677 1.5659258 0.8588190
## [6,] 6 88.641677 1.5659258 0.8588190
## [7,] 7 87.476116 1.0269269 0.9385386
## [8,] 8 78.973814 1.9928528 1.1973576
## [9,] 9 76.907072 1.4538539 1.2770772
## [10,] 10 73.249184 2.4197797 1.5358962
## [11,] 11 72.099122 1.8807808 1.6156158
## [12,] 12 69.580908 2.8467067 1.8744348
## [13,] 13 69.569266 2.3077078 1.9541544
## [14,] 14 67.089512 3.2736336 2.2129734
## [15,] 15 67.089512 3.2736336 2.2129734
## [16,] 16 65.335769 3.7005606 2.5515120
## [17,] 17 59.332216 3.1108659 2.3374679
## [18,] 18 59.332216 3.1108659 2.3374679
## [19,] 19 59.332216 3.1108659 2.3374679
## [20,] 20 59.332216 3.1108659 2.3374679
## [21,] 21 58.726597 2.4361205 1.8975040
## [22,] 22 56.075560 1.8599365 1.3613710
## [23,] 23 53.400447 1.1851911 0.9214071
## [24,] 24 49.187801 0.6090071 0.3852741
## [25,] 25 49.187801 0.6090071 0.3852741
## [26,] 26 49.187801 0.6090071 0.3852741
## [27,] 27 30.159984 0.3199661 0.1867793
## [28,] 28 30.159984 0.3199661 0.1867793
## [29,] 29 30.159984 0.3199661 0.1867793
## [30,] 30 30.159984 0.3199661 0.1867793
## [31,] 31 9.346497 0.2174392 0.1280744
## [32,] 32 9.346497 0.2174392 0.1280744
## [33,] 33 9.346497 0.2174392 0.1280744
## [34,] 34 9.346497 0.2174392 0.1280744
## [35,] 35 8.634029 0.2765861 0.1799851
## [36,] 36 8.235032 0.2811213 0.1803908
## [37,] 37 5.091256 0.2481465 0.1541312
## [38,] 38 5.091256 0.2481465 0.1541312
## [39,] 39 5.091256 0.2481465 0.1541312
## [40,] 40 5.091256 0.2481465 0.1541312
## [41,] 41 5.091256 0.2481465 0.1541312
## [42,] 42 5.091256 0.2481465 0.1541312
## [43,] 43 5.091256 0.2481465 0.1541312
## [44,] 44 5.091256 0.2481465 0.1541312
## [45,] 45 5.091256 0.2481465 0.1541312
## [46,] 46 5.090967 0.2477647 0.1542165
## [47,] 47 5.083664 0.2489521 0.1550327
## [48,] 48 5.082970 0.2482524 0.1543779
## [49,] 49 5.082970 0.2482524 0.1543779
## [50,] 50 5.082752 0.2483928 0.1545831
## [51,] 51 5.082445 0.2486374 0.1547566
## [52,] 52 5.082321 0.2483838 0.1545239
## [53,] 53 5.082321 0.2483838 0.1545239
## [54,] 54 5.082251 0.2485717 0.1546836
## [55,] 55 5.082251 0.2485717 0.1546836
## [56,] 56 5.082243 0.2484730 0.1545955
## [57,] 57 5.082235 0.2485377 0.1546450
##
## $fun
## function(par){
##
##
## ## The following lines prevent 'optim_nm' from wandering into negative parameter space
## if (par[1]<0){
## par[1] = -par[1]
## }
## if(par[2]<0){
## par[2] = -par[2]
## }
##
## ## Run the SIR model with the current parameter values
## ## We want the ode output times to match the data
## ## so we set the time vector to be the same as in
## ## mumps_sav
## output_sav_try = ode(
## y=init_sav,
## times = mumps_sav$time_days,
## func = sir_model,
## parms= c(
## beta = par[1],
## gamma = par[2]
## )
## )
##
## output_savtry_df = as.data.frame(output_sav_try) # coerce to a data.frame object
## ## Compute a cumulative I column from the model output
## output_savtry_df$Icum = 259 - output_savtry_df$S
##
## n_obs = nrow(mumps_sav)
##
## ## Create a data frame with observed and model prediction data for Savoonga
## sav_rmse_df = data.frame(
## time = mumps_sav$time_days,
## obs = mumps_sav$cum_cases,
## pred = output_savtry_df$Icum
## )
## # Calculate the RMSE for this prediction
## rmse_try = sqrt((1/n_obs)*(sum((sav_rmse_df$pred - sav_rmse_df$obs)^2)))
##
## return(rmse_try)
##
## }
## <bytecode: 0x0000027a1662eb00>
##
## $start
## [1] 0.6 0.6
##
## $upper
## [1] NA
##
## $lower
## [1] NA
##
## $control
## $control$k
## [1] 2
##
## $control$iterations
## [1] 57
##
##
## attr(,"class")
## [1] "optim_nmsa"
In the above, par
gives the parameter values found to minimise fun
, function_value
gives the minimised value of fun
(ie. the RMSE of the best model fit, in this case). control
tells us how many times fun
was evaluated (a lot fewer than our 1000 points to get a better answer!).
Figure 2.1: The SIR model fit found by Nelder-Mead, shown with the actual data.
Now that we have this optimised value:
In reality, optim_nm
may not find the global minimum RMSE, but instead get stuck somewhere where it can’t easily improve. It is therefore important to try several starting points, and also to check that the point found is biologically plausible. There is also uncertainty associated with the optimal point found, mainly because
Because we no longer have to ‘fill’ the parameter space (as we do with the Monte Carlo approach), it is quite simple to apply this method to a model with more parameters.
It turns out that Mumps has a pre-infectious period of around 10-20 days. This means that an SEIR model will be more suitable.
opt_fun_sav
to use your SEIR code from the previous workshoppar_init
to include the the third parameterThe R library outbreaks
contains data for a number of infectious disease outbreaks. One of these is influenza_england_1978_school
, which records an outbreak of Flu in an English boarding school. The in_bed
column can be used as a proxy for the number infectious.
Some hints:
flu78 = influenza_england_1978_school
(technially this doesn’t rename the original object but instead creates a copy with a different name, but with small datasets this doesn’t really matter).flu78$time = as.numeric(flu78$date - min(flu78$date)) + 1
As we saw in Lecture 4, for many infectious diseases, a simple SIR or SEIR model is not sufficient. This may be because the dynamics of the disease itself are more complicated, or because the output would not contain the information needed by public health decision makers, or both.
In our lecture we studied RSV as an example, where reinfection is possible, as well as waning immunity. The code for this example is below, with the seasonality function included.
# Define Q function for seasonality
# In this function t is in years
= function(t, a, b, phi, tau, mu){
Qfun = b*(a*cos(2*pi*(t-phi))+1)
top = tau + mu
bottom return(top/bottom)
}
= function (t, y, params)
sir_rsv_seas
{# label state variables from within the y vector
= y[1] # susceptibles
S = y[2] # Infectious (from S)
IS = y[3] # infectious (from R)
IR = y[4] # recovered (but still infectious, remember)
R # Create variable N for total number
= S + IR + IS + R
N # Pull parameter values from the params vector
= params["a"]
a = params["phi"]
phi = params["mu"]
mu = params["alpha"]
alpha = params["tau"]
tau = params["rho"]
rho =params["sigma"]
sigma= params["b"]
b = params["eta"]
eta
# Calculate Lambda function, for some value of t and parameter
# values as specified
# Divide t by 365 to go from years to days
= Qfun(t/365, a, b, phi, tau, mu)*(IS + eta*IR)*(tau+mu)
lambda
# Rates of change for each compartment
= mu*N - lambda*S/N + (alpha*tau/rho)*(IS+IR+R) - mu*S
dS = lambda*S/N - (tau+mu)*IS
dIS = sigma*lambda*R/N - (tau/rho + mu)*IR
dIR = (1-alpha/rho)*tau*IS + (1-alpha)*(tau/rho)*IR -
dR *(sigma*lambda/N + alpha*tau/rho + mu)
R
# return list of gradients
= c(dS, dIS, dIR, dR)
res return(list(res))
}# Set parameter values
= c(
params_rsv_seas a= 0.5,
phi = 0,
mu=1/(365*70),
alpha = 0.16,
tau = 1/6,
rho = 0.8,
sigma = 0.3,
b = 0.5,
eta = 0.8
)
# Initial values for differential equations
# We will run the model for 2.5 years, to show the seasonal pattern
= c (S = 0.999, IS=0.001, IR = 0, R = 0)
initial_vals_rsv = seq (0, 365*2.5, by=0.1)
timepoints_rsv_seas
# Run model and format output
= ode (y=initial_vals_rsv, times = timepoints_rsv_seas,
output_rsv_seas func = sir_rsv_seas, parms= params_rsv_seas)
= as.data.frame(output_rsv_seas) # coerce to a data.frame object
out_rsv_seas_df = pivot_longer(out_rsv_seas_df, cols = c("S", "IS", "IR", "R"),
out_rsv_seas_tidy names_to = "Compartment")
# Create date variable to show progression through year
$Date = as.Date("31-12-2019", format = "%d-%m-%Y") + out_rsv_seas_tidy$time
out_rsv_seas_tidy
# Plot compartment populations
ggplot(data=out_rsv_seas_tidy, aes(x=Date, y=value, col=Compartment)) +
geom_path() + xlab("Time (days)") + ylab("Proportion of population") +
scale_x_date(date_labels = "%B")
Figure 3.1: Compartment populations in the RSV model, with seasonality. For this plot, we set a=0.5, phi=0.
Legrand et al. (2007) present a compartmental model for Ebola, containing several compartments and parameters we haven’t yet seen. You can find the article by copying the title into Google Scholar.