In this workshop, we will revisit the topics covered in Lectures 3 and 4:
Much of the code used in this workshop will be quite complex. I have tried to make the exercises simpler by combining new things with things you’ve done before, or building up the complexity such that only a few things need to be changed. This is very often how ‘real’ coding is done anyway! If there are any parts you’d like to understand better, please ask.
You can find the code for this workshop in the file MMHDS_workshop2.R on Ultra or the module webpage.
To look into estimating \(R_0\) from early epidemic data, we will study data from a SARS outbreak in Canada in 2003. You can load this by
library(outbreaks)
data("sars_canada_2003")
## to display the first few lines
head(sars_canada_2003)## date cases_travel cases_household cases_healthcare cases_other
## 1 2003-02-23 1 0 0 0
## 2 2003-02-24 0 0 0 0
## 3 2003-02-25 0 0 0 0
## 4 2003-02-26 0 1 0 0
## 5 2003-02-27 0 0 0 0
## 6 2003-02-28 1 0 0 0
This dataset has five columns: one for the date, and four for different sorts of new cases.
Use rowSums to create a new ‘Total’ column in <dataframe>, which sums the columns selected by <columnVector>:
<dataframe>$Total = rowSums(<dataframe>[ ,<columnVector>])For example, to create a column HHtot summing travel and household cases for sars_canada_2003, we write
sars_canada_2003$HHtot = rowSums(sars_canada_2003[ ,c(2,3)])To create a cumulative total from a column, we use cumsum. Following on from the technique above, to create a column HHtot_cumsum, containing the cumulative total of cases from travel and household each day, we write
sars_canada_2003$HHtot_cumsum = cumsum(sars_canada_2003$HHtot)Using the methods from Lecture 3, estimate \(R_0\) for this outbreak of SARS. According to the center for disease control and prevention SARS is not known to have a latent period, and Lipsitch et al. (2003) estimate the serial interval \(T_S\) to be around 7 days.
Some hints:
lm to find the gradient (you will have done this in ISDS).Now we will further develop the example we began in lecture 3, calibrating the parameters of an SIR model to data from a mumps outbreak in Savoonga. In order to do this, we need the sir_model function we have been using throughout, as well as the Mumps data for Savoonga.
The data frame is stored in the RData object “mumps_sav.Rdata”, which you can find in the Resources section on Ultra, or linked from the module webpage.
optim_nmThe function optim_nm in the optimization library. It requires two main arguments:
fun - the function we wish to minimisek - the number of parameters to optimisestart - starting values for the parameters. The initial simplex is constructed around this start vector.So, in our case:
fun we will define a function fn_mumps_rmse that returns the RMSE of the SIR model at any \(\beta,\,\gamma\) pair when compared to the mumps data.k=2start we will give starting values for \(\beta\) and \(\gamma\)## 'opt_fun_sav' is our 'fun'. It takes in parameter values,
## runs the SIR model with those values,
## compares the output to the mumps data and returns the RMSE
library(optimization)## Loading required package: Rcpp
## To run the ode function within opt_fun_sav we require
## initial values for the compartment populations
init_sav = c(S=258, I=1, R=0)
opt_fun_sav = function(par){
## The following lines prevent 'optim_nm' from wandering into negative parameter space
if (par[1]<0){
par[1] = -par[1]
}
if(par[2]<0){
par[2] = -par[2]
}
## Run the SIR model with the current parameter values
## We want the ode output times to match the data
## so we set the time vector to be the same as in
## mumps_sav
output_sav_try = ode(
y=init_sav,
times = mumps_sav$time_days,
func = sir_model,
parms= c(
beta = par[1],
gamma = par[2]
)
)
output_savtry_df = as.data.frame(output_sav_try) # coerce to a data.frame object
## Compute a cumulative I column from the model output
output_savtry_df$Icum = 259 - output_savtry_df$S
n_obs = nrow(mumps_sav)
## Create a data frame with observed and model prediction data for Savoonga
sav_rmse_df = data.frame(
time = mumps_sav$time_days,
obs = mumps_sav$cum_cases,
pred = output_savtry_df$Icum
)
# Calculate the RMSE for this prediction
rmse_try = sqrt((1/n_obs)*(sum((sav_rmse_df$pred - sav_rmse_df$obs)^2)))
return(rmse_try)
}
# Run Nelder-Mead optimisation to find the point in parameter space leading to the smallest RMSE
optnm_result=optim_nm(opt_fun_sav,start=c(0.6,0.6),k=2, trace=T)
optnm_result## $par
##
## 0.2485377 0.1546450
##
## $function_value
## [1] 5.082235
##
## $trace
## iteration function_value x_1 x_2
## [1,] 1 111.627858 0.6000000 0.6000000
## [2,] 2 111.627858 0.6000000 0.6000000
## [3,] 3 111.627858 0.6000000 0.6000000
## [4,] 4 111.627858 0.6000000 0.6000000
## [5,] 5 88.641677 1.5659258 0.8588190
## [6,] 6 88.641677 1.5659258 0.8588190
## [7,] 7 87.476116 1.0269269 0.9385386
## [8,] 8 78.973814 1.9928528 1.1973576
## [9,] 9 76.907072 1.4538539 1.2770772
## [10,] 10 73.249184 2.4197797 1.5358962
## [11,] 11 72.099122 1.8807808 1.6156158
## [12,] 12 69.580908 2.8467067 1.8744348
## [13,] 13 69.569266 2.3077078 1.9541544
## [14,] 14 67.089512 3.2736336 2.2129734
## [15,] 15 67.089512 3.2736336 2.2129734
## [16,] 16 65.335769 3.7005606 2.5515120
## [17,] 17 59.332216 3.1108659 2.3374679
## [18,] 18 59.332216 3.1108659 2.3374679
## [19,] 19 59.332216 3.1108659 2.3374679
## [20,] 20 59.332216 3.1108659 2.3374679
## [21,] 21 58.726597 2.4361205 1.8975040
## [22,] 22 56.075560 1.8599365 1.3613710
## [23,] 23 53.400447 1.1851911 0.9214071
## [24,] 24 49.187801 0.6090071 0.3852741
## [25,] 25 49.187801 0.6090071 0.3852741
## [26,] 26 49.187801 0.6090071 0.3852741
## [27,] 27 30.159984 0.3199661 0.1867793
## [28,] 28 30.159984 0.3199661 0.1867793
## [29,] 29 30.159984 0.3199661 0.1867793
## [30,] 30 30.159984 0.3199661 0.1867793
## [31,] 31 9.346497 0.2174392 0.1280744
## [32,] 32 9.346497 0.2174392 0.1280744
## [33,] 33 9.346497 0.2174392 0.1280744
## [34,] 34 9.346497 0.2174392 0.1280744
## [35,] 35 8.634029 0.2765861 0.1799851
## [36,] 36 8.235032 0.2811213 0.1803908
## [37,] 37 5.091256 0.2481465 0.1541312
## [38,] 38 5.091256 0.2481465 0.1541312
## [39,] 39 5.091256 0.2481465 0.1541312
## [40,] 40 5.091256 0.2481465 0.1541312
## [41,] 41 5.091256 0.2481465 0.1541312
## [42,] 42 5.091256 0.2481465 0.1541312
## [43,] 43 5.091256 0.2481465 0.1541312
## [44,] 44 5.091256 0.2481465 0.1541312
## [45,] 45 5.091256 0.2481465 0.1541312
## [46,] 46 5.090967 0.2477647 0.1542165
## [47,] 47 5.083664 0.2489521 0.1550327
## [48,] 48 5.082970 0.2482524 0.1543779
## [49,] 49 5.082970 0.2482524 0.1543779
## [50,] 50 5.082752 0.2483928 0.1545831
## [51,] 51 5.082445 0.2486374 0.1547566
## [52,] 52 5.082321 0.2483838 0.1545239
## [53,] 53 5.082321 0.2483838 0.1545239
## [54,] 54 5.082251 0.2485717 0.1546836
## [55,] 55 5.082251 0.2485717 0.1546836
## [56,] 56 5.082243 0.2484730 0.1545955
## [57,] 57 5.082235 0.2485377 0.1546450
##
## $fun
## function(par){
##
##
## ## The following lines prevent 'optim_nm' from wandering into negative parameter space
## if (par[1]<0){
## par[1] = -par[1]
## }
## if(par[2]<0){
## par[2] = -par[2]
## }
##
## ## Run the SIR model with the current parameter values
## ## We want the ode output times to match the data
## ## so we set the time vector to be the same as in
## ## mumps_sav
## output_sav_try = ode(
## y=init_sav,
## times = mumps_sav$time_days,
## func = sir_model,
## parms= c(
## beta = par[1],
## gamma = par[2]
## )
## )
##
## output_savtry_df = as.data.frame(output_sav_try) # coerce to a data.frame object
## ## Compute a cumulative I column from the model output
## output_savtry_df$Icum = 259 - output_savtry_df$S
##
## n_obs = nrow(mumps_sav)
##
## ## Create a data frame with observed and model prediction data for Savoonga
## sav_rmse_df = data.frame(
## time = mumps_sav$time_days,
## obs = mumps_sav$cum_cases,
## pred = output_savtry_df$Icum
## )
## # Calculate the RMSE for this prediction
## rmse_try = sqrt((1/n_obs)*(sum((sav_rmse_df$pred - sav_rmse_df$obs)^2)))
##
## return(rmse_try)
##
## }
## <bytecode: 0x0000027a1662eb00>
##
## $start
## [1] 0.6 0.6
##
## $upper
## [1] NA
##
## $lower
## [1] NA
##
## $control
## $control$k
## [1] 2
##
## $control$iterations
## [1] 57
##
##
## attr(,"class")
## [1] "optim_nmsa"
In the above, par gives the parameter values found to minimise fun, function_value gives the minimised value of fun (ie. the RMSE of the best model fit, in this case). control tells us how many times fun was evaluated (a lot fewer than our 1000 points to get a better answer!).
Figure 2.1: The SIR model fit found by Nelder-Mead, shown with the actual data.
Now that we have this optimised value:
In reality, optim_nm may not find the global minimum RMSE, but instead get stuck somewhere where it can’t easily improve. It is therefore important to try several starting points, and also to check that the point found is biologically plausible. There is also uncertainty associated with the optimal point found, mainly because
Because we no longer have to ‘fill’ the parameter space (as we do with the Monte Carlo approach), it is quite simple to apply this method to a model with more parameters.
It turns out that Mumps has a pre-infectious period of around 10-20 days. This means that an SEIR model will be more suitable.
opt_fun_sav to use your SEIR code from the previous workshoppar_init to include the the third parameterThe R library outbreaks contains data for a number of infectious disease outbreaks. One of these is influenza_england_1978_school, which records an outbreak of Flu in an English boarding school. The in_bed column can be used as a proxy for the number infectious.
Some hints:
flu78 = influenza_england_1978_school (technially this doesn’t rename the original object but instead creates a copy with a different name, but with small datasets this doesn’t really matter).flu78$time = as.numeric(flu78$date - min(flu78$date)) + 1As we saw in Lecture 4, for many infectious diseases, a simple SIR or SEIR model is not sufficient. This may be because the dynamics of the disease itself are more complicated, or because the output would not contain the information needed by public health decision makers, or both.
In our lecture we studied RSV as an example, where reinfection is possible, as well as waning immunity. The code for this example is below, with the seasonality function included.
# Define Q function for seasonality
# In this function t is in years
Qfun = function(t, a, b, phi, tau, mu){
top = b*(a*cos(2*pi*(t-phi))+1)
bottom = tau + mu
return(top/bottom)
}
sir_rsv_seas = function (t, y, params)
{
# label state variables from within the y vector
S = y[1] # susceptibles
IS = y[2] # Infectious (from S)
IR = y[3] # infectious (from R)
R = y[4] # recovered (but still infectious, remember)
# Create variable N for total number
N = S + IR + IS + R
# Pull parameter values from the params vector
a = params["a"]
phi = params["phi"]
mu = params["mu"]
alpha = params["alpha"]
tau = params["tau"]
rho = params["rho"]
sigma=params["sigma"]
b = params["b"]
eta = params["eta"]
# Calculate Lambda function, for some value of t and parameter
# values as specified
# Divide t by 365 to go from years to days
lambda = Qfun(t/365, a, b, phi, tau, mu)*(IS + eta*IR)*(tau+mu)
# Rates of change for each compartment
dS = mu*N - lambda*S/N + (alpha*tau/rho)*(IS+IR+R) - mu*S
dIS = lambda*S/N - (tau+mu)*IS
dIR = sigma*lambda*R/N - (tau/rho + mu)*IR
dR = (1-alpha/rho)*tau*IS + (1-alpha)*(tau/rho)*IR -
R*(sigma*lambda/N + alpha*tau/rho + mu)
# return list of gradients
res = c(dS, dIS, dIR, dR)
return(list(res))
}
# Set parameter values
params_rsv_seas = c(
a= 0.5,
phi = 0,
mu=1/(365*70),
alpha = 0.16,
tau = 1/6,
rho = 0.8,
sigma = 0.3,
b = 0.5,
eta = 0.8
)
# Initial values for differential equations
# We will run the model for 2.5 years, to show the seasonal pattern
initial_vals_rsv = c (S = 0.999, IS=0.001, IR = 0, R = 0)
timepoints_rsv_seas = seq (0, 365*2.5, by=0.1)
# Run model and format output
output_rsv_seas = ode (y=initial_vals_rsv, times = timepoints_rsv_seas,
func = sir_rsv_seas, parms= params_rsv_seas)
out_rsv_seas_df = as.data.frame(output_rsv_seas) # coerce to a data.frame object
out_rsv_seas_tidy = pivot_longer(out_rsv_seas_df, cols = c("S", "IS", "IR", "R"),
names_to = "Compartment")
# Create date variable to show progression through year
out_rsv_seas_tidy$Date = as.Date("31-12-2019", format = "%d-%m-%Y") + out_rsv_seas_tidy$time
# Plot compartment populations
ggplot(data=out_rsv_seas_tidy, aes(x=Date, y=value, col=Compartment)) +
geom_path() + xlab("Time (days)") + ylab("Proportion of population") +
scale_x_date(date_labels = "%B")Figure 3.1: Compartment populations in the RSV model, with seasonality. For this plot, we set a=0.5, phi=0.
Legrand et al. (2007) present a compartmental model for Ebola, containing several compartments and parameters we haven’t yet seen. You can find the article by copying the title into Google Scholar.