1 Overview

This lecture will think about linking the SIR model we developed in Lectures 1 and 2 to the real world. In particular, we will cover:

  • Estimating \(R_0\) from early data
  • A Monte Carlo approach to parameter estimation
  • Using optim to calibrate an SIR model

These could easily each be a lecture (or more!) on their own, so please ask questions at any time!

2 Using compartmental models to learn about epidemics

So far we have looked at the development of SIR type models, and some key extensions, but we haven’t yet thought about how to use them in the face of a real epidemic. The model development stages we’ve discussed in the first session do of course link to the real world, since the model structure we choose would be rooted in our understanding of the behaviour of the disease. This may still be a huge source of uncertainty, especially in the face of a new infectious disease; we remember how early on in the Covid-19 pandemic it was unknown whether people retained their immunity after being infected, for example.

The main difficulty, once a model structure has been decided upon, is to estimate the parameter values, and quantities such as the basic reproduction number. There are two main strands to this area:

  • Estimating the parameters in the early stages of an epidemic
  • Estimating the parameters once the epidemic is over - this is still useful, for learning about the properties of the disease.

Although many of the quantities we have mentioned (reproduction numbers, transmission rates etc.) have intuitively simple definitions, they are difficult or even impossible to measure. However, combining the mathematical model with data we can make some estimations.

3 The early stages of an epidemic

Using data on the incidence of an infectious disease during the early stages of an epidemic, we can estimate the basic reproduction number, \(R_0\). This is useful as it gives an indicator of the transmissibility of the disease and the herd immunity threshold. To estimate \(R_0\), we first need to calculate the growth rate.

3.1 The growth rate of an epidemic

For more detail on this section, see Vynnycky and White (2010).

During the first stages of an epidemic, the rate of increase in infectious individuals should be approximately constant. To see why this is, we first think of a simple differential equation \[ \frac{dQ\left(t\right)}{dt} = -kQ\left(t\right), \] where \(Q\left(t\right)\) is some function of time that decreases at constant rate \(k\). The solution to this equation (ie. an equation for the actual function \(Q\left(t\right)\)) is \[ Q\left(t\right) = Q\left(0\right)e^{-kt}. \] Since the derivative of \(e^{-kt}\) w.r.t \(t\) is \(-ke^{-kt}\), we can see that \[ \frac{dQ\left(t\right)}{dt} = -kQ\left(0\right)e^{-kt} = -kQ\left(t\right), \] and that this function therefore solves the differential equation.

The differential equation for the number of Infectious individuals is (in a simple SIR model) \[ \frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I. \] At the start of the epidemic, we have \(S\approx{N}\), and so approximately \[ \frac{dI\left(t\right)}{dt} = \beta I\left(t\right) - \gamma I\left(t\right) = \left(\beta - \gamma\right) I\left(t\right), \] and therefore an approximate solution is \[ I\left(t\right) \approx I\left(0\right)e^{\left(\gamma-\beta\right)t} \] where \(I\left(0\right)\) is the number of infectious individuals at time 0. More generally (ie. not assuming the simplest SIR model), we write

\[ I\left(t\right) \approx I\left(0\right)e^{\Lambda t}, \] where \(\Lambda\) is the growth rate. Taking logs, we therefore have \[ \ln\left(I\left(t\right)\right) \approx \ln\left(I\left(0\right)\right) + \Lambda t. \] This is a straight line, with gradient \(\Lambda\). Therefore, if we plot the natural logarithm of the number of infectious individuals against time, we should see a straight line with gradient \(\Lambda\).

This leads to an obvious question, which is “What if we don’t know the number of infectious individuals at each time step?”. Thankfully, at log scale, the cumulative case increases at a similar rate, as shown in Figure 3.1.

Natural logarithms of the SIR model populations, as well as the cumulative sum of Infectious individuals.

Figure 3.1: Natural logarithms of the SIR model populations, as well as the cumulative sum of Infectious individuals.

3.1.1 Scale invariance

It is often the case that incidences of an infectious disease are under-reported. Sometimes this is included using some model parameter to represent the proportion of cases that are reported, say \(k_r\), with \(0<k_r<1\). In this case,

\[I_{reported} = k_r I_{true} \] and therefore

\[I_{true} = \frac{I_{reported}}{k_r},\] where \(I_{true}\) is the true number of infectious cases, and \(I_{reported}\) the number that have been reported. This under-reporting can especially be the case at the start of an epidemic, but thankfully our estimate of the growth-rate \(\Lambda\) is unaffected. This is because \[ \begin{align*} I_{true}\left(t\right) & \approx I_{true}\left(0\right)e^{\Lambda t}\\ & \approx \frac{1}{k_r}I_{reported}\left(0\right) e^{\Lambda t}, \end{align*} \] and therefore

\[\begin{align*} \ln \left(I_{true}\left(t\right)\right) & \approx \ln{\left[\frac{1}{k_r}I_{reported}\left(0\right) e^{\Lambda t}\right]}\\ &\approx{\ln\left(\frac{1}{k_r}I_{reported}\left(0\right)\right) + \Lambda t}. \end{align*}\]

This means that the intercept of the straight line changes, but not the gradient (which is the part we are interested in).

3.1.2 Example: Covid-19 in the UK

From the UK government website you can download daily case data for Covid-19 in the UK. This contains both new daily cases and cumulative counts, for each of England, Scotland, Wales and Northern Ireland.

Plotting the time series for 2020 and 2021 we see the trends in Figures 3.2 and 3.3.

Daily new Covid-19 cases in the UK from early 2020 to December 2021

Figure 3.2: Daily new Covid-19 cases in the UK from early 2020 to December 2021

Cumulative Covid-19 cases in the UK from early 2020 to December 2021

Figure 3.3: Cumulative Covid-19 cases in the UK from early 2020 to December 2021

We can take logs of the cumulative count and focus on just the first month, and this is shown in Figure 3.4.

Natural logarithms of cumulative Covid-19 cases in the UK in early 2020.

Figure 3.4: Natural logarithms of cumulative Covid-19 cases in the UK in early 2020.

We can fit straight lines to each of these using ordinary least squares, as in Figure 3.5.

Natural logarithms of cumulative Covid-19 cases in the UK in early 2020.

Figure 3.5: Natural logarithms of cumulative Covid-19 cases in the UK in early 2020.

The gradients and their standard errors are given in Table 3.1.

Table 3.1: Coefficients from the fitted linear models.
Region Coefficients SE
England 0.224 0.00461
Northern Ireland 0.194 0.00823
Scotland 0.229 0.00940
Wales 0.277 0.00992

The ‘Coefficients’ column in Table 3.1 give an estimate of the growth rate \(\Lambda\) for each UK region. These values are all fairly similar, and it is quite plausible that factors such as the different population densities in the regions would affect the growth rate of Covid 19.

3.2 Using the growth rate to estimate \(R_0\)

There are many ways to estimate \(R_0\) from \(\Lambda\), each relying on different model assumptions and using different summaries of the disease’s behaviour. For example, many of the simpler estimates assume that the pre-infectious and/or infectious periods follow an exponential distribution (which we also do when we use rate parameters).

One such estimate is

\[ \begin{equation} R_0 \approx 1 + \Lambda T_S \tag{3.1} \end{equation} \]

where \(T_S\) is the serial interval (the interval between successive cases). In the early stages this can be estimated using data from cases where the chain of infection is almost certainly known.

Another is

\[\begin{equation} R_0\approx \left(1+\Lambda D\right)\left(1+\Lambda D'\right),\tag{3.2} \end{equation}\] where \(D\) and \(D'\) are the durations of the infectious and pre-infectious (latent) periods, respectively.

3.2.1 Example continued: Covid-19 in the UK

Nishiura, Linton, and Akhmetzhanov (2020) estimate the serial interval of Covid-19 to be 4.6 (using their most certain primary and secondary case pairs), with a 95% credible interval of \(\left(3.5, 5.9\right)\). Using Equation (3.1), we can therefore use our estimates of \(\Lambda\) from Table 3.1 with these estimates to produce estimates of \(R_0\). These are shown in Table 3.2. Notice that this paper was published in April 2020, and that these estimates were therefore available relatively early on in the pandemic.

Table 3.2: Estimates of R[0] using the median and 95% credible interval from Nishiura, Linton, and Akhmetzhanov (2020).
Region Coefficients SE R0_lower95 R0_median R0_upper95
England 0.224 0.00461 1.7840 2.0304 2.3216
Northern Ireland 0.194 0.00823 1.6790 1.8924 2.1446
Scotland 0.229 0.00940 1.8015 2.0534 2.3511
Wales 0.277 0.00992 1.9695 2.2742 2.6343

At this point, we haven’t made use of the uncertainty in the estimates of \(\Lambda\), but this would also widen the credible intervals.

4 Estimating parameters by model fitting

We can also use data from an outbreak of an infectious disease to find appropriate values for the parameters of an SIR-type model. We call this calibration (this is a general model-fitting term, not just for SIR-type or compartmental models). If a model has \(n\) parameters, we can think of them as forming an \(n\)-dimensional space. We can call this space the parameter space. Each point in the space will correspond to a particular ‘setting’ of the model (ie. to a particular set of specific parameter values). In general this can be difficult to visualise, as \(n\) can be quite large, but for 2 or 3 parameters we can easily see points in the space.

4.1 SIR parameters in parameter space

To look into this, we will return to the simplest model type we looked at in lecture 1, the SIR model with no vital dynamics. This model involves two parameters: the per capita transmission rate \(\beta\) and the recovery rate \(\gamma\). In model fitting generally, we talk about the parameter space (or sometimes input space), and in this case where there are two parameters, this is a 2D surface. If we set upper and lower limits on the value of each parameter, we would have a rectangle. Because there are only two parameters, we can plot any pair of values. For instance, in our example we had \(\beta=1.25,\,\gamma=0.5\), and we can see this point in parameter spcae in Figure 4.1.

The parameter values from our example, plotted in 2D parameter space.

Figure 4.1: The parameter values from our example, plotted in 2D parameter space.

This led to the model output shown in Figure 4.2

The model output corresponding to the parameter values from this example.

Figure 4.2: The model output corresponding to the parameter values from this example.

4.2 Moving around parameter space

We can now visualise how the model output varies as we move around the parameter space. Choosing points effectively to maximise the information gained is an important area of statistics (and science generally) known as experimental design.

There are many (infinitely many!) ways we could arrange n points in 2-dimensional space.

10 points arranged such that alpha = gamma.

Figure 4.3: 10 points arranged such that alpha = gamma.

In Figure 4.3 10 points are arranged in parameter space such that \(\alpha=\gamma\). With this design, we will learn about what happens when the parameters are varied together, but not about what will happen if the points are varied independently.

A full factorial design.

Figure 4.4: A full factorial design.

In Figure 4.4 we see what is called a full-factorial design with nine points: every combination of each of three values for both parameters is present. This is an improvement, but if one of the parameters turns out to have little effect on the output (as is not uncommon with real models), our design is effectively reduced to three points.

Instead, we will use a latin-hypercube design (LHD). To make an LHD of size \(p\), the parameter space is divided into a \(p\times{p}\) grid, and there must be one point in each row and each column. This in fact means that the design in Figure 4.3 is a latin hypercube, so we will add a further constraint. A maximin LHD is an LHD where the smallest distance between two points is maximised. Now we have \(p\) distinct values for each parameter, and the space is, in some sense, being filled. This is a bit of a segue into experimental design, but if you want to learn more about it the original paper is Morris and Mitchell (1995).

We can generate a 10 point maximin LHD in \(\beta\) and \(\gamma\) automatically in R, using various packages (eg. lhs).

10 randomly chosen points in parameter space

Figure 4.5: 10 randomly chosen points in parameter space

Each of these points corresponds to specific values for \(\beta\) and \(\gamma\), and so for each point we can run the SIR model to produce output like that shown in Figure 4.2. To demonstrate the correspondance between point and output, Figure 4.6 shows just the \(I\) compartment, coloured to correspond to the colours in Figure 4.5. This is a Monte Carlo approach, where we use random sampling to generate results.

Infectious populations corresponding to the 10 randomly chosen points in parameter space

Figure 4.6: Infectious populations corresponding to the 10 randomly chosen points in parameter space

We can see that varying the values of \(\beta\) and \(\gamma\) leads to very different trajectories. We will employ this technique to narrow down the parameter space to suitable values given some data.

4.3 Example: Mumps in a virgin population

Philip, Reinhard, and Lackman (1959) reported on two outbreaks of Mumps in 1957, on St. Lawrence Island in the Bering Sea. This island is 40 miles from Alaska and 120 miles from Siberia (see Figure 4.7), and at the time was visited very infrequently from either mainland. There are two villages on St Lawrence Island, Gambell (1957 population 302) and Savoonga (1957 population 259). No prior outbreak of mumps was known there, and there was no word in the native language to convey the idea of mumps. Since mumps is an immunising infection, and the populations of these villages each started as entirely (or very nearly entirely) susceptible, this dataset should be a good candidate to calibrate the parameters of an SIR model.

St. Lawrence Island

Figure 4.7: St. Lawrence Island

The data points given are weekly counts of new incidences. Because of this, we don’t have the actual number of infectious people at any time, which is what the model’s I compartment gives us. We therefore use the observed data to compute the cumulative sum of incidences, and derive a corresponding value from the model.

In this model, as people leave the \(S\) compartment they are newly infected. As time progresses, more and more people leave the \(S\) compartment, passing through the \(I\) compartment into \(R\). These all count in a measure of cumulative infections. The only people who don’t count in the cumulative total of infections are those still in \(S\). Therefore at times \(t\), the cumulative total of infections is \[I^{cum}_t = N - S_{t}.\] We can use this value along with the recorded cumulative infection data to calibrate the SIR model to this disease outbreak.

In this case it is simple to derive an analogous quantity from the model, but in cases with more compartments, or with re-infection, it can be more complicated.

In this lecture we will look at the epidemic in Savoonga.

Cumulative case count for Savoonga, 1957.

Figure 4.8: Cumulative case count for Savoonga, 1957.

Our goal is to find values for \(\beta\) and \(\gamma\) that produce an \(I^{cum}_t\) curve that is as close as possible to the actual data in Figure 4.8.

4.3.1 Goodness of fit

To compare candidate parameter values, we need some measure of the goodness-of-fit of a model curve to the data. The one we will use is the root mean squared error, RMSE.

Definition 4.1 The root mean squared error is defined as \[ RMSE = \sqrt{\frac{1}{n}\sum\limits_{i=1}^{n}{\left(\hat{y}_i - y_i\right)^2}}. \] The \(y_i\), for \(1\leq{i}\leq{n}\), are the observed values, and \(n\) is therefore the number of observed values. The \(\hat{y}_i\) are the corresponding model predictions. As its name suggests, the RMSE gives the square root of the mean of the squared errors between model and data, and is on the same scale as the data. The smaller the RMSE, the better the fit of the model to the data.

We can demonstrate this by choosing some values for \(\beta\) and \(\gamma\), say \(\beta=1.1,\,\gamma=0.75\), and comparing the resulting model curve to the Savoonga data.

Figure 4.9 shows the model’s \(I^{cum}_t\) population curve in black, with red lines to show the distance between each observed value and the model prediction. These are the \(\left(\hat{y}_i - y_i\right)\) in the RMSE definition. This particular pair of parameter values appears to give a fairly poor fit, with RMSE = 54.71, rather large in relation to the observed values.

Cumulative case count for Savoonga, 1957 (blue), with the model produced by beta=1.1, gamma=0.75 (black), and the distance between each pair of observed and predicted values marked in red.

Figure 4.9: Cumulative case count for Savoonga, 1957 (blue), with the model produced by beta=1.1, gamma=0.75 (black), and the distance between each pair of observed and predicted values marked in red.

4.3.2 Exploring the space

By generating a selection of parameter values that cover the space well, we can get some idea as to which regions of parameter space might give a better fit. Figure 4.10 shows 1000 different points in our SIR model’s parameter space. For each one, the model has been generated and the output (specifically \(I^{cum}_t\)) compared to the Savoonga data. For each point, we can therefore calculate an RMSE value, and this is shown by the colour. We can see that the lowest RMSE is now much lower than that in Figure 4.9.

## [1] "Smallest RMSE is 12.4548 at beta=0.26039, gamma=0.155666"
1000 points in SIR parameter space, coloured by the RMSE as a measure of fit to the Savoonga mumps data.

Figure 4.10: 1000 points in SIR parameter space, coloured by the RMSE as a measure of fit to the Savoonga mumps data.

Notice that the best pairs of values for \(\beta\) and \(\gamma\) approximately lie on a straight line through \(\beta=0,\,\gamma=0\), representing the same \(R_0\) value (remember that for this model \(R_0= \frac{\beta}{\gamma}\)). We can plot the \(I^{cum}_t\) output for the SIR model with these parameters, to see the fit.

Cumulative case count for Savoonga, 1957 (blue), with the model produced by best pair of parameter values found.

Figure 4.11: Cumulative case count for Savoonga, 1957 (blue), with the model produced by best pair of parameter values found.

This is still not the best fit - it is the best from the 1,000 candidate models we generated. From the plot we can see that most of the region can be ruled out, and we can ‘zoom in’ on the region that seems most plausible.

4.3.3 Focussing in

Based on the best value from our coarse experiment in Figure 4.11, we will refocus on the region \(\beta \in [0.23,\,0.3],\;\gamma \in [0.125,\,0.2]\).

## [1] "Smallest RMSE is 5.14844 at beta=0.24675, gamma=0.153755"
1000 points in zoomed-in SIR parameter space, coloured by the RMSE as a measure of fit to the Savoonga mumps data.

Figure 4.12: 1000 points in zoomed-in SIR parameter space, coloured by the RMSE as a measure of fit to the Savoonga mumps data.

We can plot the \(I^{cum}_t\) output for the SIR model with these parameters, to see the fit.

Cumulative case count for Savoonga, 1957 (blue), with the model produced by best pair of parameter values found in the refocussed region.

Figure 4.13: Cumulative case count for Savoonga, 1957 (blue), with the model produced by best pair of parameter values found in the refocussed region.

If we wanted to, we could repeat this process to gain an even better fit.

4.4 Some caveats!

This process is subject to many uncertainties, errors and problems. Here are a few:

  • The ‘best’ parameter values are only equivalent to the true system values if the model is perfect - ie. never.
  • The process depends on the ranges chosen and the random values generated for the input design.
  • If the best found value lies very close to the boundary, it’s possible that a better value lies outside the chosen region.
  • The choice of final time point is somewhat arbitrary - in our example we have approximately 65 days in which there are no new infections recorded, but we could have had 20 or 100. This would change the RMSE values and therefore the ‘best’ point.

5 The Optimisation approach

Instead of evaluating a function (in this case the SIR model and RMSE calculation) hundreds or thousands of times, we can use a step-by-step approach, employing some sort of strategy to select our next points to run. There are many different optimisation algorithms, but the one we will use is Nelder-Mead. This is rarely the fastest method, but it is relatively robust and intuitive, and doesn’t require information about the gradient of the function, which many do.

5.1 Optimisation with Nelder-Mead - a very quick introduction!

Optimisation methods require a function

\[ f: \mathbb{R}^n \rightarrow{\mathbb{R}}.\] That is, a function whose input can (and usually will) be multivariate, but whose output is univariate - we need a single valued summary of each parameter point (measuring ‘how good it is’ in some sense) so that we can arrive at the optimal point. In the case of our SIR mumps example, the input space is \(\left(\beta,\,\gamma\right)\) and the output space is the RMSE value of the SIR output compared to the observed infection data.

Nelder-Mead optimisation starts with a randomly generated n-simplex in \(\mathbb{R^n}\). A simplex is the simplest possible shape spanning all dimensions, and has \(n+1\) vertices. A 2-simplex (ie. in \(\mathbb{R^2}\)) is a triangle. A 3-simplex is a tetrahedron, and so on.

The function is evaluated at each vertex, and the worst (\(x_h\), the maximum), second worst (\(x_s\)) and best (\(x_l\), the minimum) vertices are labelled. The following changes are proposed in order:

  • Reflect: The worst point (ie. that with the highest value of \(f\)) is reflected through the midpoint of the line between the second worst and best points, generating a new point \(x_r\). If \(f\left(x_l\right)\leq{f\left(x_r\right)}<f\left(x_s\right)\), accept \(x_r\).
Reflection

Figure 5.1: Reflection

  • Expand: If \(f\left(x_r\right) < f\left(x_l\right)\), extend \(x_r\) for twice the distance to \(x_e\).
    Expansion

    Figure 5.2: Expansion

  • Contract: if \(f\left(x_r\right) \geq{f\left(x_s\right)}\), find the contraction point \(x_c\) using the better of the points \(x_r\) (outside) and \(x_e\) (inside).
    Contraction

    Figure 5.3: Contraction

  • Shrink: Compute \(n\) new vertices that shrink the simplex towards \(x_l\)
    Shrinking

    Figure 5.4: Shrinking

5.1.1 A stopping criterion

With the four instructions above, the algorithm would continue indefinitely; it needs to know when to stop and return the ‘optimal’ point. In general the Nelder-Mead algorithm terminates when either

  • The simplex is small enough / the vertices are close enough together - in which case it returns the current best point.
  • Some maximum number of iterations has been reached - in which case it reports that is has been unable to find an optimal point.

This video demonstrates how the simplex moves around the space until it finds the optimal point.

5.2 Optim: Mumps example

Continuing with our mumps example, we can use optim to find the point in parameter yielding the smallest value of RMSE (up to some tolerence). Figure 5.5 shows the RMSE of the same region as before by colour, with the sequence of points resulting from the Nelder-Mead algorithm shown in orange. The starting point is \(\beta=0.6,\,\gamma=0.6\). The number of function iterations reported by the optimisation function is 57 - this means it is much more efficient than the Monte Carlo method.

The progression of optim through parameter space, starting at (0.6, 0.6).

Figure 5.5: The progression of optim through parameter space, starting at (0.6, 0.6).

This point found by Nelder-Mead is slightly better than the result using the zoomed in Latin hypercube method, since it is not constrained by a set of randomly generated points. Like before, we can plot the fitted cumulative \(I\) curve agains the mumps outbreak data from Savoonga (See Figure 5.6.

The SIR model fit found by Nelder-Mead, shown with the actual data.

Figure 5.6: The SIR model fit found by Nelder-Mead, shown with the actual data.

5.2.1 Optimisation - some thoughts

  • When it works, Nelder-Mead is much more efficient than Monte Carlo (57 evaluations compared to 1,000)
    • This benefit is increased in higher dimensions
  • Nelder-Mead isn’t constrained to a specific region of parameter space
  • Nelder-Mead is less robust than Monte Carlo - you can see it gets stuck for a while on the ridge - and there is no guarantee it will end up in the best place.

6 Summary

In this lecture we have thought about using observed data with SIR-type models. In particular

  • In the early stages of an epidemic (or using data from the early stages only) we can estimate the growth rate \(\Lambda\), and use this to estimate the basic reproduction number \(R_0\).

  • We can calibrate the parameters of an SIR model to observed data once an epidemic has passed. We have done this in two ways:

    1. Using a latin hypercube design in SIR parameter space.
    2. Using an optimisation algorithm

In the next workshop you will be able to explore these methods in more detail.

References

Morris, Max D, and Toby J Mitchell. 1995. “Exploratory Designs for Computational Experiments.” Journal of Statistical Planning and Inference 43 (3): 381–402.
Nishiura, Hiroshi, Natalie M Linton, and Andrei R Akhmetzhanov. 2020. “Serial Interval of Novel Coronavirus (COVID-19) Infections.” International Journal of Infectious Diseases 93: 284–86.
Philip, BN, Karl R Reinhard, and DB Lackman. 1959. “Observations on a Mumps Epidemic in a "Virgin" Population.” American Journal of Epidemiology 69 (2): 91–111.
Vynnycky, E., and R. White. 2010. An Introduction to Infectious Disease Modelling. OUP Oxford. https://books.google.co.uk/books?id=8iVz3NHLnYUC.