[Interactive demo controls: function values, \(\beta\), \(\theta\), \(\sigma_u^2\), \(z\), \(\epsilon\)]

Emulator and implausibility plots for the given data. The emulator is given by the blue line, with associated uncertainty (\(\pm 3\sigma\)) given by the red bounding lines. The desired output \(z\) is the horizontal black line, with its \(3\sigma\) uncertainty given by the corresponding dotted lines. Observed values are the grey dots. The colour bar at the bottom of the emulator plot gives the implausibility with cutoff \(I=3\).

The implausibility is also plotted: the non-implausible region \(I< 3\) is denoted in green.

Emulation and History Matching

The system here is a one-dimensional toy model that demonstrates the various facets of Bayes Linear emulation and history matching (the general case, and background, is discussed below). The starting data is ten points taken uniformly from the values of \[f(x)=3 x\sin \left(\frac{5\pi(x-0.1)}{0.4}\right).\] The value we want to find non-implausible inputs \(x\) for is \(z=-0.6\), with observational uncertainty (standard deviation) \(\epsilon=0.02\). The starting values for the emulator parameters are \(\beta=0,\,\theta=0.06,\,\sigma_u^2=0.6\), where the emulator is assumed to have constant mean: \(g(x)=\beta+u(x)\). Each wave of history matching takes an additional sample from each of the non-implausible regions, obtains the corresponding function values, and re-trains the emulator over those regions. Note: history matching with a different function is not possible on this page, since the additional points at every stage would then have to be provided by hand; the first stage of emulation can, however, be explored by changing the function values.
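For concreteness, a minimal Python sketch of this setup is below. The variable names are my own, the design points are taken to be evenly spaced, and the interval \([0,\,0.6]\) is an assumption (the demo does not state the design range explicitly).

```python
import numpy as np

# Toy model from the demo: f(x) = 3x * sin(5*pi*(x - 0.1) / 0.4)
def f(x):
    return 3 * x * np.sin(5 * np.pi * (x - 0.1) / 0.4)

# Ten training inputs, evenly spaced over the (assumed) interval [0, 0.6]
x_train = np.linspace(0.0, 0.6, 10)
y_train = f(x_train)

# Target output and observational uncertainty from the demo
z = -0.6
eps = 0.02        # observation standard deviation

# Starting emulator parameters for g(x) = beta + u(x)
beta = 0.0        # constant mean
theta = 0.06      # correlation length
sigma_u2 = 0.6    # variance of the residual process u(x)
```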

Emulation

In many fields, we have complex systems that we wish to analyse or otherwise obtain information from.

In the majority of such systems, we must build a computer simulation from a model of the system rather than analysing it directly. This approach has a few drawbacks. Firstly, the simulation is often very complex and requires a large amount of computational time to evaluate; if we have a large input space over which we would like to test the simulation, this will take a long time. Secondly, without repeated testing we cannot be sure that our model is actually accurate, and even extensive runs cannot guarantee that it is correct. Finally, most models present their outputs as being uncertainty-free: we have a function \(f(x_1,x_2,\dots)\) that associates to each set of inputs \(x_1,\,x_2,\,\dots\) a single value \(z\). This is not always an accurate reflection of the real system, whose output can be stochastic.

Emulation accounts for these issues. Suppose we have a model representing a physical process with \(n\) inputs \(x_1,\,x_2,\dots,x_n\) and \(p\) outputs \(z_1,\,z_2,\dots,z_p\). We have, therefore, the following system: \[\mathbf{z}=f(\mathbf{x})+\mathbf{\epsilon}\] where the vector \(\mathbf{\epsilon}=(\epsilon_1,\epsilon_2,\dots,\epsilon_p)\) represents observational uncertainty. Then for each of the \(p\) outputs we may define the emulator for the \(j\)-th output to be \[g_{j}(\mathbf{x})=\sum_{i=1}^{n}h_{ij}(\mathbf{x})\beta_{ij}+u_j(\mathbf{x}).\] The functions \(h(\mathbf{x})\) are any functions of the inputs \(\mathbf{x}\); the constants \(\beta\) are the corresponding weightings of those functions (henceforth called regression coefficients); and \(u(\mathbf{x})\) is a residual process whose correlation structure encodes the local behaviour of the emulator. Typically, we take the functions \(h(\mathbf{x})\) to be polynomials in the inputs, so the emulator has the form of a weighted sum of polynomials plus a correlated residual. The correlation structure of \(u\) can take a multitude of forms: it determines how strongly the emulator's value at a point is correlated with its values at nearby points. Often a Gaussian form is used, in which case the covariance between two points \(\mathbf{x}\) and \(\mathbf{x}^\prime\) is given by \[\text{Cov}[u(\mathbf{x}),u(\mathbf{x}^\prime)]=\sigma_u^2\exp\left(-\sum_{i=1}^{n}\frac{|x_i-x_i^\prime|^2}{\theta_i^2}\right).\] Other, more complicated, representations of both the regression part \(h(\mathbf{x})\) and the correlation structure are possible.
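As a sketch of these ingredients in Python (the function names, the choice of regressors and the coefficient values are illustrative, not part of the demo above):

```python
import numpy as np

def sq_exp_cov(x, x_prime, sigma_u2, theta):
    """Cov[u(x), u(x')] = sigma_u^2 * exp(-sum_i (x_i - x_i')^2 / theta_i^2)."""
    x, x_prime, theta = (np.atleast_1d(np.asarray(v, dtype=float))
                         for v in (x, x_prime, theta))
    return sigma_u2 * np.exp(-np.sum((x - x_prime) ** 2 / theta ** 2))

def prior_mean(x, beta, h_funcs):
    """Prior emulator mean: sum_i h_i(x) * beta_i for chosen regressors h_i."""
    return sum(b * h(x) for b, h in zip(beta, h_funcs))

# Example for a single output: constant and linear regressors in one input,
# g(x) = beta_0 + beta_1 * x + u(x)   (coefficients chosen purely to illustrate)
h_funcs = [lambda x: 1.0, lambda x: x]
beta = [0.2, -0.5]
print(prior_mean(0.3, beta, h_funcs))                    # prior emulator mean
print(sq_exp_cov(0.3, 0.35, sigma_u2=0.6, theta=0.06))   # prior covariance
```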

With this structure in place, we can apply it to the actual simulation as follows. Sample a set of design points \(y_1,\,y_2,\dots\) from the input space and run the full model at them, giving the known data \(D=(f(y_1),f(y_2),\dots)\). Either choose the parameters for the emulator by hand, or use this sample to train the emulator (for example, using maximum likelihood estimation). Then for any unevaluated point \(x\) in the input space, the Bayes Linear adjustment is \[E_D[g(x)]=E[g(x)]+\text{Cov}[g(x),D]\text{Var}(D)^{-1}(D-E[D]),\] \[\text{Var}_D[g(x)]=\text{Var}[g(x)]-\text{Cov}[g(x),D]\text{Var}(D)^{-1}\text{Cov}[D,g(x)].\] This adjusts the value of the emulator and the associated uncertainty at any required point, given the data \(D\) already known about the model. For any required point we need only calculate a matrix inverse \(\text{Var}(D)^{-1}\) (and in fact the inverse is the same for every point), so this is much faster than running the full simulation at additional inputs. It also gives us a measure of uncertainty at every point, rather than presenting the result as being necessarily "the" answer.
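A minimal Python sketch of this adjustment for the one-dimensional, constant-mean case above (the function name is mine, and a linear solve is used in place of an explicit inverse):

```python
import numpy as np

def bl_emulator(x_new, x_design, D, beta, sigma_u2, theta):
    """Bayes Linear adjusted expectation and variance at the points x_new for
    the constant-mean emulator g(x) = beta + u(x), given runs D = f(x_design).
    One-dimensional inputs with a squared-exponential covariance, as above."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    x_design = np.asarray(x_design, dtype=float)
    D = np.asarray(D, dtype=float)

    def cov(a, b):  # Cov[u(a_i), u(b_j)] with the Gaussian form above
        return sigma_u2 * np.exp(-np.subtract.outer(a, b) ** 2 / theta ** 2)

    var_D = cov(x_design, x_design)        # Var(D)
    cov_xD = cov(x_new, x_design)          # Cov[g(x), D]

    # E_D[g(x)] = E[g(x)] + Cov[g(x), D] Var(D)^{-1} (D - E[D])
    adj = np.linalg.solve(var_D, D - beta)
    E_D = beta + cov_xD @ adj

    # Var_D[g(x)] = Var[g(x)] - Cov[g(x), D] Var(D)^{-1} Cov[D, g(x)]
    Var_D = sigma_u2 - np.einsum('ij,ji->i', cov_xD,
                                 np.linalg.solve(var_D, cov_xD.T))
    return E_D, Var_D

# Example usage with the training data and parameters from the first sketch:
# E, V = bl_emulator(np.linspace(0, 0.6, 200), x_train, y_train,
#                    beta=0.0, sigma_u2=0.6, theta=0.06)
```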

Implausibility and History Matching

Given an emulator, we often want to find values of the input parameters that could give a desired output. We therefore define an 'implausibility' measure which determines whether a point could conceivably result in the given output: for a point \(x\), we have \[I(x)^2=\frac{(E_D[g(x)]-z)^2}{\text{Var}_D[g(x)]+\delta^2}\] where \(\delta^2\) collects any additional uncertainties in the system (for example, the variance of the observational error \(\epsilon\)). By imposing a cut-off on the allowed size of \(I\) (for example, requiring \(I< 3\) is roughly equivalent to allowing \(g(x)\) to be up to three standard deviations away from \(z\)) we can reduce the size of the input space. Having done this, we can sample some more points of the full simulation model from the remaining region and create a new emulator over this augmented set of points: this reduces the variability and makes the emulator more accurate in the regions of interest. This procedure is known as history matching, and each application of the above is one wave of the match.
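Continuing the sketches above (and reusing f, x_train, y_train and bl_emulator from them), one wave of this procedure might look as follows; the grid, the single new sample and the cut-off \(I<3\) are illustrative, whereas the demo above draws one point from each non-implausible region.

```python
import numpy as np

def implausibility(E_D, Var_D, z, delta2):
    """I(x) = |E_D[g(x)] - z| / sqrt(Var_D[g(x)] + delta^2)."""
    return np.abs(np.asarray(E_D) - z) / np.sqrt(np.asarray(Var_D) + delta2)

# Evaluate the emulator over a dense grid of candidate inputs
x_grid = np.linspace(0.0, 0.6, 500)
E, V = bl_emulator(x_grid, x_train, y_train, beta=0.0, sigma_u2=0.6, theta=0.06)

# Keep the non-implausible inputs (cut-off I < 3, delta^2 = observation variance)
I = implausibility(E, V, z=-0.6, delta2=0.02 ** 2)
non_implausible = x_grid[I < 3]

# Sample a new point from the non-implausible region, run the full model there,
# and re-train the emulator on the augmented design for the next wave
rng = np.random.default_rng(0)
x_extra = rng.choice(non_implausible, size=1)
x_train2 = np.concatenate([x_train, x_extra])
y_train2 = f(x_train2)
```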