Bayesian Analysis of Multilevel Complex Models of Physical Systems
Supervisor: Ian Vernon | Research Area: Statistics
Background
Many major scientific disciplines now employ detailed mathematical models to describe complex physical systems of interest. For example, galaxy formation models are used to understand structure formation in our universe, climate models are used to study and predict global warming, UK energy distribution models are used to plan for a sufficient UK power supply, and epidemiology models are used to predict and control the development of epidemics. To use such models for understanding, prediction and subsequent decision making, a full (Bayesian) uncertainty analysis should be performed, a process now referred to as "Uncertainty Quantification".
Many of these scientific models are complex, take significant time to evaluate and have several unknown input parameters. The long evaluation time in particular precludes the use of standard Bayesian approaches for parameter inference, system prediction and decision support. A solution to this problem is to construct a Bayesian emulator: a powerful Bayesian statistical construct that mimics the slow scientific model but is often several orders of magnitude faster to evaluate. Emulators, which are now very popular across statistics and machine learning, can then be used to perform all the required Bayesian calculations.
Often, there exist two (or more) models of the physical system, one fast but inaccurate, and the other slow but more precise. Using the fast model to aid construction of the emulator of the slow model is a very powerful and widely applicable approach, which will form the basis of this project.
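The two-level idea can be sketched in a few lines. The example below is only an illustration under stated assumptions: `fast_model` and `slow_model` are hypothetical toy functions, the cubic basis and the linear correction term are arbitrary choices, and plain least squares stands in for the Bayesian regression used in the project. It is written in Python for brevity, whereas the project itself uses R.

```python
import math

def fast_model(x):
    # Hypothetical cheap but inaccurate model (toy stand-in).
    return math.sin(2.0 * x)

def slow_model(x):
    # Hypothetical expensive but more accurate model (toy stand-in).
    return 1.2 * math.sin(2.0 * x) + 0.3 * x

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def least_squares(rows, y):
    """Least-squares coefficients via the normal equations."""
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

def basis(x):
    # Cubic polynomial basis (an assumed choice for this sketch).
    return [1.0, x, x * x, x ** 3]

# Level 1: many runs of the fast model give a regression emulator of it.
x_fast = [i / 29 for i in range(30)]
beta_fast = least_squares([basis(x) for x in x_fast],
                          [fast_model(x) for x in x_fast])

def fast_emulator(x):
    return sum(coef * g for coef, g in zip(beta_fast, basis(x)))

# Level 2: a handful of slow runs, regressed on the fast emulator plus
# a simple linear correction: slow(x) ~ rho * fast_emulator(x) + c0 + c1 * x.
x_slow = [i / 4 for i in range(5)]
rows = [[fast_emulator(x), 1.0, x] for x in x_slow]
rho, c0, c1 = least_squares(rows, [slow_model(x) for x in x_slow])

def slow_emulator(x):
    return rho * fast_emulator(x) + c0 + c1 * x

# Compare the two-level emulator with the slow model on a test grid.
x_test = [i / 50 for i in range(51)]
max_err = max(abs(slow_emulator(x) - slow_model(x)) for x in x_test)
print("rho =", round(rho, 3), "max abs error =", round(max_err, 4))
```

Here only five slow-model runs are needed, because the fast emulator already carries most of the shape of the slow model's output.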
An example of the output of a complex model of galaxy formation (known as the EAGLE model) used to understand the evolution of our universe. You will learn to construct emulators for models like this.
Group project
The group project will involve understanding core techniques in emulator construction, centred around Bayesian regression, and their application to a two-level complex model of a physical system (e.g. a disease model or a galaxy formation model).
By the end of the group project, you will have knowledge about:
- multivariate Bayesian regression as a core component of emulation;
- the impact of prior specification for Bayesian regression;
- linking two regression structures to perform multilevel emulation;
- initial investigations into a new complex model of a physical system;
- visualisations to extract insight about the real system of interest.
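As a small illustration of how prior specification affects Bayesian regression, the sketch below computes the conjugate posterior for a straight-line model y = b0 + b1*x with known noise variance and an independent normal prior on each coefficient. The data values, the noise variance and the two prior variances are made-up illustrative numbers, and the single-output, two-coefficient setting is a deliberate simplification of the multivariate case.

```python
def posterior(x, y, prior_mean, prior_var, noise_var):
    """Posterior mean and covariance of (b0, b1) for y = b0 + b1*x + noise.

    Prior: b0, b1 independent N(prior_mean[i], prior_var).
    Noise: independent N(0, noise_var), with noise_var assumed known.
    """
    n = len(x)
    # Posterior precision = prior precision + X'X / noise_var (2x2, symmetric).
    p00 = 1.0 / prior_var + n / noise_var
    p01 = sum(x) / noise_var
    p11 = 1.0 / prior_var + sum(xi * xi for xi in x) / noise_var
    # Right-hand side = prior precision * prior mean + X'y / noise_var.
    r0 = prior_mean[0] / prior_var + sum(y) / noise_var
    r1 = prior_mean[1] / prior_var + sum(xi * yi for xi, yi in zip(x, y)) / noise_var
    # Invert the 2x2 precision matrix explicitly.
    det = p00 * p11 - p01 * p01
    cov = [[p11 / det, -p01 / det], [-p01 / det, p00 / det]]
    mean = [cov[0][0] * r0 + cov[0][1] * r1,
            cov[1][0] * r0 + cov[1][1] * r1]
    return mean, cov

# Illustrative (made-up) model runs: three inputs and their outputs.
x = [0.0, 0.5, 1.0]
y = [0.1, 1.1, 1.9]
noise_var = 0.04

# A vague prior lets the data dominate; a tight prior centred on zero
# pulls the coefficients strongly towards zero.
mean_vague, cov_vague = posterior(x, y, [0.0, 0.0], 1e6, noise_var)
mean_tight, cov_tight = posterior(x, y, [0.0, 0.0], 0.01, noise_var)

print("vague prior: slope mean", round(mean_vague[1], 3),
      "slope variance", round(cov_vague[1][1], 4))
print("tight prior: slope mean", round(mean_tight[1], 3),
      "slope variance", round(cov_tight[1][1], 4))
```

With so few runs, the choice of prior variance visibly changes both the posterior mean and the posterior uncertainty of the slope.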
By the end of this group project, you will be able to:
- construct basic regression-based emulators to mimic expensive complex models;
- construct two-level regression-based emulators for fast and slow models;
- perform diagnostics to demonstrate emulator accuracy;
- interpret a range of visualisations to gain insight into the system;
- apply these methods to a real system of interest.
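One common style of emulator diagnostic compares emulator predictions with held-out model runs through standardised errors, (f(x) - E[f(x)]) / sqrt(Var[f(x)]), and flags validation points where the error is large (a threshold of 3 is used here as an assumed rule of thumb). The sketch below is a minimal version: `simulator` is a hypothetical toy function, and the classical least-squares predictive variance of a straight-line fit stands in for a proper emulator variance.

```python
import math

def simulator(x):
    # Hypothetical stand-in for an expensive complex model.
    return math.exp(x)

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x, keeping what prediction needs."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # residual variance estimate
    return {"b0": b0, "b1": b1, "s2": s2, "n": n, "xbar": xbar, "sxx": sxx}

def predict(fit, x):
    """Predictive mean and variance at a new input x."""
    mean = fit["b0"] + fit["b1"] * x
    var = fit["s2"] * (1.0 + 1.0 / fit["n"] + (x - fit["xbar"]) ** 2 / fit["sxx"])
    return mean, var

def standardised_errors(truth, means, variances):
    """(truth - predicted mean) / predicted standard deviation, pointwise."""
    return [(t - m) / math.sqrt(v) for t, m, v in zip(truth, means, variances)]

# Training runs used to build the emulator.
x_train = [i / 7 for i in range(8)]
fit = fit_line(x_train, [simulator(x) for x in x_train])

# Separate validation runs, not used in the fit.
x_valid = [0.1, 0.3, 0.5, 0.7, 0.9]
preds = [predict(fit, x) for x in x_valid]
errs = standardised_errors([simulator(x) for x in x_valid],
                           [m for m, _ in preds],
                           [v for _, v in preds])

# Points with large standardised errors suggest the emulator is
# overconfident or structurally wrong near those inputs.
flagged = [x for x, e in zip(x_valid, errs) if abs(e) > 3.0]
for x, e in zip(x_valid, errs):
    print("x =", x, "standardised error =", round(e, 2))
print("flagged inputs:", flagged)
```

A systematic pattern in the signs of the errors, even when none is flagged, would hint that the regression structure is missing a term.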
Mode of operation and evidence of learning
The project will be based on reading suitable material, some statistical paper-and-pencil derivations, and some programming tasks using R (for both calculation and visualisations).
The project will have a (hopefully) pleasant balance between learning new statistical techniques, coding them and using them to investigate the physical model of interest.
An example of an emulator used to perform optimisation. You will learn to build and employ emulators like this.
Individual project
The individual project will build on the knowledge we have gained in the group project and will explore additional advanced topics. Examples of directions we can investigate are:
- Full emulation structures: going beyond the simple regression-based structures used in the group project
- Design considerations: what are the best locations for evaluations of the fast and slow complex models?
- Optimisation/Decision support: how should we proceed if we wish to use the complex model to aid decision making?
- History matching/parameter inference: matching the complex model to observations of the real system
- Incorporating further physical insight into the emulator structure
Mode of operation and evidence of learning
In addition to the activities described in the corresponding section for the group project, the individual project will involve at least one of:
- more substantial theoretical work (derivations, literature work);
- more advanced programming, including the building of more advanced emulators;
- more detailed application of emulation to complex models of interest, to achieve specified goals.
Prerequisites and Co-requisites
- Prerequisites: Statistical Inference II, Data Science and Statistical Modelling II.
- Co-requisites: None.
Additional information
If you would like more information about this project, please contact me at
i.r.vernon@durham.ac.uk.
Resources
For an introduction to emulation as applied to a complex model of Galaxy Formation see our paper entitled
"Galaxy Formation: Bayesian History Matching for the Observable Universe" which can be found at
Statistical Science.
For more of a tutorial in building emulators see
"Bayesian uncertainty analysis for complex systems biology models: emulation, global parameter searches and evaluation of gene functions", which can be found
here, although note that in the group project we will be focussing on the regression part of the emulator, possibly extending to the full emulator in the individual project.
A tutorial in some of the concepts around using emulators for optimisation can be found in the first half of
"A Tutorial on Bayesian Optimisation" although don't worry about some of the more technical aspects: we will do things in a more efficient (and simpler!) way.