Monte Carlo and Bayesian Computation
Supervisors: J.A. Cumming & L.J.M. Aslett
Computational techniques based on simulation have become an essential part of the statistician's toolbox, providing efficient and practical algorithms for a wide range of scientific and engineering problems. Basic Monte Carlo methods provide computational tools for generating random numbers from both standard and non-standard distributions. Their most common application, however, is stochastic numerical integration, in which the integrand is sampled at random points in order to estimate the value of the integral. In higher dimensions, the principles of Monte Carlo methods are combined with Markov chains to substantially improve efficiency. These Markov Chain Monte Carlo (MCMC) methods perform a 'random walk' through the high-dimensional space, spending more time in the important areas of high probability; this ensures that they explore the regions making important contributions to the integral and thus produce reliable estimates.
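To make the basic idea concrete, here is a minimal R sketch (the integrand and sample size are chosen purely for illustration) that estimates the integral of exp(-x²) over [0, 1] by averaging the integrand at uniformly sampled points:

```r
# Monte Carlo estimate of the integral of exp(-x^2) over [0, 1].
# The estimator is the sample mean of f(U) for U ~ Uniform(0, 1).
set.seed(1)
n <- 1e5
u <- runif(n)
f <- exp(-u^2)
estimate  <- mean(f)
std_error <- sd(f) / sqrt(n)   # Monte Carlo standard error
c(estimate = estimate, std_error = std_error)

# Compare with deterministic quadrature:
integrate(function(x) exp(-x^2), 0, 1)$value
```

The standard error shrinks at rate 1/√n regardless of the dimension of the integral, which is what makes the approach attractive when deterministic quadrature becomes infeasible.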
These computational integration methods have had a profound impact on Bayesian statistics, where most calculations require some form of integration (computing probabilities from density functions, marginalising probability distributions, calculating expectations, etc.). Prior to the availability of modern computation, Bayesian analysis was restricted to small toy problems or relied extensively on conjugate distributions to ensure tractability. Modern computing, however, has enabled Monte Carlo and Markov Chain Monte Carlo methods to transform the practice of Bayesian statistics, allowing far more intricate and realistic models to be developed and applied across a diverse range of disciplines.
Figure: Illustration of the first four samples (left) and 1000 samples (right) from an MCMC Gibbs sampler applied to a 2-D Gaussian density.
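A plot in the spirit of this figure can be produced with a few lines of R. The following is a minimal sketch assuming standard normal margins with correlation ρ = 0.8 (the figure's exact parameters are not stated); each coordinate is drawn in turn from its full conditional distribution:

```r
# Gibbs sampler for a bivariate normal with N(0, 1) margins and
# correlation rho. Each full conditional is itself normal:
# X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for X2.
set.seed(1)
rho    <- 0.8
n_iter <- 1000
x      <- matrix(NA_real_, nrow = n_iter, ncol = 2)
x1 <- 0; x2 <- 0                       # arbitrary starting point
for (t in 1:n_iter) {
  x1 <- rnorm(1, mean = rho * x2, sd = sqrt(1 - rho^2))
  x2 <- rnorm(1, mean = rho * x1, sd = sqrt(1 - rho^2))
  x[t, ] <- c(x1, x2)
}
plot(x, pch = 20, xlab = expression(x[1]), ylab = expression(x[2]))
```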
As an example of how these methods are applied in practical Bayesian analysis, we consider the 1986 Challenger space shuttle disaster. The explosion was traced to the failure of a component, believed to have been caused by the unusually cold temperature (-0.5°C) at the time of launch. A study of 23 components over a temperature range from 12°C to 27°C found that the majority of failures occurred at the lower temperatures. The response is treated as binary (component failure or not), and a logistic regression model is constructed for the probability of failure as a function of temperature, x:
p(x) = exp(α + βx) / (1 + exp(α + βx)).
After specifying some simple prior distributions for α and β, we can use MCMC to investigate the posterior distribution of α and β given the data, and hence the posterior probability of failure. We consider two temperature values, 18°C and 7°C, both still considerably warmer than the day of the Challenger launch. At the warmer temperature, the component failure probability is near 0.5; at the cooler temperature, the failure probability is almost 1. Assuming the model extrapolates correctly, one would infer that at the Challenger launch temperature the failure probability is essentially 1.
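As an illustration of the computation (not a reconstruction of the original analysis), the following R sketch runs a random-walk Metropolis sampler for the posterior of α and β under vague normal priors. The data vectors below are made up for the sketch and are not the actual Challenger records:

```r
# Random-walk Metropolis sampler for the posterior of (alpha, beta) in
# the logistic regression p(x) = exp(alpha + beta*x)/(1 + exp(alpha + beta*x)).
# NOTE: temp and fail are ILLUSTRATIVE data, not the Challenger records.
set.seed(1)
temp <- c(12, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
fail <- c( 1,  1,  1,  1,  0,  1,  0,  0,  0,  1,  0,  0,  0,  0)

log_post <- function(theta) {
  eta <- theta[1] + theta[2] * temp
  # Bernoulli log-likelihood plus vague N(0, 10^2) priors on both parameters
  sum(fail * eta - log1p(exp(eta))) +
    sum(dnorm(theta, mean = 0, sd = 10, log = TRUE))
}

n_iter <- 10000
draws  <- matrix(NA_real_, n_iter, 2,
                 dimnames = list(NULL, c("alpha", "beta")))
theta  <- c(0, 0)                       # starting values
lp     <- log_post(theta)
for (t in 1:n_iter) {
  prop    <- theta + rnorm(2, sd = c(1, 0.05))  # proposal scales need tuning
  lp_prop <- log_post(prop)
  if (log(runif(1)) < lp_prop - lp) { theta <- prop; lp <- lp_prop }
  draws[t, ] <- theta
}

# Posterior failure probability at 7 degrees C (discarding burn-in):
p7 <- plogis(draws[-(1:1000), "alpha"] + 7 * draws[-(1:1000), "beta"])
mean(p7)
```

In practice the proposal scales would be tuned (for example, towards an acceptance rate of roughly 20-40%) and convergence checked with trace plots before trusting the posterior summaries.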
Figure: Posterior for α and β (left); posterior failure probability at 18°C (centre); posterior failure probability at 7°C (right).
The project will begin by covering the basics of Monte Carlo and MCMC methods (in particular, Metropolis-Hastings and Gibbs sampling) and their role in modern computational statistics. Subsequently, the project could develop in a variety of directions:
- the investigation, implementation, and comparison of different MCMC methods (a flavour of such a comparison is sketched after this list)
- the study of Bayesian methods and computation within a particular class of statistical problems, e.g. Bayesian approaches to regression
- the exploration and analysis of a particular application or data set of interest
- the in-depth study of more sophisticated MCMC methods and their properties, e.g. reversible jump, adaptive MCMC, Hamiltonian MCMC.
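As a small taste of the first direction, the following base-R sketch compares random-walk Metropolis samplers that differ only in their proposal scale, using the acceptance rate and lag-1 autocorrelation as crude measures of efficiency (the target and all values are chosen purely for illustration):

```r
# Comparing random-walk Metropolis proposal scales on a N(0, 1) target:
# too-small steps accept often but move slowly; too-large steps rarely accept.
set.seed(1)
rw_metropolis <- function(scale, n_iter = 5000) {
  x <- numeric(n_iter); cur <- 0; acc <- 0
  for (t in 1:n_iter) {
    prop <- cur + rnorm(1, sd = scale)
    if (log(runif(1)) < dnorm(prop, log = TRUE) - dnorm(cur, log = TRUE)) {
      cur <- prop; acc <- acc + 1
    }
    x[t] <- cur
  }
  c(acceptance = acc / n_iter, lag1_autocorr = cor(x[-1], x[-n_iter]))
}
sapply(c(small = 0.1, medium = 2.5, large = 25), rw_metropolis)
```

An intermediate scale typically gives the best trade-off, which is one motivation for the adaptive MCMC methods mentioned above.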
Pre-requisites and Co-requisites
Statistical Concepts II and Statistical Methods III.
Familiarity with the statistical language R (or alternatively Python), with statistical concepts and models, and with data analysis is essential.
Optional/Would be helpful
Monte Carlo II; Bayesian Statistics III/IV.
Background & References
Andrieu, C., De Freitas, N., Doucet, A. and Jordan, M.I., 2003. An introduction to MCMC for machine learning. Machine Learning, 50(1-2), pp. 5-43. DOI: 10.1023/A:1020281327116
Robert, C.P. and Casella, G., 2010. Introducing Monte Carlo Methods with R. Springer. See also slides based on the book.
Robert, C.P. and Casella, G., 2004. Monte Carlo Statistical Methods, 2nd ed. Springer. Durham Library
Brooks, S., Gelman, A., Jones, G.L. and Meng, X.-L. (eds.), 2011. Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC. Durham Library