Durham University Statistics and Probability Group


Welcome to the Stats4Grads website! Here you will find all the information about the seminar series.

Stats4Grads is a weekly seminar in statistics organised by and aimed at postgraduate students. The seminars take place on Wednesdays from 13:00 to 14:00, usually in CM105, with tea, coffee and biscuits provided by the Department of Mathematical Sciences.

Stats4Grads is a great opportunity to learn about the research of other postgraduate students and their use of statistics. This includes recent developments in statistics as well as applications to "real-world" problems and cross-disciplinary work. Moreover, Stats4Grads provides a relaxed forum in which to discuss and develop ideas, exchange knowledge, and access help and insight from students who have a deeper understanding of the theory and methodology.

Feel free to invite a friend or collaborator from another institution or department to give a talk if they are in Durham!

Organiser: Jonathan Owen. For information or to give a talk, contact: jonathan.owen@durham.ac.uk.

For details of previous years' seminars, click here.

Stats4Grads Timetable 2018/2019

Systematic Uncertainty Reduction for Petroleum Reservoirs Combining Reservoir Simulation and Bayesian Emulation Techniques

Speaker: Helena Nandi Formentin, Department of Mathematical Sciences, Durham University
Wednesday 22 May 2019: 1pm, CM105


Reservoir simulation models incorporate physical laws, reservoir characteristics and production strategies; they represent our understanding of sub-surface structures based on the available information. Emulators are statistical representations of a simulation model's behaviour, offering evaluations fast enough to cover a sufficiently large number of reservoir scenarios for a full uncertainty analysis. Bayesian History Matching (BHM) aims to find the range of reservoir scenarios that are consistent with the historical data, in order to provide a comprehensive evaluation of reservoir performance and consistent, unbiased predictions incorporating realistic levels of uncertainty, as required for full asset management. We present a systematic approach that combines reservoir simulation and emulation techniques within a coherent Bayesian framework for uncertainty quantification.
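The implausibility measure at the heart of Bayesian History Matching can be sketched in a few lines. This is an illustrative stand-alone example, not the speaker's code; the function names, candidate list and variance values are invented for the illustration.

```python
import math

def implausibility(z, em_mean, em_var, obs_var, disc_var):
    """Standardised distance between the observation z and the emulator's
    prediction, accounting for emulator variance, observation error and
    model discrepancy."""
    return abs(z - em_mean) / math.sqrt(em_var + obs_var + disc_var)

def non_implausible(candidates, z, obs_var, disc_var, cutoff=3.0):
    """Keep candidate inputs below the usual three-sigma cutoff.
    candidates: list of (x, emulator_mean, emulator_variance) triples."""
    return [x for x, m, v in candidates
            if implausibility(z, m, v, obs_var, disc_var) < cutoff]
```

Candidates whose implausibility exceeds the cutoff are ruled out, and the surviving region of input space is sampled again in the next wave.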

Modelling of Ordinal Data

Speaker: Dr Maria Kateri, Institute of Statistics, RWTH Aachen University, Germany
Wednesday 15 May 2019: 1pm, CM105


The most common methods for analysing categorical data will be presented. The first part of the seminar focuses on contingency table analysis, with special emphasis on contingency tables with ordinal classification variables and associated models (log-linear or log-nonlinear). Generalised odds ratios are introduced and their role in contingency table modelling is discussed. The second part discusses logistic regression models for binary responses, as well as for multi-category ordinal and nominal responses. Examples show the use of R for fitting these models.

Statistical reproducibility for (multiple) pairwise tests in pharmaceutical product development

Speaker: Andrea Simkus, Department of Mathematical Sciences
Wednesday 8 May 2019: 1pm, CM105


In this poster, a new method is presented for calculating statistical reproducibility for the t-test. The method was developed in relation to a test in pharmaceutical discovery and development, which involves six test groups whose members are given an increasing dosage of a drug. Multiple pairwise comparisons for the t-test are carried out; the aim of the test is to decide which dosage of the drug is the most effective. The poster employs nonparametric predictive inference (NPI) for reproducibility. Statistical reproducibility asks whether a repeat of the experiment would lead to the same test decision, and the NPI bootstrap is adopted for calculating it. Two approaches to calculating and representing reproducibility are presented. First, the reproducibility of the separate pairwise comparisons is calculated by creating NPI bootstrap samples for both test groups, applying the pairwise comparisons to them, and counting how many times the same test decision is reached. The results are then compared with the t-test's nonparametric counterpart, the Wilcoxon-Mann-Whitney test. Secondly, the question of whether the decision to choose a particular dose is reproducible is studied; this is named the reproducibility of the final decision. The poster presents it in a tree diagram and introduces an imposed decision rule.
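The bootstrap-reproducibility idea for a single pairwise comparison can be sketched as follows. This uses an ordinary bootstrap and a fixed critical value as crude stand-ins for the NPI bootstrap and exact test thresholds; all names and numbers are illustrative.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def reproducibility(a, b, n_boot=1000, t_crit=2.0, seed=1):
    """Proportion of bootstrap repeats reaching the same test decision
    as the original data."""
    rng = random.Random(seed)
    original = abs(welch_t(a, b)) > t_crit
    same = 0
    for _ in range(n_boot):
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        try:
            same += (abs(welch_t(ra, rb)) > t_crit) == original
        except ZeroDivisionError:  # degenerate resample: both variances zero
            same += not original
    return same / n_boot
```

Well-separated dose groups give a reproducibility near one; overlapping groups with a borderline test decision give a value closer to one half.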

Validation of Wind Turbine Health History

Speaker: Roger Cox, Department of Engineering
Wednesday 27 March 2019: 1pm, CM105


We use probabilistic record linkage to combine wind turbine maintenance records with a database of power outages, in order to determine the health history of wind turbines and to validate it. We explain how this health history is applied to fault troubleshooting and to prognostic modelling for condition-based maintenance.

Finite-dimensional distributions of the height of a renewal model

Speaker: Clare Wallace, Department of Mathematical Sciences
Wednesday 20 March 2019: 1pm, CM105


Suppose we have a collection of blocks with (integer) heights and widths, and we use a random selection of them to build a stick whose total width is $n$.
Working from left to right, we track the cumulative total height at the endpoints of each block. We can linearly interpolate between these endpoints to create a piecewise linear height function for the whole stick.
Under a few assumptions about the distributions of heights and widths of the blocks in our collection, we can write a central limit theorem for the height function at any $k$ points along its width. In particular, we can (almost) prove that the height function, properly rescaled, converges to the trajectory of a Brownian motion.
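The construction of the piecewise linear height function can be sketched in Python. This is an illustrative toy with invented block-width and height-increment distributions, not the model from the talk.

```python
import bisect
import random

def build_stick(n, rng):
    """Lay random blocks left to right until the total width is exactly n.
    Returns endpoint positions xs and cumulative heights hs."""
    xs, hs = [0], [0]
    while xs[-1] < n:
        w = min(rng.randint(1, 3), n - xs[-1])  # truncate the final block
        h = rng.randint(-2, 2)                  # height increment of this block
        xs.append(xs[-1] + w)
        hs.append(hs[-1] + h)
    return xs, hs

def height_at(x, xs, hs):
    """Piecewise linear interpolation of the height function at x."""
    i = bisect.bisect_right(xs, x) - 1
    if i >= len(xs) - 1:
        return float(hs[-1])
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return hs[i] + t * (hs[i + 1] - hs[i])
```

Evaluating `height_at` at several points of many independent sticks, suitably rescaled, is the kind of experiment that illustrates the finite-dimensional convergence in the talk.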

Avoiding local trapping problems in ABC

Speaker: Kieran Richards, Department of Mathematical Sciences
Wednesday 13 March 2019: 1pm, CM105

Approximate Bayesian Computation (ABC) is a technique that allows us to produce samples from a posterior distribution when the likelihood is intractable or computationally expensive to evaluate. Unfortunately, ABC can often suffer from extreme local trapping, where the sampler does not move for long periods of time and hence produces low-quality samples that are of little use. We'll apply some stochastic approximation techniques to attempt to avoid the local trapping problem by penalising the MCMC sampler for remaining in the same place for long periods of time, instead encouraging it to move more evenly around the sample space. We'll then test our new algorithm on some simulated examples and compare the results with those from standard ABC sampling.
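A toy version of ABC-MCMC for a Normal location model illustrates the setting. The gradual tolerance relaxation while the chain is stuck is a crude illustrative anti-trapping device, not the stochastic-approximation penalty of the talk; all names and settings are invented.

```python
import random
from statistics import fmean

def abc_mcmc(obs_mean, n_iter=2000, eps0=0.5, step=0.5, seed=0):
    """ABC-MCMC for a toy Normal(theta, 1) model with a flat prior:
    accept a proposed theta if the simulated sample mean lies within
    eps of the observed mean.  While the chain is stuck, the tolerance
    is slowly relaxed to let it escape."""
    rng = random.Random(seed)
    theta, eps, stuck = 0.0, eps0, 0
    samples = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0, step)
        sim_mean = fmean(rng.gauss(prop, 1) for _ in range(20))
        if abs(sim_mean - obs_mean) < eps:
            theta, eps, stuck = prop, eps0, 0  # accepted: reset the penalty
        else:
            stuck += 1
            eps = eps0 * (1 + 0.05 * stuck)    # stuck: relax the tolerance
        samples.append(theta)
    return samples
```

With a fixed tolerance the chain can sit at its starting point for thousands of iterations; the relaxation guarantees it eventually moves, at the cost of temporarily sampling from a cruder approximation.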

Bayesian approaches to Well Test Analysis

Speaker: Themistoklis Botsas, Department of Mathematical Sciences
Wednesday 6 March 2019: 1pm, CM105


Deconvolution for Well Test Analysis is a methodology that solves an inverse problem associated with petroleum engineering and derives an impulse response function that contains important information about the system.
We use a response function form based on the multi-region radial composite model, known in the petroleum literature for its flexibility and its ability to resemble almost every plausible shape that can be encountered.
We use an errors-in-variables non-linear Bayesian regression model in order to make inferences about the response function. This allows us to include uncertainty for the independent variables, which is essential in our context due to the large observational error. We combine the likelihood with a set of flexible priors for our parameters and we use MCMC algorithms in order to approximate the posterior.
We illustrate the use of our algorithm by applying it to synthetic and field data sets. The results are comparable in quality to the state-of-the-art solution, which is based on the total least squares method, but our method has several advantages: we gain access to meaningful system parameters associated with the flow behaviour in the reservoir; we can incorporate prior knowledge; and we can quantify parameter uncertainty in a principled way by exploiting the advantages of the Bayesian approach.

Simplify Your Code with %>% (The pipe operator)


Speaker: Miguel Lopez-Cruz
Wednesday 20 February 2019: 1pm, CM105

Removing duplication is an important principle to keep in mind when you are writing code; however, it is equally important to keep your code efficient and readable. Very often, efficiency is achieved by replacing long statements in existing code with shorter ones to make it more readable, clear, and explicit. Consequently, writing code that is simple, readable, and efficient may seem contradictory. In this talk I want to show how the magrittr R package can help you write efficient code when analysing datasets for diverse statistical purposes.

Probabilistic Record Linkage

Speaker: Roger Cox, Department of Engineering
Wednesday 13 February 2019: 1pm, CM105


We read through a paper Roger has been looking at, and try to identify the Python code used to implement the calculations.
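For a flavour of the sort of calculation involved, here is a minimal Fellegi-Sunter-style match weight in Python. This is not the code from the paper; the field names and probabilities are invented for illustration.

```python
import math

def match_weight(agreement, m, u):
    """Fellegi-Sunter log2 likelihood-ratio weight for a candidate record pair.
    agreement[f] is True if field f agrees between the two records;
    m[f] / u[f] are the agreement probabilities among true matches / non-matches."""
    w = 0.0
    for f, agrees in agreement.items():
        if agrees:
            w += math.log2(m[f] / u[f])
        else:
            w += math.log2((1 - m[f]) / (1 - u[f]))
    return w
```

Pairs with a weight above an upper threshold are declared links, below a lower threshold non-links, and in between they go to clerical review.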

LASSO for dimensionality reduction in surrogate model-based Optimisation

Speaker: Lorenzo Gentile, TH Köln, Germany
Wednesday 6 February 2019: 1pm, CM105


Surrogate-model-based optimization (SMBO) plays a prominent role in today’s modelling, simulation, and optimization processes. It can be considered the most efficient technique for solving expensive and time-demanding real-world optimization problems. In fact, in many engineering problems a single evaluation relies on either experimental or numerical analysis, which incurs significant costs in time or resources. SMBO pursues the identification of global optima by taking advantage of a budget allocation process that maximizes the information gained in promising regions. In SMBO, a data-driven surrogate model is fitted to replace an expensive computer simulation.

However, high dimensionality leads to severe practical issues in the development of surrogate models. For example, depending on the employed distance measures, it is widely recognised that Kriging, one of the most popular techniques, may perform poorly for problems with more than approximately 20 variables. A promising way to overcome the limitations of SMBO in high-dimensional search spaces is feature selection. A well-established strategy for selecting important variables is the least absolute shrinkage and selection operator (LASSO). For these reasons, a strategy for enhancing a Kriging-based SMBO algorithm with LASSO is currently under development.

In this presentation, the fundamentals of SMBO will be given. Moreover, preliminary results will be shown from applying the enhanced Kriging-based SMBO algorithm to both artificial test functions and a real-world application from the field of aerospace.
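As a flavour of how LASSO performs variable selection, here is a minimal coordinate-descent implementation. It is an illustrative sketch with invented data, not the algorithm under development in the talk.

```python
def soft_threshold(z, g):
    """Soft-thresholding operator S(z, g)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for min_b 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # residual excluding feature j's contribution
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            zj = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam) / zj
    return beta
```

The L1 penalty drives the coefficients of irrelevant variables exactly to zero, which is precisely the property exploited to shrink the Kriging input space.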

Improving and benchmarking of algorithms for decision making with lower previsions

Speaker: Nawaphon Nakharutai, Department of Mathematical Sciences, Durham University
Wednesday 30 January 2019: 1pm, CM105


Maximality, interval dominance, and E-admissibility are three well-known criteria for decision making under severe uncertainty using lower previsions. I will present a new fast algorithm for finding maximal gambles and compare its performance to existing algorithms, one proposed by Troffaes and Hable (2014), and one by Jansen, Augustin, and Schollmeyer (2017). To do so, I develop a new method for generating random decision problems with pre-specified ratios of maximal and interval dominant gambles.

To find all maximal gambles, Jansen et al. solve one large linear program for each gamble. In Troffaes and Hable, and also in the new algorithm, this is done by solving a longer sequence of smaller linear programs. I find that the primal-dual interior point method works best for solving these linear programs. Building on earlier work, I will present efficient ways to find a common feasible starting point for this sequence of linear programs. I exploit these feasible starting points to develop early stopping criteria for the primal-dual interior point method, further improving efficiency.

I also investigate the use of interval dominance to eliminate non-maximal gambles. This can make the problem smaller, and I observe that this benefits Jansen et al.'s algorithm, but perhaps surprisingly, not the other two algorithms. I find that the new algorithm, without using interval dominance, outperforms all other algorithms in all scenarios in a simulation.
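Interval dominance itself is cheap to check once lower and upper previsions are available; the maximality algorithms discussed in the talk require linear programming and are not shown here. A minimal sketch, with invented numbers:

```python
def interval_dominance_keep(bounds):
    """bounds[i] = (lower, upper) prevision of gamble i.  A gamble is
    eliminated when its upper prevision falls below the largest lower
    prevision; eliminated gambles are never maximal."""
    best_lower = max(lo for lo, _ in bounds)
    return [i for i, (_, hi) in enumerate(bounds) if hi >= best_lower]
```

Running this filter first shrinks the set of gambles that the more expensive maximality check has to process.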

Drunken Heroine Quest 2: Deep Fantasy World Application of the Theory of Random Walks

Speaker: Hugo Lo, Department of Mathematical Sciences, Durham University
Wednesday 12th December 2018: 1pm, CM105


Our research on random walk problems has many useful applications in ecology, psychology, computer science, physics, chemistry, biology as well as economics. However, most of them are too serious for this presentation. Instead, we will guide you through some basics of random walk theory, in the format of a fantasy story... Once upon a time, the brave Edward went on a fearful quest of defeating a dragon to win the heart of the beautiful Dorothy. After falling foul of a curse, Edward is trapped in a skyscraping tower of unknown location in the boundless land of Promenatoria. It is now up to Dorothy to break the curse to free her inamorato. With Edward nowhere to be found, alcohol seems to be the only way for Dorothy to pass the days. With no particular direction and no systematic search, a random walk journey begins. Are you ready for this exhilarating and unforgettable adventure? In this second episode of the series we will dive into the feelings deep in our heroine's heart. All are welcome.

Bayes linear emulation, decision support and applications to petroleum reservoir models

Speaker: Jonathan Owen, Department of Mathematical Sciences, Durham University
Wednesday 5th December 2018: 1pm, CM105


Complex mathematical computer models are used across many scientific disciplines to improve understanding of the behaviour of physical systems and to provide decision support. These models often require the specification of a large number of unknown model parameters, involve a choice of decision parameters, and take a long time to evaluate. Decision support, commonly misrepresented as an optimisation task, often requires a large number of model evaluations, rendering traditional methods intractable due to their slow speed. Bayes linear emulators used as surrogates provide fast, statistical approximations for computer models, yielding predictions for as yet unevaluated parameter settings, along with a corresponding quantification of uncertainty.

The Integrated Systems Approach for Petroleum Production (ISAPP) is a research programme aiming to increase hydrocarbon recovery which, together with TNO (Netherlands Organisation for Applied Scientific Research), has designed a Field Development Optimisation Challenge centred on a fictitious oil reservoir known as OLYMPUS. The challenge exhibits many of the common issues associated with computer experimentation, with further complications arising from geological uncertainty expressed through an ensemble of 50 models. In this presentation, I will discuss Bayes linear emulators and their use in decision support, before describing some of the difficulties encountered in my work to date on the ISAPP Field Development Optimisation Challenge.
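The basic Bayes linear adjustment behind such emulators can be sketched for the simplest possible case, a single quantity of interest and one observed quantity; this scalar illustration is not the speaker's multivariate implementation.

```python
def bayes_linear_adjust(mu_f, var_f, mu_d, var_d, cov_fd, d_obs):
    """Bayes linear adjusted expectation and variance of a quantity f
    given one observed quantity d:
        E_d[f]   = E[f] + Cov(f, d) / Var(d) * (d - E[d])
        Var_d[f] = Var(f) - Cov(f, d)^2 / Var(d)"""
    adj_mean = mu_f + cov_fd / var_d * (d_obs - mu_d)
    adj_var = var_f - cov_fd ** 2 / var_d
    return adj_mean, adj_var
```

In the multivariate case the same formulas hold with covariance matrices in place of the scalars, which is what makes Bayes linear emulation fast compared with a full probabilistic update.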

Cost effective component swapping to increase system reliability

Speaker: Aesha Najem, Department of Mathematical Sciences, Durham University
Wednesday 21st November 2018: 1pm, CM105


One strategy that might be considered to enhance the reliability and resilience of a system is swapping components: when a component fails, replacing it with another component from the system which is still functioning. This presentation considers cost-effective component swapping to increase system reliability. The cost is discussed in two scenarios, namely fixed cost and time-dependent cost of system failure.
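The potential benefit of swapping can be illustrated with a toy Monte Carlo study: one component in series with a parallel pair, where a surviving pair member may be moved into the failed series position. The layout and the unit-rate exponential lifetimes are invented for illustration, not taken from the talk.

```python
import random

def lifetimes(n, rng):
    """Simulate n systems: component 1 in series with a parallel pair (2, 3),
    all with unit-rate exponential lifetimes.
    Returns (no_swap, swap) lifetime lists using the same random draws."""
    no_swap, swap = [], []
    for _ in range(n):
        t1, t2, t3 = (rng.expovariate(1.0) for _ in range(3))
        no_swap.append(min(t1, max(t2, t3)))
        # Swap policy: if component 1 fails while BOTH pair members still
        # work, move one of them into position 1; otherwise the swap would
        # empty the pair and gain nothing.
        if min(t2, t3) > t1:
            swap.append(min(t2, t3))
        else:
            swap.append(min(t1, max(t2, t3)))
    return no_swap, swap
```

Because the swap is only performed when it strictly extends the system's life, the swapped lifetime dominates the unswapped one path by path.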

Random set theory for frequentist inferences

Speaker: Daniel Krpelik, Department of Mathematical Sciences, Durham University
Wednesday 14th November 2018: 1pm, CM105


Recently, several inferential methods based on random set theory have been proposed in the literature. Among these, we focus on confidence structures, which can be seen as a generalisation of the inferential approach based on confidence distributions. In the latter, the result of the inference is a probability distribution over the range of the parameter of interest, which can be used to construct confidence intervals and test hypotheses at any level of significance. Using random set models allows us to seamlessly derive approaches for analysing censored observations without any assumptions about the underlying censoring model, whilst retaining the coverage properties of confidence distributions. We will show the basic ideas behind the concept of confidence structures and demonstrate their use on a reliability analysis of a simple system, based on a set of censored observations of the lifetimes of its components.

History Matching techniques applied to petroleum reservoir: discussing MCMC as sampling technique

Speaker: Helena Nandi Formentin, Department of Mathematical Sciences, Durham University
Wednesday 7th November 2018: 1pm, CM107


In petroleum engineering, reservoir simulation models are representations of real petroleum fields used in production forecasting and decision-making. Observed dynamic data (e.g. bottom-hole pressure and oil production) support the calibration of reservoir models. We use History Matching techniques to reduce our high-dimensional input space - which contains parameters such as porosity, permeability and fluid properties - through the assimilation of measured data. We use emulation techniques to explore a simplified simulator of a reservoir model, and HM processes to reduce the simulator’s input space. In this session, we will discuss MCMC techniques for sampling in a reduced and complex space.

Maintenance Record Labelling of Wind Turbine Data for Fault Prognosis

Speaker: Roger Cox, Department of Engineering, Durham University
Wednesday 31st October, 2018: 1pm, CM105


A set of methods is being developed for determining the health history of mechanical plant. These are to be applied both in offshore wind turbine maintenance troubleshooting (fault diagnosis) and in condition-based maintenance (fault prognosis). One of the methods used is Bernoulli naive Bayes classification.
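A from-scratch sketch of Laplace-smoothed Bernoulli naive Bayes shows the mechanics; the toy data, labels and feature names below are invented, not the turbine maintenance data.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Laplace-smoothed Bernoulli naive Bayes model.
    X: rows of 0/1 features (e.g. presence of keywords in a maintenance record);
    y: class labels."""
    p = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        log_prior = math.log(len(rows) / len(X))
        # P(feature j = 1 | class c), smoothed
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(p)]
        model[c] = (log_prior, probs)
    return model

def predict(model, x):
    """Return the class with the highest posterior log-probability."""
    def score(c):
        logp, probs = model[c]
        return logp + sum(math.log(pj if xj else 1 - pj)
                          for xj, pj in zip(x, probs))
    return max(model, key=score)
```

The conditional-independence assumption is strong, but the resulting classifier is simple, fast to train, and works well on sparse binary record data.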

Introduction to Stats4Grads

Speaker: everyone!
Wednesday 24th October 2018: 1pm, CM105


Come along to CM105 on Wednesday 24th October at 1pm to get to know your fellow statisticians! We'll introduce ourselves and our research area (briefly!), and then just have an informal chat. Of course, the introductory meeting wouldn't be complete without free pizza ;)

Return to the Statistics Seminar list.