Durham University Statistics and Probability Group

Stats4Grads

Welcome to the Stats4Grads website! Here you will find all the information about the seminar series.

Stats4Grads is a weekly seminar in statistics organised by and aimed at postgraduate students. The seminars take place on Wednesdays from 13:00 to 14:00, usually in CM105, with tea, coffee and biscuits provided by the Department of Mathematical Sciences.

Stats4Grads is a great opportunity to learn about the research of other postgraduate students and their use of statistics. This includes recent developments in statistics as well as applications to "real-world" problems and cross-disciplinary work. Moreover, Stats4Grads provides a relaxed forum in which to discuss and develop ideas, exchange knowledge, and get help and insight from students who have a deeper understanding of the theory and methodology.

Feel free to invite a friend or collaborator from another institution or department to give a talk if they are in Durham!

Organiser: Jonathan Owen. For information or to give a talk, contact: jonathan.owen@durham.ac.uk.

For details of previous years' seminars, click here.

Stats4Grads Timetable 2018/2019

On reproducibility of hypothesis tests based on randomised response data

Speaker: Fatimah Alghamdi, Department of Mathematical Sciences, Durham University
Wednesday 3 July 2019: 1pm, CM105

Abstract:

Randomised response techniques (RRT) are frequently used when a survey collects data on possibly sensitive information. Many RRT methods and strategies exist; they seek to elicit truthful answers from respondents with greater efficiency and privacy and without embarrassment, and to reduce the bias caused by untruthful answers. The first technique was presented by Warner (1965), followed by the Greenberg model (1979), which modified some properties of the Warner model. A question of interest in hypothesis test scenarios is the reproducibility of the results: if the test were repeated, would it lead to the same conclusion with regard to rejection of the null hypothesis? We address this question for Warner's and Greenberg's methods. We use nonparametric predictive inference, a frequentist approach based on only a few assumptions, to derive lower and upper probabilities for test reproducibility. This raises the challenging question of finding a further measure, called the measurement of the lower reproducibility probability (MRP), with which to compare the Warner and Greenberg tests. The Greenberg model is more efficient than the Warner model, and both the MRP and the area of non-rejection of the measurement (AUMRP) are higher for the Greenberg test. The work will continue to explore further useful results and real applications of RRT, and how it can be developed in other directions.
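As a rough illustration of the Warner design mentioned in the abstract, the sketch below simulates a survey and recovers the sensitive proportion. It assumes the textbook form of the design: with probability `p` a respondent is asked whether they have the sensitive trait, otherwise whether they lack it, so that P(yes) = p·π + (1 − p)(1 − π). The function names and parameter values are illustrative and are not taken from the talk.

```python
import random


def warner_survey(true_pi, p, n, rng):
    """Simulate the number of 'yes' answers from n respondents under Warner's design."""
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < true_pi
        asked_about_trait = rng.random() < p  # otherwise asked about the complement
        answer_yes = has_trait if asked_about_trait else not has_trait
        yes += answer_yes
    return yes


def warner_estimate(yes, n, p):
    """Estimate pi from the observed proportion of 'yes' answers (requires p != 0.5)."""
    if p == 0.5:
        raise ValueError("p = 0.5 makes pi unidentifiable")
    lam_hat = yes / n
    return (lam_hat - (1 - p)) / (2 * p - 1)


# Illustrative values only (assumptions, not data from the talk).
rng = random.Random(2019)
n, p, true_pi = 20000, 0.7, 0.3
yes = warner_survey(true_pi, p, n, rng)
print("estimated pi:", warner_estimate(yes, n, p))
```

The estimator is only defined for `p` different from 0.5, and its variance grows as `p` approaches 0.5, which is the usual trade-off between privacy and efficiency in this design.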

Efficient Selection of Reservoir Model Outputs within an Emulation Based Iterative Uncertainty Analysis

Speaker: Carla Janaina Ferreira, Department of Mathematical Sciences, Durham University
Wednesday 26 June 2019: 1pm, CM105

Abstract:

When performing classic uncertainty reduction based on dynamic data, a large number of reservoir simulations need to be evaluated at a high computational cost. As an alternative, we construct a series of Bayesian emulators that mimic key aspects of the simulator and we combine these emulators with an iterative procedure that substantially reduces the output space dimension. The objective is to reduce the input space more effectively and efficiently than traditional methods and with a more complete understanding of its associated uncertainties. This study uses a Bayesian statistical approach for uncertainty reduction of complex models which is designed to address problems with the high number of possible input parameter configurations. We detail how to efficiently choose sets of outputs that are suitable for emulation and that are highly informative to reduce the input parameter space. This study investigates different classes of outputs and objective functions. We use output emulators and implausibility analysis iteratively to perform input space reduction and we discuss the strengths and weaknesses of certain popular classes of objective function in this context. We demonstrate our approach via an application to a benchmark synthetic model (built using public data from a Brazilian offshore field) in an early stage of development using 4 years of historical data and 4 producers. This study investigates traditional simulation outputs and also novel classes of outputs, such as misfit indexes and summaries of outputs. We show that despite there being a large number (2132) of possible outputs, only a very small number (<10) are sufficient to represent the available information. We use fast and efficient emulators at each iteration (or wave) to successfully perform the input space uncertainty reduction. In this specific case, we find that water rate and time to breakthrough are the most important outputs to be emulated resulting in the highest space reduction. 
In the first wave, the water rate output for a single well resulted in an initial reduction of 68% of the input space and the breakthrough time for this same well increased this value to roughly 80%. We observe that objective functions such as misfit indices have complex surfaces that can lead to low-quality emulators and result in non-informative outputs. We present an iterative emulator-based Bayesian uncertainty reduction process in which all possible input parameter configurations that lead to statistically acceptable matches between the simulated and observed data are identified. This incorporates a strong dimension reduction on the output space, resulting in greatly increased efficiency. This process is very effective at input space reduction, is computationally efficient, and allows a better understanding of the complex geometry of the input and output spaces.
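The implausibility analysis described above can be sketched for a single output as follows. This is a minimal illustration, assuming the standard history-matching implausibility measure (distance between the observation and the emulator mean, scaled by the combined emulator, observation-error and model-discrepancy variances) and a conventional cutoff of 3. The toy emulator, the variance values and the function names are hypothetical and are not taken from the study.

```python
import math


def implausibility(z, emulator_mean, emulator_var, obs_var, disc_var):
    """Standardised distance between observation z and the emulator prediction."""
    return abs(z - emulator_mean) / math.sqrt(emulator_var + obs_var + disc_var)


def non_implausible(inputs, emulator, z, obs_var, disc_var, cutoff=3.0):
    """Keep the inputs whose implausibility does not exceed the cutoff."""
    keep = []
    for x in inputs:
        mean, var = emulator(x)
        if implausibility(z, mean, var, obs_var, disc_var) <= cutoff:
            keep.append(x)
    return keep


def toy_emulator(x):
    """Hypothetical emulator: returns (mean, variance) of the output at input x."""
    return 2.0 * x, 0.01


# One wave on a coarse grid of a one-dimensional input space.
grid = [i / 10 for i in range(11)]
kept = non_implausible(grid, toy_emulator, z=1.0, obs_var=0.01, disc_var=0.0)
print("non-implausible inputs:", kept)
print("fraction of input space removed:", 1 - len(kept) / len(grid))
```

In an iterative scheme, a new emulator would be fitted within the retained region and the same cut applied again at each wave.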

Offshore transmission decision-making under severe uncertainty

Speaker: Henna Bains, Department of Engineering, Durham University
Wednesday 12 June 2019: 1pm, CM105

Abstract:

We assess the impact of transmission regulatory regimes and technology choices on project economic performance under uncertainty. The technical and economic uncertainties surrounding offshore power transmission complicate decision-making. We focus on two decisions, the first taken by policy makers regarding which regulatory regime to implement and the second taken by project planners concerning project design specifications, including whether HVAC or HVDC is preferred. Optimal solutions to these problems must be found to support the continued growth of offshore wind. Classical decision-making techniques are unable to adequately deal with the identified uncertainties, and since these decisions could have substantial economic consequences, imprecise probability techniques are utilised. Usually, distributions are assigned to model variables but, under severe uncertainty, it is difficult to identify the appropriate distribution. Using imprecise probability techniques, we bound expected profit and analyse these bounds to find economically preferable options. Contingent on model choices, we find third-party ownership to be optimal, for both HVAC and HVDC technologies. The application of imprecise probabilities to offshore power transmission advances current practice, and enables the decision-maker to base their selection on outputs that reflect the uncertainties involved.

Validation of Wind Turbine Health History

Speaker: Roger Cox, Department of Engineering, Durham University
Wednesday 5 June 2019: 3pm, CM103

Abstract:

We use probabilistic record linkage to combine wind turbine maintenance records with a database of power outages, for the determination of the health history of wind turbines and for its validation. We explain how this health history is applied to fault troubleshooting and to prognostic modelling for condition based maintenance. This work is also being presented at the Wind Energy Science Conference in Cork.

Extension of NPI for Bernoulli data with applications

Speaker: Junbin Chen, Department of Mathematical Sciences, Durham University
Wednesday 29 May 2019: 1pm, CM105

Abstract:

Nonparametric Predictive Inference (NPI) is one of the frequentist statistical techniques within the imprecise probability framework. It has been developed to handle various data types. Our aim here is to extend NPI for Bernoulli data to cope with imprecise Bernoulli data. To achieve this, we recall the fundamental $A_{(n)}$ assumption on which NPI is based and generalize its corresponding lattice path counting. Some of the good properties inherited by this extension will be demonstrated through application examples in this talk.

Systematic Uncertainty Reduction for Petroleum Reservoirs Combining Reservoir Simulation and Bayesian Emulation Techniques

Speaker: Helena Nandi Formentin, Department of Mathematical Sciences, Durham University
Wednesday 22 May 2019: 1pm, CM105

Abstract:

Reservoir simulation models incorporate physical laws, reservoir characteristics and production strategies. They represent our understanding of sub-surface structures based on the available information. Emulators are statistical representations of the behaviour of simulation models, offering fast evaluations of a sufficiently large number of reservoir scenarios to enable a full uncertainty analysis. Bayesian History Matching (BHM) aims to find the range of reservoir scenarios that are consistent with the historical data, in order to provide a comprehensive evaluation of reservoir performance and consistent, unbiased predictions incorporating realistic levels of uncertainty, as required for full asset management. We present a systematic approach for uncertainty quantification that combines reservoir simulation and emulation techniques within a coherent Bayesian framework.

Modelling of Ordinal Data

Speaker: Dr Maria Kateri, Institute of Statistics, RWTH Aachen University, Germany
Wednesday 15 May 2019: 1pm, CM105

Abstract:

The most common methods for analysing categorical data will be presented. The first part of the seminar focuses on contingency table analysis, with special emphasis on contingency tables with ordinal classification variables and associated models (log-linear or log-nonlinear). Generalised odds ratios are introduced and their role in contingency table modelling is discussed. The second part discusses logistic regression models for binary responses, as well as for multi-category ordinal and nominal responses. Examples show the use of R for fitting these models.

Statistical reproducibility for (multiple) pairwise tests in pharmaceutical product development

Speaker: Andrea Simkus, Department of Mathematical Sciences
Wednesday 8 May 2019: 1pm, CM105

Abstract:

In this poster, a new method is presented for calculating statistical reproducibility for the t-test. The method was developed in relation to a test in pharmaceutical discovery and development, which involves 6 test groups whose members are given an increasing dosage of a drug. Multiple pairwise comparisons using the t-test are carried out. The aim of the test is to decide which dosage of the drug is the most effective. The poster employs nonparametric predictive inference (NPI) for reproducibility. Statistical reproducibility aims to answer the question of whether a repeat of the experiment would lead to the same test decision. The NPI bootstrap is adopted for calculating the reproducibility. Two approaches to calculating and representing reproducibility are presented in this poster. First, the reproducibility for the separate pairwise comparisons is calculated by creating NPI bootstrap samples for both test groups, applying the pairwise comparisons to those and counting how many times the same test decision is reached. The results are then compared with those for the t-test's nonparametric counterpart, the Wilcoxon-Mann-Whitney test. Secondly, the question of whether the decision to choose a particular dose is reproducible is studied. This is called the reproducibility of the final decision. The poster presents it in a tree diagram and introduces an imposed decision rule.

Validation of Wind Turbine Health History

Speaker: Roger Cox, Department of Engineering
Wednesday 27 March 2019: 1pm, CM105

Abstract:

We use probabilistic record linkage to combine wind turbine maintenance records with a database of power outages, for the determination of the health history of wind turbines and for its validation. We explain how this health history is applied to fault troubleshooting and to prognostic modelling for condition based maintenance.

Finite-dimensional distributions of the height of a renewal model

Speaker: Clare Wallace, Department of Mathematical Sciences
Wednesday 20 March 2019: 1pm, CM105

Abstract:

Suppose we have a collection of blocks with (integer) heights and widths, and we use a random selection of them to build a stick whose total width is $n$.
Working from left to right, we track the cumulative total height at the endpoints of each block. We can linearly interpolate between these endpoints to create a piecewise linear height function for the whole stick.
Under a few assumptions about the distributions of heights and widths of the blocks in our collection, we can write a central limit theorem for the height function at any $k$ points along its width. In particular, we can (almost) prove that the height function, properly rescaled, converges to the trajectories of the Brownian motion.
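The construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: blocks are drawn uniformly at random from a small hypothetical collection, sticks that overshoot width $n$ are discarded and redrawn, and the names `sample_stick` and `height_at` are made up for this example.

```python
import random


def sample_stick(n, blocks, rng):
    """Lay random blocks left to right until the total width is exactly n.

    blocks is a list of (width, height) pairs with positive integer widths.
    Sticks that overshoot n are discarded and redrawn (rejection sampling).
    Returns the endpoint positions xs and cumulative heights hs.
    """
    if any(w < 1 for w, _ in blocks):
        raise ValueError("block widths must be positive integers")
    while True:
        xs, hs = [0], [0]
        while xs[-1] < n:
            w, h = rng.choice(blocks)
            xs.append(xs[-1] + w)
            hs.append(hs[-1] + h)
        if xs[-1] == n:
            return xs, hs


def height_at(xs, hs, t):
    """Piecewise linear interpolation of the cumulative height at position t."""
    if t < xs[0] or t > xs[-1]:
        raise ValueError("t lies outside the stick")
    for i in range(1, len(xs)):
        if t <= xs[i]:
            frac = (t - xs[i - 1]) / (xs[i] - xs[i - 1])
            return hs[i - 1] + frac * (hs[i] - hs[i - 1])


# Hypothetical block collection: (width, height) pairs.
blocks = [(1, 1), (2, -1), (1, 0), (3, 2)]
rng = random.Random(20)
xs, hs = sample_stick(50, blocks, rng)
print([height_at(xs, hs, t) for t in (12.5, 25, 37.5)])
```

Repeating the sampling many times and recording the heights at $k$ fixed points gives an empirical view of the finite-dimensional distributions that the central limit theorem describes.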

Avoiding local trapping problems in ABC

Speaker: Kieran Richards, Department of Mathematical Sciences
Wednesday 13 March 2019: 1pm, CM105

Abstract:

Approximate Bayesian Computation is a Bayesian technique that allows us to produce samples from a posterior distribution when the likelihood is intractable or computationally difficult to evaluate. Unfortunately ABC can often suffer from extreme local trapping problems where the sampler does not move for long periods of time and hence produces low quality samples that are of little use. We'll apply some stochastic approximation techniques to attempt to avoid the local trapping problem by penalizing the MCMC sampler for remaining in the same place for long periods of time and instead encouraging it to move more evenly around the sample space. We'll then test our new algorithm with some simulated examples and compare the results with those from standard ABC sampling.
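For readers new to ABC, the basic idea can be sketched with the simplest variant, rejection ABC. This is not the MCMC-based sampler with stochastic approximation discussed in the talk. The toy model (normal data with unknown mean and known unit variance), the uniform prior, the sample-mean summary statistic and the tolerance are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, n_accept, eps, rng, max_draws=100_000):
    """Rejection ABC for the mean of a normal model with unit variance.

    A parameter drawn from the prior is accepted when data simulated from it
    has a sample mean within eps of the observed sample mean.
    """
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(max_draws):
        theta = rng.uniform(-5.0, 5.0)  # prior draw (assumed Uniform(-5, 5))
        simulated = [rng.gauss(theta, 1.0) for _ in range(len(observed))]
        if abs(statistics.fmean(simulated) - obs_summary) <= eps:
            accepted.append(theta)
            if len(accepted) == n_accept:
                break
    return accepted


rng = random.Random(13)
observed = [rng.gauss(2.0, 1.0) for _ in range(20)]  # synthetic "data"
samples = abc_rejection(observed, n_accept=200, eps=0.2, rng=rng)
print("observed mean:", statistics.fmean(observed))
print("ABC posterior mean:", statistics.fmean(samples))
```

No likelihood is evaluated anywhere; only forward simulation is needed. An MCMC version replaces the independent prior draws with local proposals, which is where the trapping behaviour described above can arise.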

Bayesian approaches to Well Test Analysis

Speaker: Themistoklis Botsas, Department of Mathematical Sciences
Wednesday 6 March 2019: 1pm, CM105

Abstract:

Deconvolution for Well Test Analysis is a methodology that solves an inverse problem associated with petroleum engineering and derives an impulse response function that contains important information about the system.
We use a response function form based on the multi-region radial composite model, known in petroleum literature for its flexibility and ability to resemble almost every plausible shape that can be encountered.
We use an errors-in-variables non-linear Bayesian regression model in order to make inferences about the response function. This allows us to include uncertainty for the independent variables, which is essential in our context due to the large observational error. We combine the likelihood with a set of flexible priors for our parameters and we use MCMC algorithms in order to approximate the posterior.
We illustrate the use of our algorithm by applying it to synthetic and field data sets. The results are comparable in quality to the state of the art solution, which is based on the total least squares method, but our method has several advantages: we gain access to meaningful system parameters associated with the flow behaviour in the reservoir; we can incorporate prior knowledge; and we can quantify parameter uncertainty in a principled way by exploiting the advantages of the Bayesian approach.

Simplify Your Code with %>% (The pipe operator)

Speaker: Miguel Lopez-Cruz
Wednesday 20 February 2019: 1pm, CM105

Abstract:

Removing duplication is an important principle to keep in mind when you are writing code; however, it is equally important to keep your code efficient and readable. Very often, efficiency is achieved by replacing long statements in existing code with shorter ones, with the aim of making the code more readable, clear, and explicit. Even so, writing code that is at once simple, readable, and efficient may be considered contradictory. In this talk I want to show how the magrittr R package can help you write code more efficiently when analyzing datasets for diverse statistical purposes.

Probabilistic Record Linkage

Speaker: Roger Cox, Department of Engineering
Wednesday 13 February 2019: 1pm, CM105

Abstract:

We read through a paper Roger's been looking at, and try to identify the Python code used to implement the calculations.

LASSO for dimensionality reduction in surrogate model-based Optimisation

Speaker: Lorenzo Gentile, TH Köln, Germany
Wednesday 6 February 2019: 1pm, CM105

Abstract:

Surrogate-Model-based optimization (SMBO) plays a prominent role in today’s modelling, simulation, and optimization processes. It can be considered the most efficient technique for solving expensive and time-demanding real-world optimization problems. In fact, in many engineering problems, a single evaluation is based on either experimental or numerical analysis. This causes significant costs with respect to time or resources. SMBO pursues the identification of global optima by taking advantage of a budget allocation process that maximizes the information gain in promising regions. In SMBO, a data-driven surrogate model is fitted to replace an expensive computer simulation.

However, high dimensionality leads to severe practical issues in the development of surrogate models. For example, depending on the employed distance measures, it is widely recognized that Kriging, one of the most popular techniques, may perform poorly for problems with more than approximately 20 variables. A promising way to overcome the limitations of SMBO in high-dimensional search spaces is feature selection. Among the available approaches, a well-established strategy for selecting important variables is the least absolute shrinkage and selection operator (LASSO). For these reasons, a strategy for enhancing a Kriging-based SMBO algorithm with LASSO is currently under development.
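To illustrate how LASSO selects variables, here is a minimal sketch using cyclic coordinate descent on the objective (1/(2n))·||y − Xβ||² + λ·||β||₁. It covers only the variable-screening step on synthetic data and does not represent the Kriging-based SMBO algorithm under development. The data-generating function, the penalty value `lam` and all names are assumptions for illustration, and the columns are assumed to be roughly centred so that no intercept is fitted.

```python
import random


def soft_threshold(a, lam):
    """Shrink a towards zero by lam; values within [-lam, lam] become exactly 0."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO coefficients by cyclic coordinate descent (no intercept)."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that ignores beta[j].
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def make_toy_data(rng, n=200, d=10):
    """Synthetic data in which only variables 0 and 3 influence the response."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


X, y = make_toy_data(random.Random(6))
beta = lasso_cd(X, y, lam=0.1)
print("selected variables:", [j for j, b in enumerate(beta) if b != 0.0])
```

In a surrogate-modelling workflow, a Kriging model could then be fitted on the selected variables only, which is the kind of dimensionality reduction the abstract refers to.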

In this presentation, the fundamentals of SMBO will be given. Moreover, preliminary results from the application of the enhanced Kriging-based SMBO algorithm to both artificial test functions and a real-world application from the field of aerospace will be shown.

Improving and benchmarking of algorithms for decision making with lower previsions

Speaker: Nawaphon Nakharutai, Department of Mathematical Sciences, Durham University
Wednesday 30 January 2019: 1pm, CM105

Abstract:

Maximality, interval dominance, and E-admissibility, are three well-known criteria for decision making under severe uncertainty, using lower previsions. I will present a new fast algorithm for finding maximal gambles and compare its performance to existing algorithms, one proposed by Troffaes and Hable (2014), and one by Jansen, Augustin, and Schollmeyer (2017). To do so, I develop a new method for generating random decision problems with pre-specified ratios of maximal and interval dominant gambles.

To find all maximal gambles, Jansen et al. solve one large linear program for each gamble. Troffaes and Hable's algorithm, and also the new algorithm, instead solve a larger sequence of smaller linear programs. I find that the primal-dual interior point method works best for solving these linear programs. Building on earlier work, I will present efficient ways to find a common feasible starting point for this sequence of linear programs. I exploit these feasible starting points to develop early stopping criteria for the primal-dual interior point method, further improving efficiency.

I also investigate the use of interval dominance to eliminate non-maximal gambles. This can make the problem smaller, and I observe that this benefits Jansen et al.'s algorithm, but perhaps surprisingly, not the other two algorithms. I find that the new algorithm, without using interval dominance, outperforms all other algorithms in all scenarios in a simulation.
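The two criteria compared above can be illustrated on a small example. The sketch assumes the credal set is given by a finite list of extreme-point probability mass functions, so that lower and upper previsions are simply the minimum and maximum expectation over that list. This avoids linear programming altogether and is not one of the algorithms benchmarked in the talk. The gambles and probabilities are made up for illustration.

```python
def expectation(p, gamble):
    """Expected value of a gamble under probability mass function p."""
    return sum(pi * gi for pi, gi in zip(p, gamble))


def lower(credal, gamble):
    """Lower prevision: minimum expectation over the extreme points."""
    return min(expectation(p, gamble) for p in credal)


def upper(credal, gamble):
    """Upper prevision: maximum expectation over the extreme points."""
    return max(expectation(p, gamble) for p in credal)


def interval_dominant(credal, gambles):
    """Indices of gambles whose upper prevision reaches the best lower prevision."""
    best_lower = max(lower(credal, g) for g in gambles)
    return [i for i, g in enumerate(gambles) if upper(credal, g) >= best_lower]


def maximal(credal, gambles):
    """Indices of gambles f for which no g has lower prevision of (g - f) > 0."""
    result = []
    for i, f in enumerate(gambles):
        dominated = any(
            lower(credal, [gj - fj for gj, fj in zip(g, f)]) > 0
            for k, g in enumerate(gambles)
            if k != i
        )
        if not dominated:
            result.append(i)
    return result


# Two states; the probability of the first state lies between 0.2 and 0.6.
credal = [(0.2, 0.8), (0.6, 0.4)]
gambles = [(4, 0), (0, 4), (2, 2), (1, 1), (3.75, -0.25)]
print("interval dominant:", interval_dominant(credal, gambles))
print("maximal:", maximal(credal, gambles))
```

Every maximal gamble is also interval dominant, so the cheaper interval dominance check can be used as a pre-filter before testing maximality, which is the elimination step discussed above.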

Drunken Heroine Quest 2: Deep Fantasy World Application of the Theory of Random Walks

Speaker: Hugo Lo, Department of Mathematical Sciences, Durham University
Wednesday 12th December 2018: 1pm, CM105

Abstract:

Our research on random walk problems has a lot of useful applications in ecology, psychology, computer science, physics, chemistry, biology as well as economics. However, most of them are too serious for this presentation. Instead, we will guide you through some basics of random walk theory, in the format of a fantasy story... Once upon a time, the brave Edward went on a fearful quest to defeat a dragon and win the heart of the beautiful Dorothy. After falling foul of a curse, Edward is trapped in a skyscraping tower of unknown location in the boundless land of Promenatoria. It is now up to Dorothy to break the curse and free her inamorato. With Edward nowhere to be found, alcohol seems to be the only way for Dorothy to pass the days. With neither a particular direction nor a systematic search, a random walk journey begins. Are you ready for this exhilarating and unforgettable adventure? In this second episode of the series we will dive into the feelings deep in our heroine's heart. All are welcome.

Bayes linear emulation, decision support and applications to petroleum reservoir models

Speaker: Jonathan Owen, Department of Mathematical Sciences, Durham University
Wednesday 5th December 2018: 1pm, CM105

Abstract:

Complex mathematical computer models are used across many scientific disciplines to improve the understanding of the behaviour of physical systems and to provide decision support. These often require the specification of a large number of unknown model parameters, involve a choice of decision parameters, and take a long time to evaluate. Decision support, commonly misrepresented as an optimisation task, often requires a large number of model evaluations, rendering traditional methods intractable because the models are slow to run. Bayes linear emulators as surrogates provide fast, statistical approximations for computer models, yielding predictions for as yet unevaluated parameter settings, along with a corresponding quantification of uncertainty.
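For reference, the standard Bayes linear adjustment that underlies such emulators can be written as follows. The notation is assumed here rather than taken from the talk: $D$ denotes the vector of model evaluations already available and $f(x)$ the model output at an as yet unevaluated input $x$.

$$
\mathrm{E}_D[f(x)] = \mathrm{E}[f(x)] + \mathrm{Cov}[f(x), D]\,\mathrm{Var}[D]^{-1}\,(D - \mathrm{E}[D])
$$

$$
\mathrm{Var}_D[f(x)] = \mathrm{Var}[f(x)] - \mathrm{Cov}[f(x), D]\,\mathrm{Var}[D]^{-1}\,\mathrm{Cov}[D, f(x)]
$$

The adjusted expectation gives the prediction and the adjusted variance gives the accompanying uncertainty; only prior means, variances and covariances need to be specified, not full distributions.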

The Integrated Systems Approach for Petroleum Production (ISAPP) is a research program that aims to increase hydrocarbon recovery. Together with TNO (Netherlands Organisation for Applied Scientific Research), it has designed a Field Development Optimisation Challenge centred on a fictitious oil reservoir known as OLYMPUS. This challenge exhibits many of the common issues associated with computer experimentation, with further complications arising from geological uncertainty expressed through an ensemble of 50 models. In this presentation, I will discuss Bayes linear emulators and their use in decision support, before describing some of the difficulties encountered in my work to date on the ISAPP Field Development Optimisation Challenge.

Cost effective component swapping to increase system reliability

Speaker: Aesha Najem, Department of Mathematical Sciences, Durham University
Wednesday 21st November 2018: 1pm, CM105

Abstract:

One of the strategies that might be considered to enhance the reliability and resilience of a system is swapping components when a component fails, that is, replacing it with another component from the system that is still functioning. This presentation considers cost-effective component swapping to increase system reliability. The cost is discussed in two scenarios, namely fixed cost and time-dependent cost for system failure.

Random set theory for frequentist inferences

Speaker: Daniel Krpelik, Department of Mathematical Sciences, Durham University
Wednesday 14th November 2018: 1pm, CM105

Abstract:

Recently, several inferential methods based on random set theory have been proposed in the literature. Among those, we would like to focus on Confidence Structures. These can be seen as a generalisation of the inferential approach based on Confidence Distributions, in which the result of the inference is a probability distribution over the range of the parameter of interest that can be used to construct confidence intervals and test hypotheses at any level of significance. Using random set models allows us to seamlessly derive approaches for analysing censored observations without any assumptions about the underlying censoring model, whilst retaining the coverage properties of confidence distributions. We will show the basic ideas behind the concept of confidence structures and demonstrate their use in the reliability analysis of a simple system based on a set of censored observations of the lifetimes of its components.

History Matching techniques applied to petroleum reservoir: discussing MCMC as sampling technique

Speaker: Helena Nandi Formentin, Department of Mathematical Sciences, Durham University
Wednesday 7th November 2018: 1pm, CM107

Abstract:

In petroleum engineering, reservoir simulation models are representations of real petroleum fields used in production forecasting and decision-making. Observed dynamic data (e.g. bottom-hole pressure and oil production) support the calibration of reservoir models. We use History Matching (HM) techniques to reduce our high-dimensional input space, which contains parameters such as porosity, permeability and fluid properties, through the assimilation of measured data. We use emulation techniques to explore a simplified simulator of a reservoir model, and HM processes to reduce the simulator’s input space. In this session, we will discuss MCMC techniques applied to sampling in a reduced and complex space.

Maintenance Record Labelling of Wind Turbine Data for Fault Prognosis

Speaker: Roger Cox, Department of Engineering, Durham University
Wednesday 31st October 2018: 1pm, CM105

Abstract:

A set of methods is being developed for the determination of the health history of mechanical plant. These are to be applied both in offshore wind turbine maintenance troubleshooting (fault diagnosis) and in condition based maintenance (fault prognosis). One of the methods used is Bernoulli Naive Bayes classification.

Introduction to Stats4Grads

Speaker: everyone!
Wednesday 24th October 2018: 1pm, CM105

Abstract:

Come along to CM105 on Wednesday 24th October at 1pm to get to know your fellow statisticians! We'll introduce ourselves and our research area (briefly!), and then just have an informal chat. Of course, the introductory meeting wouldn't be complete without free pizza ;)
