Durham University Statistics and Probability Group

Stats4Grads

Welcome to the Stats4Grads website! Here you will find all the information about the seminar series.

Stats4Grads is a weekly seminar in statistics organised by and aimed at postgraduate students. The seminars take place on Wednesdays from 13:00 to 14:00, usually in CM105, with tea, coffee and biscuits provided by the Department of Mathematical Sciences.

Stats4Grads is a great opportunity to learn about the research of other postgraduate students and their use of statistics. This includes recent developments in statistics as well as applications to "real-world" problems and cross-disciplinary work. Moreover, Stats4Grads provides a relaxed forum in which to discuss and develop ideas, exchange knowledge, and access help and insight from students who have a deeper understanding of the theory and methodology.

Feel free to invite a friend or collaborator from another institution or department to give a talk if they are in Durham!

Organiser: Jonathan Owen. For information, or to give a talk, contact jonathan.owen@durham.ac.uk.

For details of previous years' seminars, click here.

Stats4Grads Timetable 2019/2020

Analysis of Overdispersion in Gamma-H2AX Data

Speaker: Adam Errington, Department of Mathematical Sciences, Durham University
Wednesday 4 December 2019: 13:00, E101

Abstract:

Count data which exhibit overdispersion are extensive in a wide variety of disciplines, such as public health and environmental science. It is typically assumed that the total (aggregated) number of gamma-H2AX foci (DNA repair proteins) produced in a sample of blood cells is Poisson distributed, with an expected yield (average foci per cell) that can be represented by a linear function of the absorbed dose. However, in practice, because of unobserved heterogeneity in the cell population, the standard Poisson assumption of equidispersion will most likely be contravened, causing the variance of the aggregated foci counts to be larger than their mean. This phenomenon is perceptible in both whole and partial body exposure, unlike in the context of the “gold-standard” dicentric assay, in which overdispersion is only linked to partial exposure. For such situations, a model that can handle overdispersion, such as the quasi-Poisson, may be preferable to the standard Poisson.
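As a concrete illustration, here is a minimal sketch (using simulated data and statsmodels, not the speaker's analysis) of detecting overdispersion in a Poisson fit via the Pearson dispersion statistic, and of the quasi-Poisson adjustment, which keeps the Poisson mean structure but rescales standard errors by the estimated dispersion:

```python
# A minimal sketch (simulated data, not the speaker's analysis): comparing
# the Pearson dispersion statistic of a fitted Poisson GLM against 1
# (equidispersion), then a quasi-Poisson adjustment of the standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
dose = np.repeat(np.linspace(0.5, 4.0, 8), 100)  # hypothetical absorbed doses
mu = np.exp(-0.5 + 0.6 * dose)                   # mean foci yield, log-linear in dose
# Negative binomial draws mimic unobserved cell heterogeneity (overdispersion)
y = rng.negative_binomial(n=2.0, p=2.0 / (2.0 + mu))

X = sm.add_constant(dose)
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
dispersion = poisson_fit.pearson_chi2 / poisson_fit.df_resid
print(f"Estimated dispersion: {dispersion:.2f}  (>1 indicates overdispersion)")

# Quasi-Poisson: same mean structure, standard errors scaled by the
# Pearson-based dispersion estimate
quasi_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
print(quasi_fit.bse)  # inflated relative to poisson_fit.bse
```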

There are many different possible causes of overdispersion, and in any modelling situation a number of these could be involved. For our data, some possibilities include experimental variability (for example, a change of technology used in the scoring of cells) and correlation between individual foci counts (or cells), neither of which is accounted for by a fitted model. We will see that the behaviour of dispersion estimates differs considerably between using aggregated data and the full frequency distribution (raw data). To our knowledge, this phenomenon has not been investigated in the literature, either within or outside the field of biodosimetry. I will explain through simulation how accounting for dependence between observations can affect the estimated dispersion.

A Bayesian statistical approach to decision support for petroleum reservoir well control optimisation

Speaker: Jonathan Owen, Department of Mathematical Sciences, Durham University
Wednesday 27 November 2019: 13:00, CM105

Abstract:

Complex mathematical computer models are used across many scientific disciplines and industry to improve the understanding of the behaviour of physical systems, and increasingly to aid decision makers. Major limitations to the use of computer simulators include their complex structure, high-dimensional parameter spaces and large numbers of unknown model parameters, further compounded by long evaluation times. Decision support, commonly misrepresented as an optimisation task, often requires a large number of model evaluations, rendering traditional optimisation methods intractable whilst simultaneously failing to incorporate uncertainty. Consequently, such methods may yield non-robust decisions.

I will present an iterative decision support strategy which imitates the history matching procedure, aiming to identify a robust class of decisions. Bayes linear emulators provide fast, statistical approximations to computer models, yielding predictions for as yet unevaluated parameter settings, along with a corresponding quantification of uncertainty. Appropriate structured uncertainties are accurately quantified and incorporated to link the sophisticated computer model and the actual system, in order to obtain robust decisions for the real-world problem.
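For intuition, the following is a minimal sketch (with an assumed constant prior mean and squared-exponential prior covariance, and a cheap stand-in for an expensive simulator, not the speaker's emulator) of the Bayes linear adjustment underlying such emulators: the adjusted expectation and variance of f(x) given a vector D of simulator runs.

```python
# A minimal Bayes linear emulator sketch: adjusted expectation
# E_D[f(x)] = E[f(x)] + Cov[f(x), D] Var[D]^{-1} (D - E[D]) and the
# corresponding adjusted variance. All prior choices here are assumptions.
import numpy as np

def cov(x1, x2, sigma2=1.0, theta=0.3):
    # assumed squared-exponential prior covariance between f at x1 and x2
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-0.5 * (d / theta) ** 2)

f = lambda x: np.sin(3 * x) + x          # stand-in for a slow simulator
x_train = np.linspace(0, 1, 6)           # design points already evaluated
D = f(x_train)                           # simulator output at the design
m0 = 0.5                                 # assumed prior expectation E[f(x)]

x_new = np.linspace(0, 1, 101)           # as-yet-unevaluated settings
V_D = cov(x_train, x_train) + 1e-8 * np.eye(6)        # Var[D] (jittered)
C = cov(x_new, x_train)                                # Cov[f(x_new), D]
E_adj = m0 + C @ np.linalg.solve(V_D, D - m0)          # adjusted expectation
Var_adj = cov(x_new, x_new) - C @ np.linalg.solve(V_D, C.T)  # adjusted variance
print(E_adj[:5], np.diag(Var_adj)[:5])
```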

In the petroleum industry, TNO devised a field development optimisation challenge under uncertainty providing an ensemble of 50 fictitious oil reservoir models generated using a stochastic geology model. This challenge exhibits many of the common issues associated with computer experimentation. I will demonstrate the robust decision support strategy applied to the TNO challenge for a greatly reduced computational cost versus ensemble optimisers. This includes the construction of a targeted Bayesian design as well as methods of identifying subsets of models as representatives for the entire ensemble.

Reducing bias is as easy as ABC with applications to modelling Ebola

Speaker: Kieran Richards, Department of Mathematical Sciences, Durham University
Wednesday 20 November 2019: 13:00, CM105

Abstract:

Approximate Bayesian Computation (ABC) has enabled us in recent years to use increasingly complex models to solve problems that were previously intractable. However, ABC methods can produce unreliable inference when they introduce high approximation bias into the posterior through careless specification of the ABC kernel. Additionally, MCMC-ABC methods often suffer from the local trapping problem, which causes poor mixing when the tolerance parameter is low. We propose an alternative ABC algorithm which, we show, can be used to reduce the approximation bias and provide immunity to local trapping by adaptively constructing the ABC kernel. We demonstrate the new algorithm on real data: calibrating a complex SEIR model to data from the Ebola outbreak of 2014 and estimating the pre-intervention transmission rate of the disease.
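For readers new to ABC, here is a minimal sketch of plain rejection ABC on a toy Poisson model (not the proposed algorithm): the summary statistic, distance and tolerance eps together play the role of the ABC kernel whose specification the talk is concerned with.

```python
# A minimal rejection-ABC sketch on a toy model (not the authors' adaptive
# algorithm): accept a prior draw whenever its simulated summary lies within
# tolerance eps of the observed summary (a uniform ABC kernel).
import numpy as np

rng = np.random.default_rng(0)
y_obs = rng.poisson(lam=4.0, size=50)            # toy "observed" data
s_obs = y_obs.mean()                             # summary statistic

def simulate(lam):
    return rng.poisson(lam=lam, size=50)

eps = 0.1                                        # tolerance parameter
accepted = []
while len(accepted) < 1000:
    lam = rng.uniform(0.0, 10.0)                 # draw from the prior
    if abs(simulate(lam).mean() - s_obs) <= eps: # uniform ABC kernel
        accepted.append(lam)

post = np.array(accepted)
print(post.mean(), post.std())                   # approximate posterior summary
```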

A Sensitivity Analysis of Adaptive Lasso

Speaker: Tathagata Basu, Department of Mathematical Sciences, Durham University
Wednesday 13 November 2019: 13:00, CM105

Abstract:

Sparse regression is an efficient statistical modelling technique which is of major relevance for high-dimensional statistics. There are several ways of achieving sparse regression, the well-known lasso being one of them. However, lasso variable selection may not be consistent in selecting the true sparse model. Zou proposed an adaptive form of the lasso which overcomes this issue, and showed that data-driven weights on the penalty term result in a consistent variable selection procedure. We are interested in the case where the weights are informed by a prior execution of ridge regression. We carry out a sensitivity analysis of the adaptive lasso through the power parameter of the weights, and demonstrate that, in effect, this parameter takes over the role of the usual lasso penalty parameter. In addition, we use the parameter as an input variable to obtain an error bound on the adaptive lasso.
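As an illustration of the setup (a sketch on synthetic data using scikit-learn, not the authors' code), the adaptive lasso with ridge-informed weights can be computed as an ordinary lasso on rescaled columns; gamma below is the power parameter of the weights that the sensitivity analysis studies.

```python
# A minimal adaptive-lasso sketch with ridge-informed weights. Data and
# tuning values are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]             # sparse true model
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Step 1: a prior execution of ridge regression informs the weights
beta_ridge = Ridge(alpha=1.0).fit(X, y).coef_
gamma = 1.0                                  # power parameter of the weights
w = 1.0 / np.abs(beta_ridge) ** gamma

# Step 2: adaptive lasso = ordinary lasso on columns rescaled by 1/w_j,
# since penalising b_j = w_j * beta_j recovers the weighted L1 penalty
X_scaled = X / w
lasso = Lasso(alpha=0.1).fit(X_scaled, y)
beta_adaptive = lasso.coef_ / w              # map back to the original scale
print(np.nonzero(beta_adaptive)[0])          # indices of selected variables
```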

Keywords:

Adaptive lasso, sensitivity analysis, oracle properties, variable selection, ridge regression.

This work is funded by the European Commission's H2020 programme, through the UTOPIAE Marie Curie Innovative Training Network, H2020-MSCA-ITN-2016, Grant Agreement number 722734.

Bayes goes to Space: inferring chemical model parameters for tomorrow’s Space journeys

Speaker: Anabel del Val, von Karman Institute for Fluid Dynamics, Belgium
Wednesday 6 November 2019: 13:00, CM105

Abstract:

Venturing into Space requires large amounts of energy to reach orbital and interplanetary velocities. The bulk of this energy is exchanged during the entry phase by converting the kinetic energy of the vehicle into thermal energy in the surrounding atmosphere through the formation of a strong bow shock ahead of the vehicle. The way engineers protect spacecraft from the intense heat of atmospheric entry is by designing two kinds of protection systems: reusable and ablative. Reusable systems are characterized by re-radiating a significant amount of energy from the hot surface back into the atmosphere. Ablative materials, on the other hand, transform the thermal energy into decomposition and removal of the material.

The resulting aerothermal environment surrounding a vehicle during atmospheric entry is consequently extremely complex; as such, we often need efficient uncertainty quantification techniques to extract knowledge from experimental data that can appropriately inform the proposed models. We develop robust Bayesian frameworks that aim at characterizing chemical model parameters for re-entry plasma flows in the presence of both types of protection systems. Special care is devoted to the treatment of nuisance parameters, which are unavoidable when performing flow simulations in need of proper boundary conditions beyond the interest of the specific inference. Our formulation involves a particular treatment of these nuisance parameters by solving an auxiliary maximum likelihood problem. Results will be shown for real-world cases.
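To illustrate the idea on a toy problem (entirely hypothetical; not the speaker's flow model), the auxiliary maximum likelihood treatment amounts to profiling the nuisance parameter out of the likelihood inside each evaluation of the log-posterior for the parameter of interest:

```python
# A minimal sketch (hypothetical toy model): profiling out a nuisance
# parameter eta via an auxiliary maximum-likelihood step inside the
# evaluation of the unnormalised log-posterior for theta.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

data = np.array([1.1, 0.9, 1.3, 1.0, 0.8])   # hypothetical measurements

def loglik(theta, eta):
    # toy model: observations ~ N(theta * eta, 0.2^2), with eta playing
    # the role of an uncertain boundary condition
    return norm.logpdf(data, loc=theta * eta, scale=0.2).sum()

def log_posterior(theta):
    # auxiliary ML problem: maximise the likelihood over eta at this theta
    res = minimize_scalar(lambda eta: -loglik(theta, eta),
                          bounds=(0.5, 2.0), method="bounded")
    log_prior = norm.logpdf(theta, loc=1.0, scale=1.0)  # assumed prior
    return -res.fun + log_prior

print(log_posterior(1.0))
```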

Analysis of clickstream data

Speaker: Ryan Jessop, Department of Mathematical Sciences, Durham University and Clicksco
Wednesday 30 October 2019: 13:00, CM105

Abstract:

Online user browsing generates vast quantities of typically unexploited data. Investigating this data and uncovering the valuable information it contains can be of substantial value to online businesses, and statistics plays a key role in this process.

The data takes the form of an anonymous digital footprint associated with each unique visitor, resulting in 10^6 unique profiles across 10^7 individual page visits on a daily basis. Exploring, cleaning and transforming data of this scale and high dimensionality (2TB+ of memory) is particularly challenging, and requires cluster computing.

We consider the problem of predicting customer purchases (known as conversions) from the customer’s journey or clickstream, which is the sequence of pages seen during a single visit to a website. We consider each page as a discrete state with probabilities of transitions between the pages, providing the basis for a simple Markov model. Further, hidden Markov models (HMMs) are applied to relate the observed clickstream to a sequence of hidden states, uncovering meta-states of user activity. We can also apply conventional logistic regression to model conversions in terms of summaries of the profile’s browsing behaviour. These models are incorporated into a set of tools covering a wide range of conversion types, allowing the predictive capability of each model to be compared directly.
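As a simple illustration of the first step (with hypothetical page labels, not Clicksco data), the Markov model's transition probabilities can be estimated directly from observed sessions by counting page-to-page moves and row-normalising:

```python
# A minimal sketch: estimating the transition matrix of a simple Markov
# chain over pages from observed clickstream sessions. Page names and
# sessions are invented for illustration.
import numpy as np

pages = ["home", "product", "basket", "checkout"]
idx = {p: i for i, p in enumerate(pages)}
sessions = [
    ["home", "product", "basket", "checkout"],
    ["home", "product", "home", "product", "basket"],
    ["home", "basket", "checkout"],
]

counts = np.zeros((len(pages), len(pages)))
for s in sessions:
    for a, b in zip(s, s[1:]):               # consecutive page pairs
        counts[idx[a], idx[b]] += 1

# Row-normalise to get P(next page | current page); rows with no
# observed exits (terminal pages) are left as zeros
row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums, out=np.zeros_like(counts),
              where=row_sums > 0)
print(np.round(P, 2))
```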

Predicting, in real time, which profiles are likely to follow behaviour patterns similar to those of known conversions will have a critical impact on targeted advertising. We illustrate these analyses with results from real data collected by an Audience Management Platform (AMP) - Carbon.

Stats4Grads Welcome Session

Speaker: everyone!
Wednesday 23 October 2019: 13:00, CM105

Abstract:

Come along to CM105 on Wednesday 23rd October at 13:00 to get to know your fellow Statisticians! This is a relaxed and informal event to introduce ourselves and our research area (briefly!), as well as meet others with an interest in Statistics. There will be free pizza! ;)

Return to the Statistics Seminar list.