# Nonparametric Predictive Inference

This page is under construction. In due course it will contain an introduction to this topic, links to published material, preprints (a few are already here), and comments on work in progress. There will also be information on possible topics for PhD study.

## Introduction

A natural starting point for statistical inference is often the assumption of exchangeability of random quantities. Put simply, for real-valued random quantities: if one has n exchangeable random quantities, each is equally likely to be the smallest, the second smallest, and so on. So, for any one such random quantity, its rank among all n random quantities is uniformly distributed over the values 1 to n (assuming no ties, for simplicity). Bruce Hill (1968; JASA 63, 677-691) introduced an assumption, called A(n), in which this exchangeability property is used directly to predict one (or more) future values on the basis of n observations. He later discussed this in much more detail, and summarized his findings as: `Let me conclude by observing that A(n) is supported by all of the serious approaches to statistical inference. It is Bayesian, fiducial, and even a confidence/tolerance procedure. It is simple, coherent, and plausible. It can even be argued, I believe, that A(n) constitutes the fundamental solution to the problem of induction.'
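In symbols, Hill's assumption can be written as follows. The notation is ours and is meant only as a sketch: x_(1) < ... < x_(n) denote the ordered observations (assuming no ties), with x_(0) = -infinity and x_(n+1) = +infinity added for convenience, and X_(n+1) denotes the next observation.

```latex
% A(n): the next observation is equally likely to fall in each of the
% n+1 open intervals created by the n ordered observations.
P\bigl( X_{n+1} \in ( x_{(j-1)},\, x_{(j)} ) \bigr) = \frac{1}{n+1},
\qquad j = 1, \ldots, n+1 .
```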

Nevertheless, A(n) has not received much attention in the statistical literature. A plausible reason is that it only assigns equal probabilities to the events that the next observation falls in each of the n+1 intervals created by the previous n observations, so it might seem that very few inferences can be based on it without additional assumptions. This is indeed so if one uses only precise probabilities, where every event A of interest is assumed to have a single-valued probability P(A). However, there is no need to quantify uncertainty by a single number, and there are many arguments for using an interval-valued probability [L(A),U(A)] instead, for which a variety of interpretations are available, all generalizing possible interpretations of P(A). A convenient interpretation of the lower and upper probabilities, L(A) and U(A), is as the optimal bounds for P(A) that can be deduced from the available information. Such interval-valued probabilities have been around since the middle of the 19th century, and have received increasing attention since the early 1990s, under different names including `imprecise probability' (Walley) and `interval probability' (Weichselberger). Within such a framework of interval-valued probability, one can base statistical inference on Hill's A(n) alone, without requiring further assumptions.
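To make the idea of optimal bounds concrete, here is a minimal Python sketch for an event of the form "the next observation lies in the open interval (a, b)". It assumes real-valued data without ties: L(A) adds up the probability 1/(n+1) of every interval between consecutive observations that lies entirely inside (a, b), and U(A) adds up the probability of every such interval that overlaps (a, b) at all. The function name `npi_bounds` and its signature are ours, chosen for illustration; this is not an existing library.

```python
from fractions import Fraction
import math


def npi_bounds(observations, a, b):
    """Lower and upper probability that the next observation lies in (a, b).

    Illustrative sketch based only on A(n): each of the n+1 open intervals
    between consecutive ordered observations has probability 1/(n+1).
    Assumes real-valued observations with no ties.
    """
    xs = sorted(observations)
    if len(set(xs)) != len(xs):
        raise ValueError("this sketch assumes no tied observations")
    if not a < b:
        raise ValueError("need a < b")

    n = len(xs)
    # Interval end points, with -inf and +inf added at the two ends.
    endpoints = [-math.inf] + xs + [math.inf]

    contained = 0    # intervals lying entirely inside (a, b)
    overlapping = 0  # intervals sharing at least one point with (a, b)
    for left, right in zip(endpoints, endpoints[1:]):
        if a <= left and right <= b:
            contained += 1
        if left < b and a < right:
            overlapping += 1

    return Fraction(contained, n + 1), Fraction(overlapping, n + 1)


lower, upper = npi_bounds([1.2, 3.4, 5.0, 7.1], 3, 6)
print(lower, upper)  # prints: 1/5 3/5
```

With these four (made-up) observations, only the interval (3.4, 5.0) lies entirely inside (3, 6), while (1.2, 3.4), (3.4, 5.0) and (5.0, 7.1) all overlap it, which gives the bounds 1/5 and 3/5.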

Inspired by the need to develop statistical methods that rely on few (modelling) assumptions, we have been developing A(n)-based inferences, mostly using interval probability, since the mid-1990s, in collaboration with a number of colleagues and students, both at Durham and further afield. We have worked on general statistical inferences, on problems in reliability, and on problems in operational research, the latter leading to OR policies that are explicitly adaptive to available data, thereby dropping the often-made assumption of fully known probability distributions. As such inferential methods are both nonparametric and predictive, that is, directly in terms of one or more future observables, we like to refer to this approach as `nonparametric predictive inference' (NPI). Below we summarize our work in each of these three areas; of course, there is overlap between them. One exciting aspect of this approach is that the amount of information available in the data is directly related to the difference between corresponding upper and lower probabilities. This provides a whole new dimension to uncertainty quantification when compared to statistical methods that use only precise probabilities, such as standard Bayesian and frequentist methods, including most commonly used nonparametric methods.
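A simple worked case illustrates how the difference between upper and lower probability reflects the amount of data. The notation is ours and the setting is deliberately simple: the event is that the next observation exceeds a threshold t that does not coincide with any observation, and k of the n observations are larger than t. Under A(n), exactly k of the n+1 intervals lie entirely above t, and one further interval contains t, so:

```latex
% Event A = \{ X_{n+1} > t \}, with k of the n observations exceeding t.
L(A) = \frac{k}{n+1}, \qquad
U(A) = \frac{k+1}{n+1}, \qquad
U(A) - L(A) = \frac{1}{n+1} .
```

In this case the difference depends only on n, and it shrinks towards zero as more observations become available.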

## NPI: Recent Research

Several papers already published can be found here. (Not all these papers are on NPI: there will soon be descriptions on this page.) Below are a few recent papers, mostly as preprints:
• Nonparametric predictive comparison of proportions: pdf version. This is a preprint version of a paper, jointly with Pauline Coolen-Schrijner, that is currently in submission to Journal of Statistical Planning and Inference (invited revised version).
• Learning from multinomial data: a nonparametric predictive alternative to the Imprecise Dirichlet Model: pdf version. This paper, jointly with Thomas Augustin (Munich), has appeared in: ISIPTA'05: Proceedings of the Fourth International Symposium on Imprecise Probabilities and Their Applications, F.G. Cozman, R. Nau and T. Seidenfeld (Eds), published by SIPTA, pp. 125-134.
• Nonparametric adaptive opportunity-based age replacement strategies: pdf version. This is a preprint version of a paper, jointly with Pauline Coolen-Schrijner and Simon Shaw (Bath), that is to appear in Journal of the Operational Research Society, probably early 2006.
• On nonparametric predictive inference and objective Bayesianism: pdf version. This is a preprint version of a paper to appear in a special issue of Journal of Logic, Language and Information, containing papers presented at the Progic2005 workshop.

## Topics for PhD study

We invite strong, well-motivated candidates for postgraduate study (PhD and MSc by research) to contact us about opportunities to study topics in NPI at Durham, under our supervision. Examples of interesting topics are available in the following areas (more details will follow); of course, further suggestions are most welcome!
• Statistics: classification; bootstrapping; quality control; regression; multi-dimensional data
• Reliability: applications of the Coolen-Yan method; NPI alternatives to the Proportional Hazards model; competing risks
• Operational Research: NPI and stochastic processes; further applications to queueing problems; inventory models
• Combinatorics: NPI lower and upper probabilities for multiple future observations (discrete random quantities)
• Computational: we would like to establish a library of NPI algorithms in R, to make the method widely available

Last revision: 20/10/05