Statistical Methods for Spatial Data

Jonathan Cumming

Statistics
Spatial statistics
Reading
Programming (R)
Simulation
Data analysis
Statistical modelling

Spatial data arise whenever observations are associated with geographical location. Typical examples include measurements of air pollution recorded at monitoring stations, outcomes aggregated over administrative areas (such as income or disease incidence), and the observed locations of events such as crimes, earthquakes or cases of infection.

In each case we observe values \(Y\) together with spatial locations \(s\). A defining feature of such data is that nearby observations tend to be related. Classical models that assume independence are therefore inadequate. The central question becomes:

How should spatial dependence be represented and modelled?

Cases of primary biliary cirrhosis and controls representing at-risk population (left) and intensity ratio (right) in north-eastern England between 1987 and 1994.

These maps illustrate two key features of spatial data: events occur at specific locations, and the underlying risk may vary across space. Different statistical frameworks emphasise different aspects of this structure. In this project we examine three principal paradigms:

The project begins with a structured group exploration of these frameworks and the ideas that underpin them. In the individual component, each student will pursue one direction in greater depth.

Group Project

The group project provides a broad but coherent introduction to the modelling of spatial structure. The emphasis is on understanding how spatial dependence is defined, measured and incorporated into statistical models.

We begin with the nature of spatial structure: what distinguishes spatial from non-spatial data; first- and second-order structure; the role of proximity and neighbourhood; and exploratory visualisation of spatial datasets. We then examine three principal modelling frameworks.

Point pattern data. When data consist of event locations, we ask whether the observed pattern is consistent with spatial randomness, or whether clustering or inhibition is present. We introduce Complete Spatial Randomness, the idea of an intensity function, and basic exploratory and simulation-based techniques.

Areal data. When observations are aggregated over regions, spatial dependence is encoded through neighbourhood relationships. We examine adjacency structures and spatial graphs, measures of spatial autocorrelation (e.g. Moran’s $I$), and basic smoothing ideas for regional data.

Geostatistical data. When real-valued measurements are observed at spatial locations, we treat the data as a realisation of a spatial random field. We introduce covariance functions and stationarity (at a conceptual level), variograms and their interpretation, and the basic idea of spatial prediction and interpolation.

Throughout, the aim is comparative: how do these frameworks differ, what assumptions do they make, and how do they represent spatial dependence?

Expected outcomes

Individual Project

In the individual component, each student will specialise in one of the three spatial frameworks and develop a more detailed investigation. This may involve further theoretical development, more sophisticated modelling, or a sustained analysis of a real dataset.

Point processes. Possible directions include:

Areal models. Possible directions include:

Geostatistics. Possible directions include:

The individual investigation should demonstrate clear understanding of modelling assumptions, careful implementation, and thoughtful interpretation of results.

Mode of Operation and Evidence of Learning

This project revolves around developing conceptual understanding of spatial dependence through reading, discussion and implementation in R. The emphasis is on linking mathematical ideas to computational realisation and empirical behaviour.

Understanding will be demonstrated through the ability to move fluently between mathematical description of spatial structure, computational implementation, empirical behaviour of models, and interpretation in the context of real data. The emphasis is on understanding how spatial dependence is encoded and why particular modelling choices are appropriate.

Pre-requisites

Resources