Project IV (MATH4072) 2015-16

Constructing League Tables with Random Effect Models

Dr J. Einbeck

Description
League tables play an increasingly important role in performance monitoring and are used as a basis for policy making and resource allocation by official and governmental bodies. They are employed, for instance, to obtain objective comparisons between different countries, regions, or schools with respect to some educational performance measure (skill level), or to compare the risk of a disease between regions. In the latter example, a very naive approach to constructing a league table would be to divide the observed disease count in each region by that region's population size. League tables constructed by ordering such "crude rates" are frequently found in newspapers, but they are highly variable, particularly for small regions, and not very reliable. (For simplicity, the units being compared are referred to as "regions" from now on, even if they are in fact schools, hospitals, etc.)
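As a small illustration of the naive approach, the following Python sketch ranks regions by their crude rates. The region names, counts, and population sizes are invented purely for illustration, and the function name `crude_rate_table` is hypothetical.

```python
# Hypothetical data: (region, observed disease count, population size).
regions = [
    ("A", 3, 1_000),
    ("B", 40, 20_000),
    ("C", 150, 100_000),
    ("D", 0, 500),
]


def crude_rate_table(data, per=1_000):
    """Rank regions by crude rate (count / population), highest first."""
    rates = [(name, per * count / pop) for name, count, pop in data]
    return sorted(rates, key=lambda r: r[1], reverse=True)


table = crude_rate_table(regions)
for rank, (name, rate) in enumerate(table, start=1):
    print(f"{rank}. {name}: {rate:.2f} per 1,000")
```

In this made-up example the top and bottom positions are both taken by the two smallest regions (A and D), where a handful of cases more or fewer would change the ranking completely. This is the kind of instability that makes crude rates unreliable.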
More reliable league tables can be produced by stabilizing these volatile crude rates through appropriate statistical models. Such models try to borrow strength from the overall data set when constructing the region-wise rates. A suitable family of models in this context is that of "random effect models", which extend the linear model by adding a further term to the linear predictor; this term can be thought of as a region-specific intercept. The region-specific intercept is unknown and treated as a random variable, which is frequently assumed to follow a normal distribution. For instance, a random effect model suitable for the modelling of rates is given by
log(rate) = x^T b + z
where b is a traditional (fixed) parameter vector, x is a predictor vector, and z is the random effect. The non-random part, x^T b, can be used to adjust for item-specific covariates such as gender. Models of this type have occasionally been referred to as "partially Bayesian models", as the parameter z carries a distributional assumption, but b doesn't. The predicted random effect z for each region gives the value which will be used in the league table, yielding less variable estimates than the crude rates. In this project, you will gain basic knowledge of random effect modelling (based on the linear or the generalized linear model), and you will use modern software (R) to construct league tables based on random effect models.
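To see how a predicted random effect stabilizes a league table, consider the following minimal Python sketch. It deliberately uses a simpler model than the rate model above: a linear random intercept model without covariates, y_ij = b + z_i + e_ij, with z_i ~ N(0, tau2) and e_ij ~ N(0, sigma2). The two variance components are treated as known, which is an assumption made only to keep the example short; in practice they would be estimated from the data by the fitting software. The data and the function name `predict_random_intercepts` are hypothetical.

```python
from statistics import mean


def predict_random_intercepts(groups, sigma2, tau2):
    """Return (b_hat, {region: predicted z}) for y_ij = b + z_i + e_ij.

    sigma2 = Var(e_ij) and tau2 = Var(z_i) are assumed known here.
    """
    stats = {name: (len(ys), mean(ys)) for name, ys in groups.items()}
    # Shrinkage weight: close to 1 for large regions, close to 0 for small ones.
    weight = {k: tau2 / (tau2 + sigma2 / n) for k, (n, _) in stats.items()}
    # Weighted (generalized least squares) estimate of the common intercept b.
    b_hat = sum(weight[k] * stats[k][1] for k in stats) / sum(weight.values())
    # Predicted random effect: the region's deviation from b_hat, shrunk towards 0.
    z_hat = {k: weight[k] * (stats[k][1] - b_hat) for k in stats}
    return b_hat, z_hat


# Hypothetical scores for three regions of very different size.
scores = {
    "tiny": [15],          # 1 observation, raw mean 15
    "big": [12, 14] * 8,   # 16 observations, raw mean 13
    "low": [8, 10, 9, 9],  # 4 observations, raw mean 9
}

b_hat, z_hat = predict_random_intercepts(scores, sigma2=4.0, tau2=1.0)
by_raw_mean = sorted(scores, key=lambda k: mean(scores[k]), reverse=True)
by_random_effect = sorted(z_hat, key=z_hat.get, reverse=True)
print("ranking by raw mean:     ", by_raw_mean)
print("ranking by random effect:", by_random_effect)
```

With these invented numbers, the region with a single observation leads the table of raw means, but its predicted random effect is shrunk strongly towards zero and it drops below the large region, whose mean is supported by far more data.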
You will work with real data sets taken principally from the Education and Health sectors. A particularly interesting study that we may want to focus on is the recent PIAAC study of adult literacy (which can be seen as the adult analogue of the PISA study), which is available from the OECD web site given below.
The construction of league tables need not be the main focus of this project. Other questions which may be of interest, in particular for the PIAAC study, include how to impute scores for missing values, how to describe the distribution of skill levels in different countries, model selection and robustness issues, or a critical discussion of the official OECD analysis of this data set.
Prerequisites

Statistical Methods III
Topics in Statistics III is useful but not necessary

Resources

Aitkin, Francis, Hinde & Darnell (2009). Statistical Modelling in R, Oxford University Press, p. 461 ff.
Dobson & Barnett (2008). An Introduction to Generalized Linear Models, CRC Press.
The OECD PIAAC web site.
An example of league tables: Guardian university league tables 2015

email: jochen.einbeck "at" durham.ac.uk