Project IV 2015-16


Exact Tests in Analysis of Categorical Data

Peter Craig

Description

Most procedures for testing statistical hypotheses are only approximate, in the sense that the actual significance level differs from the one claimed; usually, the approximation improves as the sample size increases.  However, in some special situations, there exist exact tests which work for all sample sizes.  A familiar example is the test, based on the t-distribution, for the mean of a population which is known or assumed to have a normal distribution.

The best-known example is in fact Fisher's exact test, which is an alternative to the usual approximate "chi-squared" test of independence in a two-factor contingency table.  An implementation (fisher.test) is a standard component of R.  In principle, the idea extends to a wide variety of models for categorical data viewed as higher-dimensional contingency tables.  However, the computational challenges are significant, and simulation methods are frequently used to approximate the exact test, even for two-factor tables with large numbers of rows and columns.
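To make the idea concrete, here is a minimal sketch of Fisher's exact test for a 2x2 table, written in Python using only the standard library. It assumes the usual two-sided convention of summing the probabilities of all tables, with the same margins, that are no more probable than the observed one. The function name fisher_exact_2x2 is made up for this illustration and is not part of any library.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]].

    Conditional on the row and column totals, the top-left cell has a
    hypergeometric distribution under the hypothesis of independence.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Number of ways to obtain top-left cell x with the margins fixed.
        return comb(row1, x) * comb(row2, col1 - x)

    lo = max(0, col1 - row2)   # smallest feasible top-left cell
    hi = min(row1, col1)       # largest feasible top-left cell
    w_obs = weight(a)

    # Integer arithmetic keeps the "no more probable than observed"
    # comparison exact; divide only once at the end.
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= w_obs)
    return extreme / comb(n, col1)


if __name__ == "__main__":
    # Table with all margins equal to 4 (the classic tea-tasting layout).
    print(fisher_exact_2x2(3, 1, 1, 3))
```

For a 2x2 table the enumeration is over a single cell, so it is cheap. For larger tables the number of tables sharing the observed margins grows very quickly, which is where the computational difficulty arises.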

Students will study the basic ideas underlying such exact tests and approaches to tackling the computational complexity.  There are connections to a number of ideas studied in the Bayesian Methods module, including Monte Carlo, importance sampling and Markov Chain Monte Carlo.  However, there is no need to be taking Bayesian Methods in order to do this project.
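As an illustration of the simulation approach, the following sketch approximates an exact test of independence for a general two-factor table by plain Monte Carlo. It assumes the chi-squared statistic is used to order tables, that no row or column total is zero, and that the p-value is estimated by the common (1 + count) / (1 + N) convention. The function names and the default number of simulations are choices made for this example only.

```python
import random


def chi2_stat(table):
    """Pearson chi-squared statistic (assumes no zero row or column total)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p_value(table, n_sim=999, seed=0):
    """Monte Carlo estimate of the exact conditional p-value."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])

    # One (row label, column label) pair per observation in the table.
    row_labels = [i for i, r in enumerate(table) for cnt in r for _ in range(cnt)]
    col_labels = [j for r in table for j, cnt in enumerate(r) for _ in range(cnt)]

    observed = chi2_stat(table)
    count = 0
    for _ in range(n_sim):
        # Shuffling the column labels keeps both sets of margins fixed and
        # samples a table from the null distribution given those margins.
        rng.shuffle(col_labels)
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        if chi2_stat(sim) >= observed - 1e-9:  # tolerance for float ties
            count += 1

    # Counting the observed table itself keeps the estimate above zero.
    return (1 + count) / (1 + n_sim)


if __name__ == "__main__":
    print(monte_carlo_p_value([[3, 1, 2], [1, 3, 4], [5, 2, 1]]))
```

Each simulated table here is an independent draw. Markov chain Monte Carlo methods instead move between tables with the same margins by small steps, which can help when direct sampling is awkward for more complex models.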

Prerequisites

Statistical Concepts II and an interest in computation (probably but not necessarily in R). Topics in Statistics should prove helpful.

Resources

Wikipedia "Exact test" article and wherever it leads you.

Fisher RA (1922), On the interpretation of χ² from contingency tables, and the calculation of P, Journal of the Royal Statistical Society, 85, pp. 87-94.

Agresti A (1996), An Introduction to Categorical Data Analysis, Wiley.

Besag J and Clifford P (1989), Generalized Monte Carlo significance tests, Biometrika, 76, pp. 633-642.

email: Peter Craig