Background

Mixture models are an extremely useful class of models for data whose distribution exhibits a degree of heterogeneity that cannot be captured by a single distribution. They have been widely used in many scientific fields, such as astronomy, biology, genomics, finance, medicine and engineering, for purposes including density estimation, unsupervised clustering and capturing unobserved heterogeneity. Mixture modelling is also highly flexible and can be applied to a wide range of data types: categorical, discrete or continuous, univariate or multivariate, and so on.

The basic finite mixture model describes a heterogeneous population consisting of, say, $K$ unobserved homogeneous sub-groups, often called components, which are mixed at random in proportions given by their relative sizes, $\eta_1,\ldots,\eta_K$, where $\eta_k \ge 0$ and $\sum_{k=1}^K \eta_k = 1$. If some random feature $Y$ of the population is heterogeneous across sub-groups but homogeneous within them, then we model $Y$ as having a different probability distribution in each sub-group, denoting the $k$th component distribution by $p(y | \boldsymbol{\theta}_k)$, which depends on a parameter $\boldsymbol{\theta}_k$. It is common, though not necessary, to assume that the component distributions $p(y | \boldsymbol{\theta}_k)$ all belong to the same parametric family, such as the normal distribution, differing only in the value of the parameter. The distribution of $Y$ is then given by \[ p(y | \boldsymbol{\theta},\boldsymbol{\eta}) = \sum_{k=1}^K \eta_k \, p(y | \boldsymbol{\theta}_k), \] where $\boldsymbol{\theta} = (\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_K)$ and $\boldsymbol{\eta}=(\eta_1,\ldots,\eta_K)^T$. An example with normal component distributions is provided in Figure 1.

Figure 1: Example of three normal densities, giving two very different mixture distributions, $\eta_1 \times \mathrm{N}(1, 1) + \eta_2 \times \mathrm{N}(3, 0.5^2) + \eta_3 \times \mathrm{N}(4, 1.3^2)$, where for mixture A, $\boldsymbol{\eta}=(0.3, 0.5, 0.2)^T$ and for mixture B, $\boldsymbol{\eta}=(0.84, 0.02, 0.14)^T$.
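As a quick numerical check on the mixture density and on its moments (one of the properties studied in the group project), the Python sketch below evaluates the two mixtures of Figure 1. The project itself uses R; the function names here are illustrative only. For normal components it uses the standard results $E[Y]=\sum_k \eta_k\mu_k$ and $\mathrm{Var}[Y]=\sum_k \eta_k(\sigma_k^2+\mu_k^2)-E[Y]^2$.

```python
import math

# Component parameters from Figure 1: N(1, 1), N(3, 0.5^2), N(4, 1.3^2).
MEANS = [1.0, 3.0, 4.0]
SDS = [1.0, 0.5, 1.3]


def normal_pdf(y, mu, sd):
    """Density of N(mu, sd^2) evaluated at y."""
    z = (y - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, means=MEANS, sds=SDS):
    """p(y | theta, eta) = sum_k eta_k N(y | mu_k, sd_k^2)."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))


def mixture_moments(weights, means=MEANS, sds=SDS):
    """Mean and variance of a normal mixture.

    E[Y] = sum_k eta_k mu_k, and E[Y^2] = sum_k eta_k (sd_k^2 + mu_k^2),
    so Var[Y] = E[Y^2] - E[Y]^2.
    """
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (s * s + m * m) for w, m, s in zip(weights, means, sds))
    return mean, second - mean * mean


eta_A = [0.3, 0.5, 0.2]
eta_B = [0.84, 0.02, 0.14]
for label, eta in [("A", eta_A), ("B", eta_B)]:
    mean, var = mixture_moments(eta)
    # Density at y = 2, followed by the mixture mean and variance.
    print(label, round(mixture_pdf(2.0, eta), 4), round(mean, 4), round(var, 4))
```

Although the two weight vectors give visibly different densities, their means and variances are of similar size, which illustrates that low-order moments alone say little about the shape of a mixture.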

A standard Bayesian analysis completes the model specification by choosing a prior for the mixture weights, $\boldsymbol{\eta}$, and the component parameters, $\boldsymbol{\theta}$, usually factorised as $\pi(\boldsymbol{\theta},\boldsymbol{\eta}) = \pi(\boldsymbol{\theta}) \pi(\boldsymbol{\eta})$. Given a random sample of $n$ observations from the finite mixture model, $\mathbf{y}=(y_1,\ldots,y_n)^T$, the posterior distribution of interest is then \[ \pi(\boldsymbol{\theta},\boldsymbol{\eta} | \mathbf{y}) \propto \pi(\boldsymbol{\theta}) \pi(\boldsymbol{\eta}) \prod_{i=1}^n \left\{ \sum_{k=1}^K \eta_k \, p(y_i | \boldsymbol{\theta}_k) \right\}. \] The posterior is analytically intractable in all but toy examples, so the usual approach is to sample from it using Markov chain Monte Carlo (MCMC) methods. However, the likelihood is a product of $n$ sums, each of $K$ terms, which expands to $K^n$ terms, and this product-of-sums structure poses a number of challenges for MCMC.
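In practice the product of sums is evaluated on the log scale, where each factor becomes a log-sum-exp over the $K$ components. The Python sketch below (the project itself uses R) does this for normal components, using a small hypothetical data set and the mixture A parameters from Figure 1. It also shows that relabelling the components leaves the likelihood unchanged; with an exchangeable prior, the posterior therefore has $K!$ symmetric copies of each mode. The helper names are illustrative.

```python
import math


def log_normal_pdf(y, mu, sd):
    """log N(y | mu, sd^2)."""
    z = (y - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_sum_exp(values):
    """Compute log(sum(exp(v))) stably by factoring out the largest term."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_log_likelihood(y, weights, means, sds):
    """log prod_i sum_k eta_k N(y_i | mu_k, sd_k^2); assumes every eta_k > 0."""
    total = 0.0
    for yi in y:
        terms = [math.log(w) + log_normal_pdf(yi, m, s)
                 for w, m, s in zip(weights, means, sds)]
        total += log_sum_exp(terms)
    return total


# Hypothetical illustrative data; parameters are mixture A from Figure 1.
y = [0.2, 1.1, 2.9, 3.4, 4.8, 5.5]
eta, mu, sd = [0.3, 0.5, 0.2], [1.0, 3.0, 4.0], [1.0, 0.5, 1.3]

# Relabel components: new component j is old component perm[j].
perm = [2, 0, 1]
eta_p = [eta[k] for k in perm]
mu_p = [mu[k] for k in perm]
sd_p = [sd[k] for k in perm]

ll = mixture_log_likelihood(y, eta, mu, sd)
ll_perm = mixture_log_likelihood(y, eta_p, mu_p, sd_p)
print(ll, ll_perm)  # equal up to floating-point rounding
```

Working with log-sum-exp matters for observations far from every component, where the naive product of densities underflows to zero.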

The goal of this project is to introduce and explore Bayesian inference for mixture models.

Group project

The group project will revolve around learning about the mathematical formulation and properties of finite mixture models as well as Bayesian methodology for working with them.

By the end of the group project you will have learned:

Properties of finite mixture models, such as moments.
Formulation as a latent variable model with categorical allocation variables.
Parameter inference through data augmentation and MCMC when the number of sub-groups $K$ is fixed.
Non-identifiability due to labelling invariance and its consequences.
Methods to produce an identified posterior sample.
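To make the latent-variable formulation concrete, the following is a minimal Python sketch (the project itself uses R) of a data-augmentation Gibbs sampler for a normal mixture with $K$ fixed. It introduces allocation variables $z_i \in \{1,\ldots,K\}$ with $P(z_i = k) = \eta_k$ and cycles through their full conditionals and those of the parameters. The priors and hyperparameter defaults are hypothetical choices made for conjugacy: $\boldsymbol{\eta} \sim \mathrm{Dirichlet}(\alpha,\ldots,\alpha)$, $\mu_k \sim \mathrm{N}(m_0, s_0^2)$ and precisions $\tau_k = 1/\sigma_k^2 \sim \mathrm{Gamma}(a, b)$ with rate $b$.

```python
import math
import random


def gibbs_normal_mixture(y, K, n_iter, alpha=1.0, m0=0.0, s0=10.0,
                         a=2.0, b=1.0, seed=1):
    """Data-augmentation Gibbs sampler for a K-component normal mixture.

    Hypothetical priors: eta ~ Dirichlet(alpha, ..., alpha),
    mu_k ~ N(m0, s0^2), tau_k = 1/sigma_k^2 ~ Gamma(a, rate b), independent.
    Returns a list of (eta, mu, sigma) draws, one per iteration.
    """
    rng = random.Random(seed)
    n = len(y)
    ys = sorted(y)
    # Start the means at spread-out order statistics of the data.
    mu = [ys[int((k + 0.5) * n / K)] for k in range(K)]
    tau = [1.0] * K
    eta = [1.0 / K] * K
    draws = []
    for _ in range(n_iter):
        # 1. Allocations: P(z_i = k | ...) is proportional to eta_k N(y_i | mu_k, 1/tau_k).
        z = []
        for yi in y:
            logp = [math.log(eta[k]) + 0.5 * math.log(tau[k])
                    - 0.5 * tau[k] * (yi - mu[k]) ** 2 for k in range(K)]
            top = max(logp)
            w = [math.exp(lp - top) for lp in logp]
            z.append(rng.choices(range(K), weights=w)[0])
        counts = [z.count(k) for k in range(K)]
        # 2. Weights: eta | z ~ Dirichlet(alpha + n_1, ..., alpha + n_K),
        #    sampled by normalising independent Gamma(alpha + n_k, 1) variates.
        g = [rng.gammavariate(alpha + counts[k], 1.0) for k in range(K)]
        total = sum(g)
        eta = [gk / total for gk in g]
        for k in range(K):
            yk = [yi for yi, zi in zip(y, z) if zi == k]
            # 3. Mean: conjugate normal update given tau_k and the allocated data.
            prec = 1.0 / s0 ** 2 + len(yk) * tau[k]
            mean = (m0 / s0 ** 2 + tau[k] * sum(yk)) / prec
            mu[k] = rng.gauss(mean, 1.0 / math.sqrt(prec))
            # 4. Precision: Gamma(a + n_k/2, rate b + SS_k/2); gammavariate takes a scale.
            ss = sum((yi - mu[k]) ** 2 for yi in yk)
            tau[k] = rng.gammavariate(a + 0.5 * len(yk), 1.0 / (b + 0.5 * ss))
        draws.append((list(eta), list(mu), [1.0 / math.sqrt(t) for t in tau]))
    return draws


# Usage: simulate hypothetical data from mixture A of Figure 1 and run the sampler.
sim = random.Random(2024)
data = []
for _ in range(200):
    k = sim.choices(range(3), weights=[0.3, 0.5, 0.2])[0]
    data.append(sim.gauss([1.0, 3.0, 4.0][k], [1.0, 0.5, 1.3][k]))
out = gibbs_normal_mixture(data, K=3, n_iter=1000)
kept = out[500:]
# Averages over post-burn-in draws; only meaningful if no label switching occurred.
print([round(sum(d[1][k] for d in kept) / len(kept), 2) for k in range(3)])
```

Because the components of mixture A overlap, the labels in such a run may switch during sampling, in which case the per-label averages printed above are not interpretable without first producing an identified sample.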

By the end of the group project you will be able to:

Write R code to implement MCMC for fitting mixture models to data.
Diagnose the convergence and mixing of your MCMC sampler.
Employ methodology to produce an identified posterior sample.
Interpret your results.
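One of the simplest ways to produce an identified posterior sample is to impose an ordering constraint after sampling, for example relabelling every draw so that $\mu_1 < \cdots < \mu_K$. The Python sketch below (the project itself uses R) assumes each draw is a tuple of (weights, means, standard deviations) and permutes all three consistently. The toy draws are hypothetical and show how label switching makes naive averages meaningless. Ordering constraints can perform poorly when component means are close; the label.switching package listed under Resources implements more sophisticated relabelling algorithms.

```python
def relabel_by_mean(draws):
    """Relabel each (eta, mu, sd) draw so that the component means increase."""
    relabelled = []
    for eta, mu, sd in draws:
        order = sorted(range(len(mu)), key=lambda k: mu[k])
        relabelled.append(([eta[k] for k in order],
                           [mu[k] for k in order],
                           [sd[k] for k in order]))
    return relabelled


def posterior_means(draws):
    """Average eta, mu and sd over draws, component by component."""
    K = len(draws[0][0])
    m = len(draws)
    return tuple([sum(d[j][k] for d in draws) / m for k in range(K)]
                 for j in range(3))


# Hypothetical draws: the second is the first with its labels switched.
toy = [([0.3, 0.7], [-5.0, 5.0], [1.0, 2.0]),
       ([0.7, 0.3], [5.0, -5.0], [2.0, 1.0])]
print(posterior_means(toy))                   # component summaries are mixed up
print(posterior_means(relabel_by_mean(toy)))  # component summaries are recovered
```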

Mode of operation and evidence of learning

The project will revolve around learning through reading and programming in R. Students will demonstrate their understanding by comparing theory to simulation results, writing R code to implement core methodology, analysing simulated and real data sets, and clearly communicating the material in both written and oral formats.

Individual project

The individual project will build on the knowledge you have gained in the group project and will explore additional advanced topics. A few examples of topics you will be able to investigate are:

Learning the number of mixture components.
Infinite mixture models, such as Dirichlet process mixture models.
Hidden Markov models, which allow modelling of time-series data.
Mixture-of-experts models, such as mixtures of regression models.
Model-based clustering.
Spike-and-slab prior distributions for variable selection in linear regression.

Mode of operation and evidence of learning

Prerequisites and Co-requisites

Prerequisites: Statistical Inference II, Data Science and Statistical Modelling II.

Co-requisites: Bayesian Computation and Modelling III.

Additional information

If you would like more information about this project, please contact me at sarah.e.heaps@durham.ac.uk

Resources

Frühwirth-Schnatter, S. (2006) Finite Mixture and Markov Switching Models. Springer.
Frühwirth-Schnatter, S., Celeux, G. and Robert, C. P. (2019) Handbook of Mixture Analysis. Chapman & Hall / CRC.
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A. and Rubin, D. (2013) Bayesian Data Analysis, Third Edition. Chapman & Hall / CRC.
Papastamoulis, P. (2016) label.switching: An R package for dealing with the label switching problem in MCMC outputs. Journal of Statistical Software, 69(1), 1–24.
Yao, W. and Xiang, S. (2025) Mixture Models: Parametric, Semiparametric, and New Directions. Chapman & Hall / CRC.

Mixture Modelling

Supervisor: Sarah Heaps

Project’s research area: Statistics
