Probabilistic AI for Regression Problems

Hailiang Du

Description

Regression problems are central to statistics, machine learning, and scientific modelling. In many applications, however, predicting only a single value is not sufficient. For decision making under uncertainty, we often need a full predictive distribution that quantifies the range of plausible outcomes, tail risks, and uncertainty associated with the prediction.

This project explores probabilistic AI methods for regression problems. The aim is to move beyond deterministic point prediction and study methods that produce full probabilistic forecasts. Possible approaches include probabilistic tree-based models, mixture density networks, and distribution-to-distribution learning, where the input or the output may itself be a probability distribution.

The project is suitable for students interested in machine learning, uncertainty quantification, probabilistic forecasting, and interpretable regression modelling. Depending on the student’s interests, the project may be more theoretical, computational, or application-oriented.

Possible Directions

Probabilistic extensions of tree-based regression methods.
Comparison between deterministic regression and probabilistic regression methods.
Construction and evaluation of predictive distributions rather than point estimates.
Use of proper scoring rules, such as the logarithmic score or the continuous ranked probability score (CRPS), for training and evaluating probabilistic regressors.
Mixture density networks or neural probabilistic regression models.
Distribution-to-distribution regression, where the goal is to learn a mapping from input distributions to output distributions.
Applications to simulated data, environmental data, energy systems, or other regression datasets.
Interpretable uncertainty quantification for regression problems.
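
As a minimal illustration of the scoring rules mentioned above, the following Python sketch computes the logarithmic score and the continuous ranked probability score for a Gaussian predictive distribution. The Gaussian form, the function names, and the example forecasts are assumptions made for illustration only; a project may well use other distribution families. Both scores are written in negatively oriented form, so lower values indicate better forecasts.

```python
import math


def log_score_gaussian(y, mu, sigma):
    """Negative log predictive density of N(mu, sigma^2) at the outcome y."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi * sigma**2) + 0.5 * z**2


def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) at the outcome y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)  # standard normal density at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF at z
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


if __name__ == "__main__":
    y = 1.2  # hypothetical observed outcome
    # Three hypothetical forecasts: sharp and well centred, too wide, sharp but biased.
    for mu, sigma in [(1.0, 0.5), (1.0, 5.0), (4.0, 0.5)]:
        print(mu, sigma,
              round(log_score_gaussian(y, mu, sigma), 3),
              round(crps_gaussian(y, mu, sigma), 3))
```

Under these assumptions, the sharp and well-centred forecast receives the lowest score under both rules. The logarithmic score penalises the confidently wrong forecast much more heavily than the CRPS does, which is one reason the choice of scoring rule is itself a possible topic of study.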

Mode of Operation and Evidence of Learning

This project develops understanding of probabilistic regression through a combination of reading, mathematical formulation, computational implementation, and empirical comparison. The emphasis is on linking statistical ideas with practical machine learning algorithms and understanding how uncertainty is represented, learned, and evaluated.

Students will:

Read and synthesise material on probabilistic regression, uncertainty quantification, and probabilistic forecasting.
Implement deterministic and probabilistic regression models in Python or R.
Construct predictive distributions and compare them with standard point prediction methods.
Evaluate models using appropriate probabilistic scoring rules and diagnostic plots.
Critically assess the strengths and limitations of different probabilistic AI approaches.
Communicate statistical and computational reasoning clearly in written form.
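
The evaluation steps listed above can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not a prescribed workflow: the simulated data-generating process, the ensemble size, and the function names are all hypothetical, and the probabilistic forecast is drawn from the true distribution purely to keep the sketch short. It compares a point forecast with a sample-based predictive distribution using the CRPS, and collects probability integral transform (PIT) values for a calibration histogram.

```python
import random
import statistics


def crps_from_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2


def pit_value(samples, y):
    """Probability integral transform: empirical predictive CDF evaluated at y."""
    return sum(x <= y for x in samples) / len(samples)


random.seed(0)

# Hypothetical simulated data: y = 2x + noise, with noise s.d. growing with x.
xs = [random.uniform(0.0, 1.0) for _ in range(200)]
ys = [2.0 * x + random.gauss(0.0, 0.2 + x) for x in xs]

crps_prob, crps_point, pits = [], [], []
for x, y in zip(xs, ys):
    # Probabilistic forecast: 50 samples from the (here known) predictive distribution.
    ens = [2.0 * x + random.gauss(0.0, 0.2 + x) for _ in range(50)]
    crps_prob.append(crps_from_samples(ens, y))
    # Point forecast 2x: the CRPS of a point mass reduces to the absolute error.
    crps_point.append(abs(2.0 * x - y))
    pits.append(pit_value(ens, y))

mean_crps_prob = statistics.mean(crps_prob)
mean_crps_point = statistics.mean(crps_point)

# PIT histogram counts over 5 equal-width bins; roughly flat counts suggest calibration.
pit_counts = [0] * 5
for p in pits:
    pit_counts[min(int(p * 5), 4)] += 1

print("mean CRPS, probabilistic forecast:", mean_crps_prob)
print("mean CRPS, point forecast:", mean_crps_point)
print("PIT histogram counts:", pit_counts)
```

Because the CRPS of a point forecast equals its absolute error, the two approaches can be compared on the same scale. In a real project the samples would come from a fitted model rather than from the true distribution, and the PIT histogram would be one of several diagnostic plots.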

Understanding will be demonstrated through the ability to move between mathematical definitions, algorithmic implementation, empirical results, and interpretation of uncertainty. Evidence of learning will include code, numerical experiments, visualisations, and a written project report.

Prerequisites

Students should have some background in statistical modelling and machine learning. Familiarity with regression, probability distributions, and basic programming in Python or R would be useful.

Statistical Inference II, for familiarity with standard statistical ideas and experience with R.
Statistical Modelling II, helpful for understanding simple statistical models.

Resources

James, Witten, Hastie and Tibshirani, An Introduction to Statistical Learning: https://www.statlearning.com/
Murphy, Probabilistic Machine Learning: An Introduction: https://probml.github.io/pml-book/book1.html
Gneiting and Raftery, “Strictly Proper Scoring Rules, Prediction, and Estimation”.
Du, “Beyond Strictly Proper Scoring Rules: The Importance of Being Local”.
Relevant research papers on probabilistic forecasting, distributional regression, and uncertainty quantification will be suggested according to the student’s chosen direction.