Supervisor: Emmanuel Ogundimu
Project’s research area: Statistics / Data Science / Statistical Learning
Penalized regression methods play a central role in modern statistics, data science, and machine learning. Among the most influential of these methods is the LASSO (Least Absolute Shrinkage and Selection Operator), which augments a regression objective function with a penalty on the absolute values of the regression coefficients. This encourages sparsity, so that some coefficients are shrunk exactly to zero, thereby carrying out variable selection and estimation simultaneously. The LASSO is especially valuable when there are many candidate predictors, when predictors are correlated, or when the aim is to build interpretable predictive models. It has been extended to practically all statistical models and has been used for regularization in many well-known machine learning algorithms such as support vector machines.
Consider a least squares (LS) regression problem. Suppose we are given a set of responses \(y_{1}, \dots, y_{n}\) and associated vectors of \(p\) predictors \(\mathbf{x}_{i} = (x_{i1}, \ldots, x_{ip}) \in \mathbb{R}^{p}\), where \(i = 1, \ldots, n\). Then the LASSO estimator is given by \[ \hat{\beta} = \mbox{arg min}_{\beta} \left\{\sum_{i=1}^n \left(y_i-\sum_{j=1}^p x_{ij}\beta_j\right)^2+\lambda \sum_{j=1}^p|\beta_j|\right\}, \] where \(\lambda \ge 0\) is the regularization parameter that controls the trade-off between goodness-of-fit and sparsity.
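To make the estimator concrete, here is a minimal sketch of cyclic coordinate descent for the objective above. It is written in plain Python for self-containment, although the project itself uses R. The function names (`soft_threshold`, `lasso_objective`, `lasso_coordinate_descent`) and the toy data are illustrative assumptions. The sketch has no intercept and no standardization. It is not the glmnet implementation, which scales the loss differently, so its \(\lambda\) values are not directly comparable.

```python
def soft_threshold(z, gamma):
    """Return sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_objective(X, y, beta, lam):
    """Residual sum of squares plus lam times the L1 norm of beta."""
    rss = sum(
        (yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    )
    return rss + lam * sum(abs(bj) for bj in beta)


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for the objective above (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the partial residual
            # (the residual with coordinate j's contribution added back).
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            )
            # The threshold is lam / 2 because the loss has no 1/2 factor.
            new_bj = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical toy data: two orthogonal predictors, y = 3 * x1 + 0.1 * x2.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.1, 2.9, -2.9, -3.1]

print(lasso_coordinate_descent(X, y, lam=0.0))  # LS fit, no shrinkage
print(lasso_coordinate_descent(X, y, lam=2.0))  # weak predictor is dropped
```

With \(\lambda = 2\) the first coefficient is shrunk from about 3 to about 2.75, and the second is set exactly to zero. This illustrates how the penalty performs estimation and variable selection at once.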
Figure 1: Geometric interpretation of the LASSO via elliptical contours (Tibshirani, 1996)
Figure 1 shows the geometric interpretation of the LASSO problem for LS regression. The elliptical contours are level sets of the LS objective, centred at the LS solution, and the diamond is the constraint region imposed by the penalty \(\sum_{j=1}^p|\beta_j|\). For a fixed value of \(\lambda\), the LASSO solution is the first point at which the expanding contours touch the diamond. Because the diamond has corners on the coordinate axes, this point often lies on an axis, which sets the corresponding coefficients exactly to zero.
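The diamond in this picture comes from writing the LASSO in its equivalent constrained form, sketched here with a bound \(t \ge 0\) (a symbol introduced for illustration): \[ \hat{\beta} = \mbox{arg min}_{\beta} \sum_{i=1}^n \left(y_i-\sum_{j=1}^p x_{ij}\beta_j\right)^2 \quad \mbox{subject to} \quad \sum_{j=1}^p|\beta_j| \le t. \] Each value of \(\lambda\) in the penalized form corresponds to some value of \(t\) in the constrained form, with larger \(\lambda\) corresponding to smaller \(t\) and hence a smaller diamond.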
LASSO has close cousins (methods with similar appearances but with improved or complementary properties) such as Adaptive LASSO, Elastic Net (a combination of LASSO and Ridge regression), Fused LASSO, Smooth LASSO, Group LASSO, Relaxed LASSO, and Square-root LASSO. It also has more distant cousins such as BAR (broken adaptive ridge), MCP (minimax concave penalty), SCAD (smoothly clipped absolute deviation), SELO (seamless-\(L_0\)), and SICA (smooth integration of counting and absolute deviation).
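As one example of how these variants modify the penalty, the Elastic Net can be written, in one common parameterization (the symbols \(\lambda_1, \lambda_2 \ge 0\) are introduced here for illustration), by replacing the LASSO penalty with \[ \lambda_1 \sum_{j=1}^p|\beta_j| + \lambda_2 \sum_{j=1}^p \beta_j^2. \] Setting \(\lambda_2 = 0\) recovers the LASSO, and setting \(\lambda_1 = 0\) recovers Ridge regression.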
This project will introduce students to the theory, computation, and practical use of penalized regression methods, with particular emphasis on how these methods are fitted, tuned, interpreted, and compared. It will also explore their application across a range of data types, including continuous, binary, survival, and zero-inflated count data, with a focus on predictive accuracy and on drawing causal inferences in policy evaluation.
The group project will focus on building a strong shared foundation in the theory and practice of penalized regression, especially LASSO-type methods. Students will work together to understand the main ideas that underlie shrinkage and variable selection, and to develop experience in fitting and comparing penalized regression methods in R.
By the end of the group project you will have learned:
By the end of the group project you will have practised:
glmnet, and producing and interpreting regularization paths;
The group project will operate through a combination of reading, mathematical derivation, simulation, and programming in R. Students will demonstrate their understanding by engaging with core methodological literature, deriving key results (such as the soft-thresholding operator), implementing and applying penalized regression methods, reproducing known results on simulated data, analysing real datasets, and clearly communicating the material in both written and oral formats.
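As an indication of the kind of derivation involved, here is a sketch of the soft-thresholding result under the simplifying assumption of a single predictor, so that the objective is \(\sum_{i=1}^n (y_i - x_i\beta)^2 + \lambda|\beta|\). Writing \(z = \sum_{i=1}^n x_i y_i\) and \(s = \sum_{i=1}^n x_i^2 > 0\) (notation introduced here for illustration), the objective equals \(\sum_{i=1}^n y_i^2 - 2z\beta + s\beta^2 + \lambda|\beta|\). For \(\beta > 0\), setting the derivative \(-2z + 2s\beta + \lambda\) to zero gives \(\beta = (z - \lambda/2)/s\), which is valid only when \(z > \lambda/2\). The case \(\beta < 0\) is symmetric, and otherwise the minimum is at zero. Hence \[ \hat{\beta} = \frac{S(z, \lambda/2)}{s}, \qquad S(z, \gamma) = \mbox{sign}(z)\,\max(|z| - \gamma,\, 0), \] where \(S\) is the soft-thresholding operator.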
The individual project will build on the foundation developed in the group project. Each student will choose a more focused topic and investigate it in greater depth, from a methodological, computational, or applied perspective. Potential topics include:
Students may also propose a related topic of their own, subject to discussion and approval.
The individual project will involve independent reading, critical thinking, and sustained investigation of a focused problem. Depending on the topic chosen, the work may involve methodological comparison, simulation design, analysis of real data, or implementation of algorithms in R. Students will demonstrate their understanding through a written account of the work, supported where appropriate by figures, tables, code, and critical discussion of methods, assumptions, and limitations.
Prerequisites: Familiarity with the R statistical software and an interest in data science.
If you would like more information about this project, please contact me at emmanuel.ogundimu@durham.ac.uk.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B, 58(1), 267–288.
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw., 33(1), 1–22.
Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman & Hall / CRC.
Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat., 5(1), 232–253.
Freijeiro-González, L., Febrero-Bande, M. and González-Manteiga, W. (2022). A critical review of LASSO and its derivatives for variable selection under dependence among covariates. Int. Stat. Rev., 90(1), 118–145.
Vidaurre, D., Bielza, C. and Larrañaga, P. (2013). A survey of L1 regression. Int. Stat. Rev., 81, 361–387.