Penalized Regression: LASSO & Cousins

Group and Individual Project III

Supervisor: Emmanuel Ogundimu

Project’s research area: Statistics / Data Science / Statistical Learning

General description

Penalized regression methods play a central role in modern statistics, data science, and machine learning. Among the most influential of these methods is the LASSO (Least Absolute Shrinkage and Selection Operator), which augments a regression objective function with a penalty on the absolute values of the regression coefficients. This encourages sparsity, so that some coefficients are shrunk exactly to zero, thereby carrying out variable selection and estimation simultaneously. The LASSO is especially valuable when there are many candidate predictors, when predictors are correlated, or when the aim is to build interpretable predictive models. It has been extended to a wide range of statistical models, and \(L_1\)-type penalties are also used for regularization in well-known machine learning algorithms such as support vector machines.

Consider a least squares (LS) regression problem. Suppose we are given a set of responses \(y_{1}, \dots, y_{n}\) and associated vectors of \(p\) predictors \(\mathbf{x}_{i} = (x_{i1}, \ldots, x_{ip})^{\top} \in \mathbb{R}^{p}\), where \(i = 1, \ldots, n\). Then the LASSO estimator is given by \[ \hat{\beta} = \operatorname*{arg\,min}_{\beta} \left\{\sum_{i=1}^n \left(y_i-\sum_{j=1}^p x_{ij}\beta_j\right)^2+\lambda \sum_{j=1}^p|\beta_j|\right\}, \] where \(\lambda \ge 0\) is the regularization parameter that controls the trade-off between goodness-of-fit and sparsity.

Figure 1: Geometric interpretation of the LASSO via elliptical contours (Tibshirani, 1996)

Figure 1 shows the geometrical interpretation of the LASSO problem for LS regression. The elliptical contour lines are level sets of the LS objective, centred at the LS solution, and the diamond is the constraint region \(\sum_{j=1}^p|\beta_j| \le t\) implied by the penalty, where each value of \(\lambda\) corresponds to a bound \(t\). For a fixed value of \(\lambda\), the LASSO solution is the point at which the smallest contour touches the diamond. This point often lies at a corner of the diamond, which sits on an axis, so the corresponding coefficients are set exactly to zero.
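
One way to see why exact zeros occur is the special case of an orthonormal design, assumed here purely for illustration: \(\sum_{i=1}^n x_{ij}x_{ik} = 1\) if \(j = k\) and \(0\) otherwise. Writing \(z_j = \sum_{i=1}^n x_{ij}y_i\) for the LS estimate of \(\beta_j\) (notation introduced for this sketch), the LASSO objective separates across coordinates:

\[ \sum_{i=1}^n \left(y_i-\sum_{j=1}^p x_{ij}\beta_j\right)^2+\lambda \sum_{j=1}^p|\beta_j| = \sum_{i=1}^n y_i^2 + \sum_{j=1}^p \left\{ \beta_j^2 - 2 z_j \beta_j + \lambda |\beta_j| \right\}. \]

Minimising each term in braces separately gives the soft-thresholding rule

\[ \hat{\beta}_j = \mbox{sign}(z_j)\left(|z_j| - \frac{\lambda}{2}\right)_{+}, \qquad j = 1, \ldots, p, \]

where \((u)_{+} = \max(u, 0)\) and \(\hat{\beta}_j\) denotes the \(j\)-th component of the LASSO solution. Every coefficient whose LS estimate satisfies \(|z_j| \le \lambda/2\) is therefore set exactly to zero, while the remaining coefficients are shrunk towards zero by \(\lambda/2\). The factor \(1/2\) arises because the residual sum of squares above is not halved; other scalings of the objective move it elsewhere.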

LASSO has close cousins (methods with similar appearances but with improved or complementary properties) such as Adaptive LASSO, Elastic Net (a combination of LASSO and Ridge regression), Fused LASSO, Smooth LASSO, Group LASSO, Relaxed LASSO, and Square-root LASSO. It also has more distant cousins such as BAR (broken adaptive ridge), MCP (minimax concave penalty), SCAD (smoothly clipped absolute deviation), SELO (seamless-\(L_0\)), and SICA (smooth integration of counting and absolute deviation).
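
For orientation, a few of these penalties can be written out explicitly. The forms below are common textbook versions, given as a sketch rather than as the definitions the project will adopt. The symbols \(\lambda_1\), \(\lambda_2\), \(w_j\), \(\nu\), \(a\), the initial estimate \(\tilde{\beta}\), the groups \(g = 1, \ldots, G\) and the group sizes \(p_g\) are notation introduced for this illustration, and scaling conventions differ between papers and software.

\[ \begin{aligned} \text{Ridge:}\quad & \lambda \sum_{j=1}^p \beta_j^2 \\ \text{Elastic Net:}\quad & \lambda_1 \sum_{j=1}^p |\beta_j| + \lambda_2 \sum_{j=1}^p \beta_j^2 \\ \text{Adaptive LASSO:}\quad & \lambda \sum_{j=1}^p w_j |\beta_j|, \qquad w_j = 1/|\tilde{\beta}_j|^{\nu},\ \nu > 0 \\ \text{Group LASSO:}\quad & \lambda \sum_{g=1}^G \sqrt{p_g}\, \|\beta_g\|_2 \\ \text{Fused LASSO:}\quad & \lambda_1 \sum_{j=1}^p |\beta_j| + \lambda_2 \sum_{j=2}^p |\beta_j - \beta_{j-1}| \end{aligned} \]

Among the more distant cousins, MCP replaces \(\lambda|\beta_j|\) with a penalty that levels off for large coefficients, so that large effects are not shrunk:

\[ p_{\lambda, a}(\beta_j) = \begin{cases} \lambda |\beta_j| - \dfrac{\beta_j^2}{2a}, & |\beta_j| \le a\lambda, \\[1ex] \dfrac{a\lambda^2}{2}, & |\beta_j| > a\lambda, \end{cases} \qquad a > 1. \]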

This project will introduce students to the theory, computation, and practical use of penalized regression methods, with particular emphasis on how these methods are fitted, tuned, interpreted, and compared. It will also explore their application across a range of data types, including continuous, binary, survival, and zero-inflated count data, with a focus on predictive accuracy and on causal inference in policy evaluation.

Group project

The group project will focus on building a strong shared foundation in the theory and practice of penalized regression, especially LASSO-type methods. Students will work together to understand the main ideas that underlie shrinkage and variable selection, and to develop experience in fitting and comparing penalized regression methods in R.

By the end of the group project you will have learned:

how and why penalized regression methods are used in modern statistical modelling;
the formulation of the LASSO problem and its geometric interpretation;
how the regularization parameter \(\lambda\) controls the trade-off between model fit and sparsity;
the differences between LASSO and related methods such as Ridge regression, Elastic Net, Adaptive LASSO, and Group LASSO;
the role of optimization algorithms, particularly coordinate descent, in fitting penalized regression models;
how tuning methods such as cross-validation, AIC, BIC, GCV, and bootstrap-based approaches are used to select the regularization parameter in practice.
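
The last two points can be made concrete with a minimal sketch of cyclic coordinate descent and k-fold cross-validation. It is written in plain Python so that it is self-contained, whereas the project itself works in R with packages such as glmnet. The function names (`soft_threshold`, `lasso_cd`, `cv_choose_lambda`), the toy data, and the \(\lambda\) grid are all hypothetical choices made for this illustration. The code minimises the residual sum of squares plus \(\lambda \sum_j |\beta_j|\) with no intercept, which is why the threshold is \(\lambda/2\); glmnet scales its Gaussian loss by \(1/(2n)\), so its \(\lambda\) values are not directly comparable.

```python
import random


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=500, tol=1e-8):
    """Minimise sum_i (y_i - sum_j x_ij b_j)^2 + lam * sum_j |b_j|
    by cyclic coordinate descent. X is a list of rows; no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta at the starting value beta = 0
    col_ss = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            new_bj = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coefficient moved in a full sweep
    return beta


def cv_choose_lambda(X, y, lambdas, k=5):
    """Return (best_lambda, {lambda: mean held-out squared error}) from k-fold CV.
    Folds are assigned deterministically by observation index modulo k."""
    n = len(X)
    cv_error = {}
    for lam in lambdas:
        total = 0.0
        for fold in range(k):
            train = [i for i in range(n) if i % k != fold]
            test = [i for i in range(n) if i % k == fold]
            beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            for i in test:
                pred = sum(x * b for x, b in zip(X[i], beta))
                total += (y[i] - pred) ** 2
        cv_error[lam] = total / n
    best = min(lambdas, key=lambda lam: cv_error[lam])
    return best, cv_error


# Orthogonal toy design: y = 2 * x1 + 1 * x2 exactly; each column has squared norm 4.
X_toy = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y_toy = [3.0, 1.0, 3.0, 1.0]
for lam in (0.0, 4.0, 8.0, 16.0):
    print(lam, lasso_cd(X_toy, y_toy, lam))

# Simulated data (illustrative only): 3 active predictors out of 8.
rng = random.Random(1)
true_beta = [3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0]
X_sim = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(60)]
y_sim = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 1) for row in X_sim]
grid = [0.0, 1.0, 5.0, 20.0, 80.0, 320.0]
best_lam, errors = cv_choose_lambda(X_sim, y_sim, grid)
print("selected lambda:", best_lam)
```

On the toy design the printed coefficients trace a small regularization path: `[2.0, 1.0]` at \(\lambda = 0\), `[1.5, 0.5]` at \(\lambda = 4\), `[1.0, 0.0]` at \(\lambda = 8\), and `[0.0, 0.0]` at \(\lambda = 16\), so the weaker predictor drops out first. In practice the predictors would usually be centred and standardized before fitting, and a finer grid with warm starts would be used, as in glmnet.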

By the end of the group project you will have practised:

fitting penalized regression models in R using packages such as glmnet, and producing and interpreting regularization paths;
comparing competing penalized methods on simulated or real data in terms of predictive performance and variable selection;
communicating technical ideas clearly in written and oral form.

Mode of Operation and Evidence of Learning

The group project will operate through a combination of reading, mathematical derivation, simulation, and programming in R. Students will demonstrate their understanding by engaging with core methodological literature, deriving key results (such as the soft-thresholding operator), implementing and applying penalized regression methods, reproducing known results on simulated data, analysing real datasets, and clearly communicating the material in both written and oral formats.

Individual project

The individual project will build on the foundation developed in the group project. Each student will choose a more focused topic and investigate it in greater depth, from a methodological, computational, or applied perspective. Potential topics include:

Regularization parameter tuning: A systematic comparison of cross-validation, BIC, AIC, GCV, and bootstrap-based approaches for selecting \(\lambda\), examining their behaviour in different data settings.
Correlated predictors: Investigating how LASSO, Elastic Net, and related methods perform when predictors exhibit strong multicollinearity, and when grouped selection may be more appropriate.
Non-convex penalties: Studying SCAD, MCP, SELO, or BAR penalties, their theoretical advantages over LASSO (such as the oracle property), and the computational challenges they introduce.
Rare events: Performance of LASSO-type methods in binary regression with rare outcomes or in survival data with heavy censoring, where standard methods may struggle.
Regularized missing data imputation: Incorporating LASSO-type penalties within imputation models to handle high-dimensional incomplete data.
Variable selection for causal inference: Using penalized methods in observational studies to select confounders and improve causal estimates in policy evaluation contexts.
Post-selection inference: Methods for valid statistical inference after model selection by LASSO, including selective inference, sample splitting, and the debiased LASSO.
Ensemble and stacked LASSO: Combining predictions from multiple penalized models (e.g. via stacking or super learner) to improve predictive accuracy.

Students may also propose a related topic of their own, subject to discussion and approval.

Mode of Operation and Evidence of Learning

The individual project will involve independent reading, critical thinking, and sustained investigation of a focused problem. Depending on the topic chosen, the work may involve methodological comparison, simulation design, analysis of real data, or implementation of algorithms in R. Students will demonstrate their understanding through a written account of the work, supported where appropriate by figures, tables, code, and critical discussion of methods, assumptions, and limitations.

Prerequisites and Co-requisites

Prerequisites: Familiarity with the R statistical software and an interest in data science.

Additional information

If you would like more information about this project, please contact me at emmanuel.ogundimu@durham.ac.uk.

Resources

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B, 58(1), 267–288.
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw., 33(1), 1–22.
Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman & Hall / CRC.
Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat., 5(1), 232–253.
Freijeiro-González, L., Febrero-Bande, M. and González-Manteiga, W. (2022). A critical review of LASSO and its derivatives for variable selection under dependence among covariates. Int. Stat. Rev., 90(1), 118–145.
Vidaurre, D., Bielza, C. and Larrañaga, P. (2013). A survey of L1 regression. Int. Stat. Rev., 81, 361–387.