Signal Processing

Volume 127, October 2016, Pages 239-246

Generalized LASSO with under-determined regularization matrices

https://doi.org/10.1016/j.sigpro.2016.03.001

Highlights

  • A theoretical study is conducted on the generalized LASSO.

  • A condition is derived to guarantee the equivalence between the generalized LASSO and LASSO.

  • An upper bound is given on the complexity of the LARS algorithm.

  • A closed-form solution of the generalized LASSO is derived for several cases.

Abstract

This paper studies the intrinsic connection between the generalized LASSO and the basic LASSO formulation. The former extends the latter by introducing a regularization matrix on the coefficients. We show that when the regularization matrix is even- or under-determined with full rank conditions, the generalized LASSO can be transformed into the LASSO form via the Lagrangian framework. In addition, we show that some published results on the LASSO can be extended to the generalized LASSO, and that some variants of the LASSO, e.g., the robust LASSO, can be rewritten in the generalized LASSO form and hence transformed into the basic LASSO. Based on this connection, many existing results concerning the LASSO, e.g., efficient LASSO solvers, can be used for the generalized LASSO.

Introduction

The least absolute shrinkage and selection operator (LASSO) [1] has been one of the most popular approaches for sparse linear regression in the last decade. It is usually formulated as
$$x(\lambda)=\arg\min_{x}\Big\{\tfrac{1}{2}\|y-Ax\|^{2}+\lambda\|x\|_{1}\Big\},$$
where $y\in\mathbb{R}^{n}$ gathers $n$ observed measurements; $A\in\mathbb{R}^{n\times p}$ contains $p$ predictors of dimension $n$; $x\in\mathbb{R}^{p}$ contains the $p$ coefficients; $\|\cdot\|$ and $\|\cdot\|_{1}$ stand for the ℓ2- and ℓ1-norm, respectively; and $\lambda>0$ is the regularization parameter, controlling the tradeoff between data fidelity and model complexity. The most attractive feature of the LASSO is the use of ℓ1-norm regularization, which yields sparse coefficients. The ℓ1-norm regularization also results in the piecewise linearity [2] of the solution path $\{x(\lambda)\mid\lambda\in(0,+\infty)\}$ (i.e., the set of solutions as the regularization parameter $\lambda$ varies continuously), allowing efficient reconstruction of the whole solution path. Based on this property, well-known path tracking algorithms such as least angle regression (LARS) [3] and the homotopy method [2], [4] have been developed. LARS is a greedy algorithm working with a decreasing λ value: at each iteration, a dictionary atom is selected and appended to the previously selected set, and the next critical λ value is computed. Homotopy is an extension of LARS, performing both forward selection of new atoms and backward removal of atoms already selected.
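
As an illustration of the piecewise-linear path property, the following sketch computes a LASSO solution path with scikit-learn's LARS implementation. The solver choice, the random data, and the problem sizes are assumptions of this example, not prescriptions of the paper; note also that scikit-learn scales the penalty by the number of samples.

```python
# A minimal sketch of the basic LASSO and its piecewise-linear solution path,
# using scikit-learn's LARS implementation (illustrative data and sizes).
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 50, 100                       # n measurements, p predictors
A = rng.standard_normal((n, p))      # predictor matrix A
x_true = np.zeros(p)
x_true[:5] = rng.standard_normal(5)  # sparse ground-truth coefficients
y = A @ x_true + 0.01 * rng.standard_normal(n)

# lars_path returns the breakpoints of the piecewise-linear path:
# one column of `coefs` per critical value of the regularization parameter.
alphas, active, coefs = lars_path(A, y, method="lasso")
print(coefs.shape)                   # (p, number of path breakpoints)
```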

The generalized LASSO [5] (also called the analysis [6], [7] or least mixed norm [8] formulation) extends the basic LASSO (or synthesis formulation) by imposing a regularization matrix $D\in\mathbb{R}^{m\times p}$ on the coefficient vector $x$:
$$x(\lambda)=\arg\min_{x}\Big\{\tfrac{1}{2}\|y-Ax\|^{2}+\lambda\|Dx\|_{1}\Big\},$$
where $D$ typically encodes prior knowledge (e.g., structural information) [9] about $x$. For example, if $x$ is expected to be a piecewise constant signal (i.e., its first-order derivative is sparse), then $D$ is taken to be the first-order derivative operator [10]. Some variants of the LASSO can be regarded as generalized LASSO problems by forming a structured matrix $D$. For example, the fused LASSO proposed by Tibshirani et al. [10] imposes ℓ1 regularization on both the coefficients and their first-order derivatives to encourage locally constant solutions. When one of the two regularization parameters is fixed, a fused LASSO problem can be rewritten as a generalized LASSO by stacking a scaled identity matrix on a first-order derivative matrix to form $D$. The graph-guided fused LASSO [11], [12] incorporates network prior information on the correlation structure into $D$. Applications of the generalized LASSO can be found in image restoration [8], visual recognition [13], electroencephalography (EEG) [14], bioinformatics [15], ultrasonics [16], etc.
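
To make the role of $D$ concrete, the sketch below builds the two regularization matrices mentioned above: a first-order derivative operator and a fused-LASSO-style matrix obtained by stacking a scaled identity on it. The dense construction and the ratio parameter `gamma` are illustrative assumptions; in practice sparse matrices would be used.

```python
# Two example regularization matrices D for the generalized LASSO.
import numpy as np

p = 6

# First-order derivative (difference) operator:
# D_diff @ x is sparse when x is piecewise constant.
D_diff = np.zeros((p - 1, p))
for i in range(p - 1):
    D_diff[i, i], D_diff[i, i + 1] = -1.0, 1.0

# Fused-LASSO-style matrix: with one of the two penalties fixed, stacking a
# scaled identity on the difference operator gives a single matrix D such
# that lambda * ||D x||_1 reproduces both terms (gamma is the fixed ratio).
gamma = 0.5
D_fused = np.vstack([gamma * np.eye(p), D_diff])
print(D_diff.shape, D_fused.shape)   # (5, 6) (11, 6)
```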

Since the LASSO was proposed, results concerning its recovery conditions [3], [17], solution path properties [18], degrees of freedom [19], model selection consistency [20], and efficient algorithms as well as software [2], [4] have been widely studied. One may want to know whether these results are applicable to a generalized LASSO problem, e.g., whether a generalized LASSO problem can be solved with a LASSO solver. An immediate example is when $D$ is a full-rank square (hence invertible) matrix. By a simple change of variables $u=Dx$, the original generalized LASSO problem can be transformed into the basic LASSO form with predictor matrix $AD^{-1}$ and coefficient vector $u$, and can therefore be solved by calling a LASSO subroutine.
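
A minimal sketch of this change of variables, assuming an invertible square $D$ and using scikit-learn's Lasso as the LASSO subroutine (an assumption of this example; note that its objective scales the data-fidelity term by $1/n$, so its `alpha` corresponds to $\lambda/n$ here):

```python
# Solve the generalized LASSO with an invertible D by the substitution u = Dx:
# basic LASSO in u with predictor A @ inv(D), then map back with x = inv(D) @ u.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 40, 20
A = rng.standard_normal((n, p))
D = np.eye(p) + np.diag(0.3 * np.ones(p - 1), k=1)   # an invertible square D
x_true = np.zeros(p); x_true[::5] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(n)

lam = 0.1
B = A @ np.linalg.inv(D)                             # transformed predictor A D^{-1}
fit = Lasso(alpha=lam / n, fit_intercept=False).fit(B, y)
x_hat = np.linalg.solve(D, fit.coef_)                # map u back to x = D^{-1} u
print(np.round(x_hat, 3))
```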

Although the generalized LASSO and the LASSO have been studied from various aspects [5], [7], [21], their connection has not been fully explored; this connection is the main focus of this paper. Elad et al. [6] showed that they are equivalent, but confined the discussion to the denoising case, where $A$ is an identity matrix. Tibshirani and Taylor [5] showed that the former can be transformed into the latter; however, their method needs to introduce an additional matrix $D_0$ (see the Appendices), which raises other potential questions, as discussed in the conclusion.

The paper is organized as follows: in Section 2, we show that when the regularization matrix is even- or under-determined with full rank conditions, the generalized LASSO can be transformed into the LASSO form via the Lagrangian framework. Based on this formula, in Section 3 we show that some published results of LASSO can be extended to the generalized LASSO. In Section 4, two variants of LASSO, namely the regularized deconvolution and the robust LASSO are analyzed under the generalized LASSO framework. We conclude the paper in Section 5.

Section snippets

Condition and formula of transformation

The simplification of the generalized LASSO depends on the setting of $D$ [6]. For the even-determined case ($m=p$), if $D$ has full rank, then by a simple change of variables $u=Dx$ the original generalized LASSO problem can be transformed into the basic LASSO with predictor matrix $AD^{-1}$ and coefficient vector $u$. Once

Extension of existing LASSO results

Since the LASSO has been intensively studied during the last decade, many results concerning computational issues have been published. In this section, we extend some of them to the generalized LASSO problem.

Two examples on the analysis of LASSO variants

Several variants of LASSO can be unified under the generalized LASSO framework, such as the total variation regularized deconvolution for signal and image recovery [29], the robust LASSO for face recognition and sensor network [30], the adaptive LASSO for variable selection [31], the fused LASSO for gene expression data analysis [10], the ℓ1 trend filtering for time series analysis [32], and the adaptive generalized fused LASSO for road-safety data analysis [33]. In this section, the first two

Conclusion

This paper discusses the simplification of a generalized LASSO problem into the basic LASSO form. When the regularization matrix $D$ is even- or under-determined ($m\le p$), we showed that this simplification is possible. Otherwise, there is no guarantee that this simplification can be done. In the former case, optimization tools dedicated to the LASSO can be straightforwardly applied to the generalized LASSO.

Tibshirani and Taylor [5] gave a simple way to transform a generalized LASSO to the basic LASSO

Acknowledgement

This study was partially supported by the National Science Foundation of China (No. 61401352), China Postdoctoral Science Foundation (No. 2014M560786), Shaanxi Postdoctoral Science Foundation, Fundamental Research Funds for the Central Universities (No. xjj2014060), National Science Foundation (No. 1539067), and National Institutes of Health (Nos. R01MH104680, R01MH107354, and R01GM109068).

References (35)

  • L. Rudin et al., Nonlinear total variation based noise removal algorithm, Phys. D (1992)
  • R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. B (1996)
  • M.R. Osborne et al., A new approach to variable selection in least squares problems, IMA J. Numer. Anal. (2000)
  • B. Efron et al., Least angle regression, Ann. Stat. (2004)
  • D.M. Malioutov, M. Cetin, A.S. Willsky, Homotopy continuation for sparse signal representation, in: Proceedings of IEEE...
  • R.J. Tibshirani et al., The solution path of the generalized lasso, Ann. Stat. (2011)
  • M. Elad et al., Analysis versus synthesis in signal priors, Inverse Probl. (2007)
  • S. Vaiter et al., Robust sparse analysis regularization, IEEE Trans. Inf. Theory (2013)
  • H. Fu et al., Efficient minimization methods of mixed ℓ2–ℓ1 and ℓ1–ℓ1 norms for image restoration, SIAM J. Sci. Comput. (2006)
  • R. Tibshirani et al., Sparsity and smoothness via the fused lasso, J. R. Stat. Soc. Ser. B (2005)
  • S. Kim et al., A multivariate regression approach to association analysis of a quantitative trait network, Bioinformatics (2009)
  • X. Chen et al., Smoothing proximal gradient method for general structured sparse regression, Ann. Appl. Stat. (2012)
  • N. Morioka, S. Satoh, Generalized lasso based approximation of sparse coding for visual recognition, in: Advances in...
  • M. Vega-Hernández et al., Penalized least squares methods for solving the EEG inverse problem, Stat. Sin. (2008)
  • R. Tibshirani et al., Spatial smoothing and hot spot detection for CGH data using the fused lasso, Biostatistics (2008)
  • C. Yu et al., A blind deconvolution approach to ultrasound imaging, IEEE Trans. Ultrason. Ferroelectr. Freq. Control (2012)