Generalized LASSO with under-determined regularization matrices
Introduction
The least absolute shrinkage and selection operator (LASSO) [1] has been one of the most popular approaches to sparse linear regression over the last decade. It is usually formulated as

minᵦ ½‖y − Xβ‖²₂ + λ‖β‖₁,

where y ∈ ℝⁿ gathers the n observed measurements; X ∈ ℝⁿˣᵖ contains p predictors of dimension n; β ∈ ℝᵖ contains the p coefficients; ‖·‖₂ and ‖·‖₁ stand for the ℓ2- and ℓ1-norm, respectively; and λ > 0 is the regularization parameter, controlling the tradeoff between data fidelity and model complexity. The most attractive feature of the LASSO is its ℓ1-norm regularization, which yields sparse coefficients. The ℓ1-norm regularization also results in piecewise linearity [2] of the solution path (i.e., the set of solutions as the regularization parameter λ varies continuously), allowing efficient reconstruction of the whole solution path. Based on this property, well-known path-tracking algorithms such as least angle regression (LARS) [3] and the homotopy method [2], [4] have been developed. LARS is a greedy algorithm that works with a decreasing λ value: at each iteration, a dictionary atom is selected and appended to the previously selected set, and the next critical λ value is computed. Homotopy extends LARS by performing both forward selection of new atoms and backward removal of atoms already selected.
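As a concrete illustration of the LASSO objective above, the following sketch minimizes it by proximal gradient descent (ISTA, i.e., iterative soft-thresholding). The solver choice, the data sizes, and the value λ = 1 are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize 0.5*||y - X b||_2^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)           # gradient of the data-fidelity term
        b = soft_threshold(b - grad / L, lam / L)
    return b

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
b_true = np.zeros(20)
b_true[:3] = [2.0, -1.5, 1.0]              # sparse ground truth
y = X @ b_true + 0.01 * rng.standard_normal(50)

b_hat = lasso_ista(X, y, lam=1.0)
print(np.count_nonzero(np.abs(b_hat) > 1e-6))  # sparse estimate: few nonzeros
```

The soft-thresholding step is what produces exact zeros in the estimate, which is the sparsity property the text attributes to the ℓ1-norm.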
The generalized LASSO [5] (also known as the analysis formulation [6], [7] or least mixed norm [8]) extends the basic LASSO (the synthesis formulation) by imposing a regularization matrix D ∈ ℝᵐˣᵖ on the coefficient vector β:

minᵦ ½‖y − Xβ‖²₂ + λ‖Dβ‖₁,

where D typically encodes prior knowledge (e.g., structural information) [9] about β. For example, if β is expected to be a piecewise constant signal (i.e., its first-order derivative is sparse), then D is taken to be the first-order derivative operator [10]. Some variants of the LASSO can be regarded as generalized LASSO problems with a structured matrix D. For example, the fused LASSO proposed by Tibshirani et al. [10] imposes ℓ1 regularization on both the coefficients and their first-order differences to encourage locally constant solutions. When one of its two regularization parameters is fixed, a fused LASSO problem can be rewritten as a generalized LASSO by stacking a scaled identity matrix on a first-order derivative matrix to form D. The graph-guided fused LASSO [11], [12] incorporates network prior information into D to capture correlation structure. Applications of the generalized LASSO can be found in image restoration [8], visual recognition [13], electroencephalography (EEG) [14], bioinformatics [15], ultrasonics [16], etc.
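To make the structured choices of D above concrete, the following sketch (sizes and the fixed parameter alpha are illustrative assumptions) builds the first-order difference operator and a fused-LASSO-style stacked matrix:

```python
import numpy as np

def first_diff(p):
    """(p-1) x p first-order difference operator: (D b)_i = b_{i+1} - b_i."""
    return np.eye(p, k=1)[:p - 1] - np.eye(p)[:p - 1]

p = 5
D = first_diff(p)
b = np.array([1.0, 1.0, 1.0, 4.0, 4.0])   # piecewise-constant signal

# D b is sparse: one nonzero at the single jump in b
print(D @ b)                               # → [0. 0. 3. 0.]

# Fused-LASSO-style matrix with one parameter fixed (alpha is a hypothetical
# value): identity rows promote sparsity of b itself, difference rows promote
# piecewise constancy.
alpha = 0.5
D_fused = np.vstack([alpha * np.eye(p), first_diff(p)])
print(D_fused.shape)                       # → (9, 5)
```

Note that ‖Dβ‖₁ for this D is exactly the total variation of β, which is why a piecewise constant prior leads to this choice.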
Since the LASSO was proposed, results concerning its recovery conditions [3], [17], solution path properties [18], degrees of freedom [19], model selection consistency [20], and efficient algorithms and software [2], [4] have been widely studied. One may ask whether these results are applicable to a generalized LASSO problem, e.g., whether a generalized LASSO problem can be solved with a LASSO solver. An immediate example is when D is a full-rank square (hence invertible) matrix. By the simple change of variables θ = Dβ, the original generalized LASSO problem can be transformed into the basic LASSO form with predictor matrix XD⁻¹ and coefficient vector θ, and can therefore be solved by calling a LASSO subroutine.
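The change of variables for an invertible D can be checked numerically. The sketch below (with made-up random data; the diagonal shift merely keeps D invertible) verifies that the generalized-LASSO objective at β equals the basic-LASSO objective at θ = Dβ under the predictor matrix XD⁻¹:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 4
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
D = rng.standard_normal((p, p)) + 4 * np.eye(p)   # full-rank square D
lam = 0.3

def gen_lasso_obj(b):
    """Generalized LASSO objective: 0.5*||y - X b||^2 + lam*||D b||_1."""
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(D @ b))

def lasso_obj(theta):
    """Basic LASSO objective in theta = D b, with predictors X D^{-1}."""
    X_tilde = X @ np.linalg.inv(D)
    return 0.5 * np.sum((y - X_tilde @ theta) ** 2) + lam * np.sum(np.abs(theta))

b = rng.standard_normal(p)
print(np.isclose(gen_lasso_obj(b), lasso_obj(D @ b)))  # → True
```

Since the two objectives agree at every point under θ = Dβ, a minimizer θ̂ of the LASSO problem yields the generalized-LASSO minimizer β̂ = D⁻¹θ̂.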
Although the generalized LASSO and the LASSO have been studied from various aspects [5], [7], [21], their connections have not been fully explored; this is the main focus of this paper. Elad et al. [6] showed that the two are equivalent, but confined the discussion to the denoising case, where X is an identity matrix. Tibshirani and Taylor [5] showed that the former can be transformed into the latter; however, their method needs to introduce an additional matrix (see Appendices), which raises other potential questions, as discussed in the conclusion.
The paper is organized as follows. In Section 2, we show that when the regularization matrix D is even- or under-determined (m ≤ p) and satisfies a full rank condition, the generalized LASSO can be transformed into the LASSO form via the Lagrangian framework. Based on this formulation, in Section 3 we show that some published results on the LASSO extend to the generalized LASSO. In Section 4, two variants of the LASSO, namely regularized deconvolution and the robust LASSO, are analyzed under the generalized LASSO framework. We conclude the paper in Section 5.
Section snippets
Condition and formula of transformation
The simplification of the generalized LASSO depends on the setting of D [6]. For the even-determined case (m = p),1 if D has full rank, then by the simple change of variables θ = Dβ, the original generalized LASSO problem can be transformed into the basic LASSO with predictor matrix XD⁻¹ and coefficient vector θ. Once
Extension of existing LASSO results
Since the LASSO has been intensively studied during the last decade, many results concerning computational issues have been published. In this section, we extend some of them to the generalized LASSO problem.
Two examples on the analysis of LASSO variants
Several variants of LASSO can be unified under the generalized LASSO framework, such as the total variation regularized deconvolution for signal and image recovery [29], the robust LASSO for face recognition and sensor network [30], the adaptive LASSO for variable selection [31], the fused LASSO for gene expression data analysis [10], the ℓ1 trend filtering for time series analysis [32], and the adaptive generalized fused LASSO for road-safety data analysis [33]. In this section, the first two
Conclusion
This paper discusses the simplification of a generalized LASSO problem into the basic LASSO form. When the regularization matrix D is even- or under-determined (m ≤ p), we showed that this simplification is possible; otherwise, there is no guarantee that it can be done. In the former case, optimization tools dedicated to the LASSO can be straightforwardly applied to the generalized LASSO.
Tibshirani and Taylor [5] gave a simple way to transform a generalized LASSO to the basic LASSO
Acknowledgement
This study was partially supported by the National Natural Science Foundation of China (No. 61401352), the China Postdoctoral Science Foundation (No. 2014M560786), the Shaanxi Postdoctoral Science Foundation, the Fundamental Research Funds for the Central Universities (No. xjj2014060), the National Science Foundation (No. 1539067), and the National Institutes of Health (Nos. R01MH104680, R01MH107354, and R01GM109068).
References (35)
- L.I. Rudin et al., Nonlinear total variation based noise removal algorithms, Phys. D (1992)
- R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B (1996)
- M.R. Osborne et al., A new approach to variable selection in least squares problems, IMA J. Numer. Anal. (2000)
- B. Efron et al., Least angle regression, Ann. Stat. (2004)
- D.M. Malioutov, M. Cetin, A.S. Willsky, Homotopy continuation for sparse signal representation, in: Proceedings of IEEE...
- R.J. Tibshirani, J. Taylor, The solution path of the generalized lasso, Ann. Stat. (2011)
- M. Elad et al., Analysis versus synthesis in signal priors, Inverse Probl. (2007)
- S. Vaiter et al., Robust sparse analysis regularization, IEEE Trans. Inf. Theory (2013)
- H. Fu et al., Efficient minimization methods of mixed ℓ2–ℓ1 and ℓ1–ℓ1 norms for image restoration, SIAM J. Sci. Comput. (2006)
- R. Tibshirani et al., Sparsity and smoothness via the fused lasso, J. R. Stat. Soc. Ser. B
- S. Kim et al., A multivariate regression approach to association analysis of a quantitative trait network, Bioinformatics
- X. Chen et al., Smoothing proximal gradient method for general structured sparse regression, Ann. Appl. Stat.
- M. Vega-Hernández et al., Penalized least squares methods for solving the EEG inverse problem, Stat. Sin.
- R. Tibshirani et al., Spatial smoothing and hot spot detection for CGH data using the fused lasso, Biostatistics
- A blind deconvolution approach to ultrasound imaging, IEEE Trans. Ultrason. Ferroelectr. Freq. Control