Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors

Abstract

Penalized regression is an attractive framework for variable selection problems. Often, variables possess a grouping structure, and the relevant selection problem is that of selecting groups, not individual variables. The group lasso has been proposed as a way of extending the ideas of the lasso to the problem of group selection. Nonconvex penalties such as SCAD and MCP have been proposed and shown to have several advantages over the lasso; these penalties may also be extended to the group selection problem, giving rise to group SCAD and group MCP methods. Here, we describe algorithms for fitting these models stably and efficiently. In addition, we present simulation results and real data examples comparing and contrasting the statistical properties of these methods.

References

  • Bakin, S.: Adaptive regression and model selection in data mining problems. Ph.D. thesis, Australian National University (1999)

  • Bertsekas, D.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)

  • Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5, 232–253 (2011)

  • Chiang, A., Beck, J., Yen, H., Tayeh, M., Scheetz, T., Swiderski, R., Nishimura, D., Braun, T., Kim, K., Huang, J., et al.: Homozygosity mapping with SNP arrays identifies TRIM32, an E3 ubiquitin ligase, as a Bardet–Biedl syndrome gene (BBS11). Proc. Natl. Acad. Sci. 103, 6287–6292 (2006)

  • Donoho, D., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455 (1994)

  • Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  • Foygel, R., Drton, M.: Exact block-wise optimization in group lasso and sparse group lasso for linear regression (2010). arXiv preprint arXiv:1010.3320

  • Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1, 302–332 (2007)

  • Friedman, J., Hastie, T., Tibshirani, R.: A note on the group lasso and a sparse group lasso (2010a). arXiv preprint arXiv:1001.0736

  • Friedman, J.H., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010b)

  • Fu, W.: Penalized regressions: the bridge versus the lasso. J. Comput. Graph. Stat. 7, 397–416 (1998)

  • Gao, H., Bruce, A.: Waveshrink with firm shrinkage. Stat. Sin. 7, 855–874 (1997)

  • Huang, J., Breheny, P., Ma, S.: A selective review of group selection in high-dimensional models. Stat. Sci. 27, 481–499 (2012)

  • Hunter, D., Lange, K.: A tutorial on MM algorithms. Am. Stat. 58, 30–38 (2004)

  • Krishnapuram, B., Carin, L., Figueiredo, M., Hartemink, A.: Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell. 27, 957–968 (2005)

  • Lange, K., Hunter, D., Yang, I.: Optimization transfer using surrogate objective functions. J. Comput. Graph. Stat. 9, 1–20 (2000)

  • Meier, L., van de Geer, S., Buhlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. B 70, 53 (2008)

  • Puig, A., Wiesel, A., Fleury, G., Hero, A.: Multidimensional shrinkage-thresholding operator and group lasso penalties. IEEE Signal Process. Lett. 18, 363–366 (2011)

  • Ravikumar, P., Lafferty, J., Liu, H., Wasserman, L.: Sparse additive models. J. R. Stat. Soc. B 71, 1009–1030 (2009)

  • Scheetz, T., Kim, K., Swiderski, R., Philp, A., Braun, T., Knudtson, K., Dorrance, A., DiBona, G., Huang, J., Casavant, T., et al.: Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. 103, 14429–14434 (2006)

  • She, Y.: An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors. Comput. Stat. Data Anal. 56, 2976–2990 (2012)

  • Simon, N., Tibshirani, R.: Standardization and the group lasso penalty. Stat. Sin. 22, 983–1001 (2011)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

  • Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)

  • Wang, L., Chen, G., Li, H.: Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 23, 1486 (2007)

  • Wang, L., Li, H., Huang, J.Z.: Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Am. Stat. Assoc. 103, 1556–1569 (2008)

  • Wu, T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2, 224–244 (2008)

  • Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 49–67 (2006)

  • Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)

  • Zou, H., Li, R.: One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 36, 1509–1533 (2008)

Acknowledgements

Jian Huang’s work is supported in part by NIH Grants R01CA120988, R01CA142774 and NSF Grants DMS-0805670 and DMS-1208225. The authors would like to thank Rob Mullins for the genetic association data analyzed in Sect. 5.2, as well as the associate editor and two anonymous reviewers, who provided many helpful remarks that led to considerable refinement of this article.

Appendix

Before proving Proposition 1, we establish the groupwise convexity of all the objective functions under consideration. Although the group SCAD and group MCP objective functions contain nonconvex components and are not necessarily convex overall, they remain convex with respect to the variables in any single group.

Lemma 1

The objective function \(Q(\boldsymbol{\beta}_j)\) for regularized linear regression is a strictly convex function with respect to \(\boldsymbol{\beta}_j\) for the group lasso, for group SCAD with γ>2, and for group MCP with γ>1.

Proof

Although \(Q(\boldsymbol{\beta}_j)\) is not differentiable, it is directionally twice differentiable everywhere. Let \(\nabla_{\mathbf{d}}^{2} Q(\boldsymbol{\beta}_{j})\) denote the second derivative of \(Q(\boldsymbol{\beta}_j)\) in the direction d. Then the strict convexity of \(Q(\boldsymbol{\beta}_j)\) follows if \(\nabla_{\mathbf{d}}^{2} Q(\boldsymbol{\beta}_{j})\) is positive definite at all \(\boldsymbol{\beta}_j\) and for all d. Let \(\xi_*\) denote the infimum over \(\boldsymbol{\beta}_j\) and d of the minimum eigenvalue of \(\nabla_{\mathbf{d}}^{2} Q(\boldsymbol{\beta}_{j})\). Then, after some algebra, we obtain

$$\xi_* = \begin{cases} 1 & \text{Group lasso} \\ 1 - \dfrac{1}{\gamma - 1} & \text{Group SCAD} \\ 1 - \dfrac{1}{\gamma} & \text{Group MCP.} \end{cases}$$

These quantities are positive under the conditions specified in the lemma. □
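
As a rough sketch of where these values come from (a one-dimensional illustration, not part of the original argument, under the assumption that the groups have been orthonormalized so that the least squares loss contributes curvature 1), consider the single-coordinate objective \(q(\theta) = \frac{1}{2}(\theta - z)^2 + p(\theta; \lambda, \gamma)\) for \(\theta \geq 0\). Wherever the penalty is twice differentiable,

$$p''(\theta) = 0 \ \ \text{(lasso)}, \qquad p''(\theta) \geq -\frac{1}{\gamma - 1} \ \ \text{(SCAD)}, \qquad p''(\theta) \geq -\frac{1}{\gamma} \ \ \text{(MCP)},$$

so the curvature of \(q\) is bounded below by 1, 1−1/(γ−1), and 1−1/γ, respectively, matching the values of \(\xi_*\) displayed above.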

We now proceed to the proof of Proposition 1.

Proof of Proposition 1

The descent property is a direct consequence of the fact that each updating step consists of minimizing \(Q(\boldsymbol{\beta})\) with respect to \(\boldsymbol{\beta}_j\). Lemma 1, together with the fact that the least squares loss function is continuously differentiable and coercive, provides sufficient conditions to apply Theorem 4.1 of Tseng (2001), thereby establishing that every limit point of \(\{\boldsymbol{\beta}^{(m)}\}\) is a stationary point of \(Q(\boldsymbol{\beta})\).

We further note that \(\{\boldsymbol{\beta}^{(m)}\}\) is guaranteed to converge to a unique limit point. Suppose that the sequence possessed two limit points, \(\boldsymbol{\beta}'\) and \(\boldsymbol{\beta}''\), such that \(\boldsymbol{\beta}'_{j} \neq \boldsymbol{\beta}''_{j}\) for at least one group. For the transition \(\boldsymbol{\beta}' \to \boldsymbol{\beta}''\) to occur, the algorithm must pass through the point \((\boldsymbol{\beta}''_{j}, \boldsymbol{\beta}'_{-j})\). However, by Lemma 1, \(\boldsymbol{\beta}'_{j}\) is the unique value of \(\boldsymbol{\beta}_j\) minimizing \(Q(\boldsymbol{\beta}_{j} \mid \boldsymbol{\beta}'_{-j})\). Thus, the transition \(\boldsymbol{\beta}' \to \boldsymbol{\beta}''\) is not allowed by the group descent algorithm and \(\{\boldsymbol{\beta}^{(m)}\}\) possesses a single limit point. □
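
To make the group-wise updating steps concrete, the following is a minimal Python/NumPy sketch of one full cycle of group descent for the group lasso, written under the assumption that each block of columns has been orthonormalized so that \(X_j^T X_j / n = I\); the function names, the \(\sqrt{K_j}\) group-size weighting of λ, and the residual bookkeeping are illustrative choices rather than the paper's exact implementation.

import numpy as np

def group_soft_threshold(z, lam):
    # Multivariate soft-thresholding: shrink the vector z toward zero by lam in norm
    norm_z = np.linalg.norm(z)
    if norm_z <= lam:
        return np.zeros_like(z)
    return (1.0 - lam / norm_z) * z

def group_lasso_cycle(X, y, beta, groups, lam):
    # One full cycle of group descent for
    # (1/(2n)) * ||y - X beta||^2 + sum_j lam_j * ||beta_j||,
    # assuming each group X_j satisfies X_j' X_j / n = I
    n = len(y)
    r = y - X @ beta                               # residuals for the current beta
    for j in np.unique(groups):
        idx = np.flatnonzero(groups == j)
        z_j = X[:, idx].T @ r / n + beta[idx]      # unpenalized solution for group j, others fixed
        lam_j = lam * np.sqrt(len(idx))            # group-size weighting (an assumption here)
        beta_j_new = group_soft_threshold(z_j, lam_j)
        r -= X[:, idx] @ (beta_j_new - beta[idx])  # keep residuals in sync with the update
        beta[idx] = beta_j_new
    return beta

For group SCAD or group MCP, the soft-thresholding step above would be replaced by the corresponding SCAD- or firm-type thresholding of \(\Vert\mathbf{z}_j\Vert\); by Lemma 1, each such group-wise update has a unique minimizer whenever γ>2 (SCAD) or γ>1 (MCP).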

For Proposition 2, involving logistic regression, we proceed similarly, letting \(R(\boldsymbol{\beta}|\tilde{\boldsymbol {\beta}})\) denote the majorizing approximation to Q(β) at \(\tilde{\boldsymbol {\beta}}\).

Lemma 2

The majorizing approximation \(R(\boldsymbol{\beta}_{j}|\tilde{\boldsymbol{\beta}})\) for regularized logistic regression is a strictly convex function with respect to \(\boldsymbol{\beta}_j\) at all \(\tilde{\boldsymbol{\beta}}\) for the group lasso, for group SCAD with γ>5, and for group MCP with γ>4.

Proof

Proceeding as in the previous lemma, and letting \(\xi_*\) denote the infimum over \(\tilde{\boldsymbol{\beta}}\), \(\boldsymbol{\beta}_j\), and d of the minimum eigenvalue of \(\nabla_{\mathbf{d}}^{2} R(\boldsymbol{\beta}_{j}|\tilde{\boldsymbol{\beta}})\), we obtain

$$\xi_* = \begin{cases} \dfrac{1}{4} & \text{Group lasso} \\ \dfrac{1}{4} - \dfrac{1}{\gamma - 1} & \text{Group SCAD} \\ \dfrac{1}{4} - \dfrac{1}{\gamma} & \text{Group MCP.} \end{cases}$$

These quantities are positive under the conditions specified in the lemma. □
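
Explicitly, the thresholds on γ stated in the lemma follow from solving these inequalities for γ:

$$\frac{1}{4} - \frac{1}{\gamma - 1} > 0 \iff \gamma > 5, \qquad \frac{1}{4} - \frac{1}{\gamma} > 0 \iff \gamma > 4.$$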

Proof of Proposition 2

The proposition makes two claims: descent with every iteration, and convergence to a stationary point. To establish descent for logistic regression, we note that because L is twice differentiable, for any points \(\boldsymbol{\eta}\) and \(\boldsymbol{\eta}^*\) there exists a vector \(\boldsymbol{\eta}^{**}\) on the line segment joining \(\boldsymbol{\eta}\) and \(\boldsymbol{\eta}^*\) such that

$$\begin{aligned} L(\boldsymbol{\eta}) &= L\bigl(\boldsymbol{\eta}^*\bigr) + \bigl(\boldsymbol{\eta}-\boldsymbol{\eta}^*\bigr)^T \nabla L\bigl(\boldsymbol{\eta}^*\bigr) + \frac{1}{2}\bigl(\boldsymbol{\eta}-\boldsymbol{\eta}^*\bigr)^T \nabla^2 L\bigl(\boldsymbol{\eta}^{**}\bigr)\bigl(\boldsymbol{\eta}-\boldsymbol{\eta}^*\bigr) \\ &\leq \tilde{L}\bigl(\boldsymbol{\eta}|\boldsymbol{\eta}^*\bigr), \end{aligned}$$

where the inequality follows from the fact that \(v\mathbf{I} - \nabla^2 L(\boldsymbol{\eta}^{**})\) is a positive semidefinite matrix. Descent now follows from the descent property of MM algorithms (Lange et al. 2000), coupled with the fact that each updating step consists of minimizing \(R(\boldsymbol{\beta}_{j}|\tilde{\boldsymbol{\beta}})\).
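
As a small numerical sanity check of this majorization (not taken from the paper: it uses the unscaled logistic negative log-likelihood, the bound v = 1/4 on the Bernoulli variance π(1−π), and illustrative function names), one can verify in Python/NumPy that the quadratic surrogate lies above the loss at randomly sampled points:

import numpy as np

rng = np.random.default_rng(0)

def neg_loglik(eta, y):
    # Logistic negative log-likelihood as a function of the linear predictor eta
    return np.sum(np.log1p(np.exp(eta)) - y * eta)

def majorizer(eta, eta_star, y, v=0.25):
    # Second-order expansion of neg_loglik at eta_star, with the Hessian
    # diag(pi * (1 - pi)) replaced by its upper bound v * I
    pi_star = 1.0 / (1.0 + np.exp(-eta_star))
    d = eta - eta_star
    return neg_loglik(eta_star, y) + d @ (pi_star - y) + 0.5 * v * (d @ d)

n = 50
y = rng.integers(0, 2, size=n).astype(float)
for _ in range(1000):
    eta_star = 3.0 * rng.standard_normal(n)
    eta = 3.0 * rng.standard_normal(n)
    assert neg_loglik(eta, y) <= majorizer(eta, eta_star, y) + 1e-10
print("The quadratic surrogate majorized the loss at every sampled point.")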

To establish convergence to a stationary point, we note that if no elements of β tend to ±∞, then the descent property of the algorithm ensures that the sequence \(\{\boldsymbol{\beta}^{(m)}\}\) stays within a compact set and therefore possesses a limit point \(\tilde{\boldsymbol{\beta}}\). Then, as in the proof of Proposition 1, Lemma 2 allows us to apply the results of Tseng (2001) and conclude that \(\tilde{\boldsymbol{\beta}}\) must be a stationary point of \(R(\boldsymbol{\beta}|\tilde{\boldsymbol{\beta}})\). Furthermore, because \(R(\boldsymbol{\beta}|\tilde{\boldsymbol{\beta}})\) is tangent to \(Q(\boldsymbol{\beta})\) at \(\tilde{\boldsymbol{\beta}}\), \(\tilde{\boldsymbol{\beta}}\) must also be a stationary point of \(Q(\boldsymbol{\beta})\). □

Cite this article

Breheny, P., Huang, J. Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat Comput 25, 173–187 (2015). https://doi.org/10.1007/s11222-013-9424-2
