Abstract
This paper concerns folded concave penalized sparse linear regression (FCPSLR), a class of popular sparse recovery methods. Although FCPSLR yields desirable recovery performance when solved globally, computing a global solution is NP-hard. Despite existing statistical performance analyses of local minimizers or of specific FCPSLR-based learning algorithms, two questions remain open: whether local solutions that are known to admit fully polynomial-time approximation schemes (FPTAS) may already suffice to ensure the statistical performance, and whether that statistical performance can be independent of the specific design of the computing procedure. To address these questions, this paper presents the following three results: (1) Any local solution (stationary point) is a sparse estimator, under some conditions on the parameters of the folded concave penalties. (2) Perhaps more importantly, any local solution satisfying a significant subspace second-order necessary condition (S\(^3\)ONC), which is weaker than the second-order KKT condition, yields a bounded error in approximating the true parameter with high probability. In addition, if the minimal signal strength is sufficient, the S\(^3\)ONC solution likely recovers the oracle solution. This result also shows that the goal of improving the statistical performance is consistent with the optimization criterion of minimizing the suboptimality gap in solving the non-convex programming formulation of FCPSLR. (3) We apply (2) to the special case of FCPSLR with the minimax concave penalty (MCP) and show that, under the restricted eigenvalue condition, any S\(^3\)ONC solution with a better objective value than the Lasso solution entails the strong oracle property. In addition, such a solution generates a model error (ME) comparable to that of the optimal but exponential-time sparse estimator given a sufficient sample size, while the worst-case ME is comparable to that of the Lasso in general.
Furthermore, a solution satisfying the S\(^3\)ONC can be computed by procedures that admit an FPTAS.
Notes
Throughout this paper, a “local solution” refers to a solution that at least satisfies the first-order KKT condition, and may or may not satisfy a second-order necessary condition.
References
Adamczak, R., Litvak, A., Pajor, A., Tomczak-Jaegermann, N.: Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles. J. Am. Math. Soc. 23(2), 535–561 (2010)
Bertsimas, D., Mazumder, R.: Least quantile regression via modern optimization. Ann. Stat. 42, 2494–2525 (2014)
Bian, W., Chen, X.: Optimality conditions and complexity for non-Lipschitz constrained optimization problems. http://www.polyu.edu.hk/ama/staff/xjchen/OCT26 (2014)
Bian, W., Chen, X., Ye, Y.: Complexity analysis of interior point algorithms for non-Lipschitz and non-convex minimization. Math. Program. A 149, 301–327 (2015)
Bickel, P.J., Ritov, Y., Tsybakov, A.B.: Simultaneous analysis of Lasso and Dantzig selector. Ann. Stat. 37, 1705–1732 (2009)
Candès, E., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005)
Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularization methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. A 127, 245–295 (2011)
Chen, X., Ge, D., Wang, Z., Ye, Y.: Complexity of unconstrained L\(_2\)-L\(_{\mathbf{p}}\) minimization. Math. Program. A 143, 371–383 (2014)
Chen, X., Xu, F., Ye, Y.: Lower bound theory of non-zero entries in solutions of L\(_2\)-L\(_{\mathbf{p}}\) minimization. SIAM J. Sci. Comput. 32(5), 2832–2852 (2010)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Fan, J., Lv, J.: Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans. Inf. Theory 57, 5467–5484 (2011)
Fan, J., Lv, J., Qi, L.: Sparse high dimensional models in economics. Annu. Rev. Econ. 3, 291–317 (2011)
Fan, J., Xue, L., Zou, H.: Strong oracle optimality of folded concave penalized estimation. Ann. Stat. 42(3), 819–849 (2014)
Ge, D., Wang, Z., Ye, Y., Yin, H.: Strong NP-hardness result for regularized \(L_q\)-minimization problems with concave penalty functions. arxiv:1501.00622v1 (2015)
Hunter, D., Li, R.: Variable selection using MM algorithms. Ann. Stat. 33, 1617–1642 (2005)
Hsu, D., Kakade, S.M., Zhang, T.: Random design analysis of ridge regression. arXiv:1106.2363v2. (2014)
Hsu, D., Kakade, S.M., Zhang, T.: A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab. 17(52), 1–6 (2012)
Huo, X., Chen, J.: Complexity of penalized likelihood estimation. J. Stat. Comput. Simul. 80(7), 747–759 (2010)
Liu, H., Yao, T., Li, R.: Global solutions for folded concave penalized nonconvex learning. Ann. Stat. 44(2), 629–659 (2016)
Liu, H., Yao, T., Li, R., Ye, Y.: Electronic Companion to: Folded Concave Penalized Sparse Linear Regression: Sparsity, Statistical Performance, and Algorithmic Theory for Local Solutions (2017)
Loh, P.-L., Wainwright, M.J.: Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima. J. Mach. Learn. Res. 16, 559–616 (2015)
Negahban, S.N., Ravikumar, P., Wainwright, M.J., Yu, B.: A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Stat. Sci. 27(4), 538–557 (2012)
Nesterov, Yu., Polyak, B.T.: Cubic regularization of Newton’s method and its global performance. Math. Program. 108(1), 177–205 (2006)
Raskutti, G., Wainwright, M., Yu, B.: Restricted nullspace and eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11, 2241–2259 (2010)
Rudelson, M., Zhou, S.: Reconstruction from anisotropic random measurements. IEEE Trans. Inf. Theory 59(6), 3434–3447 (2013)
Raskutti, G., Wainwright, M.J., Yu, B.: Minimax rates of estimation for high-dimensional linear regression over \(\ell _q\)-balls. IEEE Trans. Inf. Theory 57(10), 6976–6994 (2011)
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 58(1), 267–288 (1996)
van de Geer, S.A., Bühlmann, P.: On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3, 1360–1392 (2009)
Vavasis, S.A.: Quadratic programming is in NP. Inf. Process. Lett. 36, 73–77 (1990)
Vershynin, R.: How close is the sample covariance matrix to the actual covariance matrix. arXiv:1004.3484v2 (2010)
Wang, L., Kim, Y., Li, R.: Calibrating non-convex penalized regression in ultra-high dimension. Ann. Stat. 41(5), 2505–2536 (2013)
Wang, Z., Liu, H., Zhang, T.: Optimal computational and statistical rates of convergence for sparse non-convex learning problems. Ann. Stat. 42(6), 2164–2201 (2014)
Ye, Y.: On affine scaling algorithms for non-convex quadratic programming. Math. Program. 56, 285–300 (1992)
Ye, Y.: On the complexity of approximating a KKT point of quadratic programming. Math. Program. 80, 195–211 (1998)
Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)
Zhang, Y., Wainwright, M.J., Jordan, M.I.: Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. JMLR: Worksh. Conf. Proc. 35, 1–18 (2014)
Zhang, C., Zhang, T.: A general theory of concave regularization for high dimensional sparse estimation problems. Stat. Sci. 27(4), 576–593 (2012)
Zhou, S.: Restricted eigenvalue conditions on subgaussian random matrices. arXiv:0912.4045v2 (2009)
Zou, H., Li, R.: One-step sparse estimation in non-concave penalized likelihood models. Ann. Stat. 36, 1509–1533 (2008)
Acknowledgements
The authors thank the AE and referees for their valuable comments, which significantly improved the paper. This work was supported by Penn State Grace Woodward Collaborative Research Grant, NSF grants CMMI 1300638 and DMS 1512422, NIH grants P50 DA036107 and P50 DA039838, Marcus PSU-Technion Partnership grant, Air Force Office of Scientific Research grant FA9550-12-1-0396, and Mid-Atlantic University Transportation Centers grant. This work was also partially supported by NNSFC grants 11690014 and 11690015. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NSF, the NIDA, the NIH, the AFOSR, the MAUTC or the NNSFC.
Appendix
1.1 Some useful lemmas
Lemma 6
For any \(\mathbf x^{true}\in \mathfrak {R}^p\), \(\mathbf A\in \mathfrak {R}^{n\times p}\), \(W\in \mathfrak {R}^n\), \(\mathbf b=\mathbf A\mathbf x^{true}+W\), consider \(f\) as defined in (3) with either \(P_{\lambda }=P_{\lambda ,SCAD}\) or \(P_{\lambda }=P_{\lambda ,MCP}\). Let \(\mathbf x^0\in \mathfrak {R}^p\) be a feasible solution to (3). If \( f(\mathbf x^{0})\le f(\mathbf x^{lasso})\), where \(\mathbf x^{lasso}\) is defined in (4) with the same problem data \( \mathbf x^{true}\), \(\mathbf A\), and \(\mathbf b\) as (3) and with an arbitrary penalty parameter \(\lambda _{lasso}>0\), then \( f(\mathbf x^{0})-f(\mathbf x^{true})\le (\lambda _{lasso}+\lambda )\left| \mathbf x^{lasso} - \mathbf x^{true}\right| . \)
Proof
Denote that \( f_{lasso}(\mathbf x)=(2n)^{-1}\Vert \mathbf A\mathbf x-\mathbf b\Vert ^2+\sum _{i=1}^p\lambda _{lasso}\vert x_i\vert \) for any \(\mathbf x=(x_i)\in \mathfrak {R}^p\).
Firstly, notice that by the definition of \(\mathbf x^{lasso}\) in (4), \( f_{lasso}(\mathbf x^{lasso})\le f_{lasso}(\mathbf x^{true}).\) We then know that \( (2n)^{-1}\Vert \mathbf A\mathbf x^{lasso}-\mathbf b\Vert ^2-(2n)^{-1}\Vert \mathbf A\mathbf x^{true}-\mathbf b\Vert ^2\le \sum _{i=1}^p\lambda _{lasso}\vert x_i^{true}\vert -\sum _{i=1}^p\lambda _{lasso}\vert x_i^{lasso}\vert \le \sum _{i=1}^p\lambda _{lasso}\vert x_i^{true}- x_i^{lasso}\vert = \lambda _{lasso}\vert \mathbf x^{true}- \mathbf x^{lasso}\vert .\)
Secondly, due to the concavity and differentiability of \(P_\lambda (\cdot )\) on \(\mathfrak {R}_+\) and the fact that \(0\le P'_\lambda (\vert x\vert )\le \lambda \) for all \(x\in \mathfrak {R}\), \(\sum _{i=1}^{p}P_\lambda (\vert x_i^{lasso}\vert )- \sum _{i=1}^{p}P_\lambda (\vert x_i^{true}\vert )\le \sum _{i=1}^p P_\lambda '(\vert x_i^{true}\vert )\cdot \left( \vert x_i^{lasso}\vert -\vert x_i^{true}\vert \right) \le \sum _{i=1}^p P_\lambda '(\vert x_i^{true}\vert )\cdot \vert x_i^{lasso} - x_i^{true}\vert \le \lambda \left| \mathbf x^{lasso} - \mathbf x^{true}\right| \).
Combining the above and the assumption that \(f(\mathbf x^{0})\le f(\mathbf x^{lasso})\), we know that \(f(\mathbf x^{0})-f(\mathbf x^{true})\le f(\mathbf x^{lasso})-f(\mathbf x^{true}) =(2n)^{-1}\Vert \mathbf A\mathbf x^{lasso}-\mathbf b\Vert ^2+\sum _{i=1}^{p}P_\lambda (\vert x_i^{lasso}\vert )-(2n)^{-1}\Vert \mathbf A\mathbf x^{true}-\mathbf b\Vert ^2- \sum _{i=1}^{p}P_\lambda (\vert x_i^{true}\vert ) \le (\lambda _{lasso}+\lambda )\left| \mathbf x^{lasso} - \mathbf x^{true}\right| ,\) as claimed. \(\square \)
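The only property of the folded concave penalty used in the second step above is that \(P_\lambda \) is concave and differentiable on \(\mathfrak {R}_+\) with \(0\le P'_\lambda (\vert x\vert )\le \lambda \). As a sketch, this can be checked directly for the MCP using its standard integral definition (notation here is the standard one from the MCP literature and may differ slightly from the paper's equation (3)):

```latex
P_{\lambda,\mathrm{MCP}}(x)
  = \lambda \int_0^{x} \Bigl( 1 - \frac{t}{a\lambda} \Bigr)_{+} \, dt
  \quad (x \ge 0),
\qquad
P'_{\lambda,\mathrm{MCP}}(x)
  = \lambda \Bigl( 1 - \frac{x}{a\lambda} \Bigr)_{+} \in [0, \lambda].
```

In particular, \(P'_{\lambda ,\mathrm{MCP}}(x)=0\) and \(P_{\lambda ,\mathrm{MCP}}(x)=P_{\lambda ,\mathrm{MCP}}(a\lambda )=a\lambda ^2/2\) for all \(x\ge a\lambda \); this flatness beyond \(a\lambda \) is the same property invoked later in the proof of Lemma 7.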
Lemma 7
Assume that Condition B holds with initial solution \(\mathbf x^0\in \mathfrak {R}^p\). For any \(\mathbf x^{true}\in \mathfrak {R}^p\), \(\mathbf A\in \mathfrak {R}^{n\times p}\), \(W\in \mathfrak {R}^n\), \(\mathbf b=\mathbf A\mathbf x^{true}+W\), and for any \(\mathbf x^*=(x_i^*)\in \mathfrak {R}^p\) that satisfies (i) the S\(^3\)ONC for (3) with either \(P_\lambda =P_{\lambda ,SCAD}\) or \(P_\lambda =P_{\lambda ,MCP}\), and (ii) the inequality \(f(\mathbf x^*)\le f(\mathbf x^0)\), the following inequality holds: \((2n)^{-1}\Vert \mathbf A(\mathbf x^{*}-\mathbf x^{true})\Vert ^2\le \,n^{-1}W^\top \mathbf A(\mathbf x^{*}-\mathbf x^{true}) +\min \left\{ \sum _{i\in \mathcal S}P'_\lambda (\vert x_i^{*}\vert )\vert x_i^{true}\vert ,\,\sum _{i\in \mathcal S}P'_\lambda (\vert x_i^{*}\vert )\vert x_i^{*}- x_i^{true}\vert \right\} \). If, in addition, \(f(\mathbf x^*)\le f(\mathbf x^{true})+{\varGamma }\) for an arbitrary \({\varGamma }\ge 0\), then \( \frac{1}{2n}\Vert \mathbf A(\mathbf x^{*}-\mathbf x^{true})\Vert ^2\le \,\frac{1}{n}W^\top \mathbf A(\mathbf x^{*}-\mathbf x^{true})+\min \left\{ \sum _{i\in \mathcal S}P'_\lambda (\vert x_i^{*}\vert )\vert x_i^{true}\vert ,\,\sum _{i\in \mathcal S}P'_\lambda (\vert x_i^{*}\vert )\vert x_i^{*}- x_i^{true}\vert ,\, P_{\lambda }(a\lambda )\cdot (\vert \mathcal S\vert -\Vert \mathbf x^{*}\Vert _0)+{\varGamma }\right\} .\)
Proof
Notice that \(\mathbf b=\mathbf A\mathbf x^{true}+W\). Then for any \(\mathbf x=(x_i)\in \mathfrak {R}^p\): \((2n)^{-1}\Vert \mathbf A\mathbf x-\mathbf b\Vert ^2+\sum _{i=1}^pP_{\lambda }'(\vert x^{*}_i\vert )\vert x_i\vert =(2n)^{-1}\Vert \mathbf A(\mathbf x-\mathbf x^{true})\Vert ^2+(2n)^{-1}W^\top W-n^{-1}W^\top \mathbf A(\mathbf x-\mathbf x^{true})+\sum _{i=1}^pP_{\lambda }'(\vert x^{*}_i\vert )\vert x_i\vert \).
Since \(\mathbf x^*\) satisfies the S\(^3\)ONC, which implies the FONC, we know that \(\mathbf x^*\in \arg \inf \{ \frac{1}{2n}\Vert \mathbf A\mathbf x-\mathbf b\Vert ^2+\sum _{i=1}^pP_{\lambda }'(\vert x^{*}_i\vert )\vert x_i\vert :\,\mathbf x\in \mathfrak {R}^p\}.\) Therefore, \(\frac{1}{2n}\Vert \mathbf A\mathbf x^*-\mathbf b\Vert ^2+\sum _{i=1}^pP_{\lambda }'(\vert x^{*}_i\vert )\vert x_i^*\vert \le \frac{1}{2n}\Vert \mathbf A\mathbf x^{true}-\mathbf b\Vert ^2+\sum _{i=1}^pP_{\lambda }'(\vert x^{*}_i\vert )\vert x_i^{true}\vert \). Combining the above, we know that \((2n)^{-1}\Vert \mathbf A(\mathbf x^{*}-\mathbf x^{true})\Vert ^2-n^{-1}W^\top \mathbf A(\mathbf x^{*}-\mathbf x^{true})+\sum _{i=1}^pP_{\lambda }'(\vert x^{*}_i\vert )\vert x_i^{*}\vert \le \sum _{i=1}^pP'_\lambda (\vert x_i^{*}\vert )\vert x_i^{true}\vert \). Further invoking the definitions of \(\mathbf x^{true}\) and \(\mathcal S\), as well as the triangle inequality and the fact that \(P'_{\lambda }(\vert x\vert )\ge 0\) for any \(x\in \mathfrak {R}\), we have \((2n)^{-1}{\Vert \mathbf A(\mathbf x^{*}-\mathbf x^{true})\Vert ^2} \le \, n^{-1}{W^\top \mathbf A(\mathbf x^{*}-\mathbf x^{true})}+\sum _{i\in \mathcal S}P'_\lambda (\vert x_i^{*}\vert )\vert x_i^{true}\vert -\sum _{i=1}^pP_{\lambda }'(\vert x^{*}_i\vert )\vert x_i^{*}\vert \le \, n^{-1}W^\top \mathbf A(\mathbf x^{*}-\mathbf x^{true})+\sum _{i\in \mathcal S}P'_\lambda (\vert x_i^{*}\vert )\vert x_i^{true}- x_i^{*}\vert -\sum _{i\in \mathcal S^c}P_{\lambda }'(\vert x^{*}_i\vert )\vert x_i^{*}\vert \). We then obtain the claimed result in the first part of the lemma.
To show the second part, by assumption, \(f(\mathbf x^*)\le f(\mathbf x^{true})+{\varGamma }\), we know \((2n)^{-1}\Vert \mathbf A(\mathbf x^{*}-\mathbf x^{true})\Vert ^2-n^{-1}{W^\top \mathbf A(\mathbf x^{*}-\mathbf x^{true})}+(2n)^{-1}{\Vert W\Vert ^2}+\sum _{i=1}^{p}P_{\lambda }(\vert x_i^*\vert )\le (2n)^{-1}{\Vert W\Vert ^2}+\sum _{i =1}^{p}P_{\lambda }(\vert x_i^{true}\vert )+{\varGamma }\). Noticing the fact that (i) \(0\le P_{\lambda }(\vert x\vert )\le P_{\lambda }(a\lambda )\) for any \(x\in \mathfrak {R}\), (ii) \(P_{\lambda }(\vert 0\vert )=0\), and (iii) by definition of \(\mathcal S^c\), \(x_i^{true}=0\) for all \(i\in \mathcal S^c\), we hence know \((2n)^{-1}\Vert \mathbf A(\mathbf x^{*}-\mathbf x^{true})\Vert ^2-n^{-1}{W^\top \mathbf A(\mathbf x^{*}-\mathbf x^{true})}\le P_{\lambda }(a\lambda )\cdot \vert \mathcal S\vert -\sum _{i=1}^{p}P_{\lambda }(\vert x_i^*\vert )+{\varGamma }\). Invoking Corollaries 3 and 4 under Condition B and the assumption that \(f(\mathbf x^*)\le f(\mathbf x^0)\), we know that \(x_i^*\ne 0\Longrightarrow \vert x_i^*\vert \ge a\lambda \). Also notice that \(P_{\lambda }(\vert x\vert )=P_{\lambda }(a\lambda )\) for all \(x\in \mathfrak {R}:\,\vert x\vert \ge a\lambda \). Therefore, the above implies \(\sum _{i=1}^{p}P_{\lambda }(\vert x_i^*\vert )= {P_{\lambda }(a\lambda )\cdot \Vert \mathbf x^{*}\Vert _0}\) and \((2n)^{-1}\Vert \mathbf A(\mathbf x^{*}-\mathbf x^{true})\Vert ^2-n^{-1}{W^\top \mathbf A(\mathbf x^{*}-\mathbf x^{true})}\le P_{\lambda }(a\lambda )\cdot (\vert \mathcal S\vert -\Vert \mathbf x^{*}\Vert _0)+{\varGamma }\). Combined with the results from the first part of this lemma, we have the claimed result in the second part. \(\square \)
Lemma 8
Consider a subgaussian random vector \(\tilde{W}\in \mathfrak {R}^{\tilde{n}}\) that satisfies \(Prob[\vert \langle \tilde{W},\, \upsilon \rangle \vert \ge t]\le 2\exp \left( -{t^2}(2\sigma ^2)^{-1}\right) \) for any \(\upsilon \in \mathfrak {R}^{\tilde{n}}:\, \Vert \upsilon \Vert =1\). Then for any \(V\in \mathfrak {R}^{\tilde{n}\times \tilde{n}}\) and \({\varSigma }_{v}=V^\top V\), \( Prob[\Vert V \tilde{W}\Vert ^2\le \sigma ^2(\mathbf{Tr}({\varSigma }_v)+2\sqrt{\mathbf{Tr}({\varSigma }_v^2)\,t}+2\Vert {\varSigma }_v\Vert t)]\ge 1-\exp (-t)\) for any \(t>0\), where \(\mathbf{Tr}(\cdot )\) denotes the trace of a matrix.
Proof
Evident from Theorem 2.1 in [17]. \(\square \)
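As a quick numerical sanity check of Lemma 8 (illustrative, not part of the paper), the tail bound can be verified by Monte Carlo in the Gaussian special case, since a standard Gaussian vector is subgaussian with \(\sigma =1\); the dimension, trial count, \(t\), and the choice of \(V\) below are arbitrary illustrative values:

```python
import numpy as np

# Monte Carlo sanity check of the tail bound in Lemma 8.  W~ is taken
# standard Gaussian (subgaussian with sigma = 1); n_dim, n_trials, t,
# and V are illustrative choices, not values from the paper.
rng = np.random.default_rng(0)
n_dim, n_trials, sigma, t = 20, 20_000, 1.0, 2.0

V = rng.standard_normal((n_dim, n_dim)) / np.sqrt(n_dim)
Sigma_v = V.T @ V  # Sigma_v = V^T V as in the lemma

# Right-hand side: sigma^2 (Tr(Sigma_v) + 2 sqrt(Tr(Sigma_v^2) t) + 2 ||Sigma_v|| t)
bound = sigma**2 * (
    np.trace(Sigma_v)
    + 2.0 * np.sqrt(np.trace(Sigma_v @ Sigma_v) * t)
    + 2.0 * np.linalg.norm(Sigma_v, 2) * t
)

W = rng.standard_normal((n_trials, n_dim))   # i.i.d. draws of W~
vals = np.sum((W @ V.T) ** 2, axis=1)        # ||V W~||^2, one value per trial

empirical = np.mean(vals <= bound)
# Lemma 8 guarantees coverage of at least 1 - exp(-t).
print(f"empirical coverage {empirical:.4f} vs guaranteed {1 - np.exp(-t):.4f}")
```

The empirical coverage typically far exceeds the guaranteed level \(1-e^{-t}\), reflecting that the bound, like most concentration inequalities, is conservative for any fixed design.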
Cite this article
Liu, H., Yao, T., Li, R. et al. Folded concave penalized sparse linear regression: sparsity, statistical performance, and algorithmic theory for local solutions. Math. Program. 166, 207–240 (2017). https://doi.org/10.1007/s10107-017-1114-y