Abstract
Nonsmoothness is often a curse for optimization, but it can also be a blessing, in particular for applications in machine learning. In this paper, we present the specific structure of nonsmooth optimization problems arising in machine learning and illustrate how to leverage this structure in practice, for compression, acceleration, or dimension reduction. We pay special attention to the presentation, keeping it concise and easily accessible, with both simple examples and general results.
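As a concrete illustration of this "blessing", consider the lasso: the proximal gradient method applies soft thresholding at every iteration, so the iterates become exactly sparse and, after finitely many iterations, identify the support of the solution. The following minimal Python sketch shows this behavior; the data, regularization level, and step size are illustrative choices, not taken from the paper.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1: shrinks each coordinate toward
    # zero and sets small ones exactly to zero (the source of sparsity).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)        # sparse ground truth
b = A @ x_true

lam = 0.1 * np.max(np.abs(A.T @ b))        # regularization strength
step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L, L = ||A||_2^2 (Lipschitz const.)

# Proximal gradient (ISTA) on min_x 0.5*||Ax - b||^2 + lam*||x||_1
x = np.zeros(100)
for _ in range(500):
    grad = A.T @ (A @ x - b)
    x = soft_threshold(x - step * grad, step * lam)

# The iterate is exactly sparse: its support identifies the active structure.
print("nonzero coordinates:", np.count_nonzero(x))
```

Once the support is identified, the iterates live on a low-dimensional subspace, which is precisely what can be exploited for compression, acceleration, or dimension reduction.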
Acknowledgments
The authors warmly thank the whole DAO team, and especially our PhD students Gilles Bareilles, Mathias Chastan, Sélim Chraibi, Dmitry Grishchenko, Yu-Guan Hsieh, and Yassine Laguel. FI benefited from the support of the ANR JCJC project STROLL (ANR-19-CE23-0008). This work was partially supported by MIAI@Grenoble Alpes (ANR-19-P3IA-0003).
Cite this article
Iutzeler, F., Malick, J. Nonsmoothness in Machine Learning: Specific Structure, Proximal Identification, and Applications. Set-Valued Var. Anal 28, 661–678 (2020). https://doi.org/10.1007/s11228-020-00561-1