
Nonsmoothness in Machine Learning: Specific Structure, Proximal Identification, and Applications

Published in: Set-Valued and Variational Analysis

Abstract

Nonsmoothness is often a curse for optimization, but it can also be a blessing, in particular for applications in machine learning. In this paper, we present the specific structure of nonsmooth optimization problems appearing in machine learning and illustrate how to leverage this structure in practice, for compression, acceleration, or dimension reduction. We pay special attention to keeping the presentation concise and easily accessible, with both simple examples and general results.
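To illustrate the identification phenomenon the paper studies, the sketch below (not taken from the paper; the problem, data, and variable names are our own) runs the proximal gradient method on a small lasso instance. Because the prox of the l1 norm (soft-thresholding) sets small coordinates exactly to zero, the iterates identify the sparsity pattern of the solution after finitely many steps, and that low-dimensional structure is what can be exploited for compression or dimension reduction.

```python
import numpy as np

# Minimal sketch: proximal gradient on min_x 0.5*||Ax - b||^2 + lam*||x||_1.
# The soft-thresholding prox zeroes out small coordinates exactly, so the
# iterates eventually carry the same support as the sparse solution.

def soft_threshold(x, t):
    # prox of t*||.||_1: shrink each coordinate toward zero by t
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])  # sparse ground truth
b = A @ x_true

lam = 0.5
step = 1.0 / np.linalg.norm(A.T @ A, 2)  # 1/L, L = Lipschitz const of the gradient

x = np.zeros(5)
for _ in range(500):
    grad = A.T @ (A @ x - b)
    x = soft_threshold(x - step * grad, step * lam)

support = np.flatnonzero(x)
print("identified support:", support)  # coordinates the prox left nonzero
```

After identification, all proximal gradient updates live on the low-dimensional subspace spanned by the identified support, which is the structural fact exploited by the acceleration and communication-compression schemes the paper surveys.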




Acknowledgments

The authors would like to warmly thank the whole DAO team and especially our PhD students Gilles Bareilles, Mathias Chastan, Sélim Chraibi, Dmitry Grishchenko, Yu-Guan Hsieh, and Yassine Laguel. FI benefited from the support of the ANR JCJC project STROLL (ANR-19-CE23-0008). This work has been partially supported by MIAI@Grenoble Alpes (ANR-19-P3IA-0003).


Corresponding author

Correspondence to Franck Iutzeler.



About this article


Cite this article

Iutzeler, F., Malick, J. Nonsmoothness in Machine Learning: Specific Structure, Proximal Identification, and Applications. Set-Valued Var. Anal 28, 661–678 (2020). https://doi.org/10.1007/s11228-020-00561-1

