Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

We investigate an inertial algorithm of gradient type in connection with the minimization of a non-convex differentiable function. The algorithm is formulated in the spirit of Nesterov’s accelerated convex gradient method. We prove some abstract convergence results which, applied to our numerical scheme, allow us to show that the generated sequences converge to a critical point of the objective function, provided a regularization of the objective function satisfies the Kurdyka–Łojasiewicz property. Further, we obtain convergence rates for the generated sequences and for the objective function values, formulated in terms of the Łojasiewicz exponent of a regularization of the objective function. Finally, some numerical experiments are presented in order to compare our numerical scheme with some algorithms well known in the literature.

References

  1. Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set Valued Anal. 9, 3–11 (2001)

  2. Apidopoulos, V., Aujol, J.F., Dossal, Ch.: Convergence rate of inertial Forward–Backward algorithm beyond Nesterov’s rule. Math. Program. 180, 137–156 (2020)

  3. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. Ser. B 116(1–2), 5–16 (2009)

  4. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for non-convex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  5. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  6. Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. Ser. B 168(1–2), 123–175 (2018)

  7. Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case \(\alpha \le 3\), ESAIM: COCV, 25, Article number 2 (2019)

  8. Attouch, H., Goudou, X., Redont, P.: The heavy ball with friction method, I. The continuous dynamical system: global exploration of the local minima of real-valued function by asymptotic analysis of a dissipative dynamical system. Commun. Contemp. Math. 2(1), 1–34 (2000)

  9. Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J. Optim. 24(1), 232–256 (2014)

  10. Attouch, H., Peypouquet, J., Redont, P.: Fast convex optimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)

  11. Aujol, J.F., Dossal, Ch., Rondepierre, A.: Optimal convergence rates for Nesterov acceleration. SIAM J. Optim. 29(4), 3131–3153 (2019)

  12. Aujol, J.F., Dossal, C.: Optimal rate of convergence of an ODE associated to the Fast Gradient Descent schemes for \(b > 0\). HAL preprint https://hal.inria.fr/hal-01547251v2/document

  13. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)

  14. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  15. Bégout, P., Bolte, J., Jendoubi, M.A.: On damped second-order gradient systems. J. Differ. Equ. 259, 3115–3143 (2015)

  16. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for non-convex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)

  17. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)

  18. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2006)

  19. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)

  20. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)

  21. Boţ, R.I., Csetnek, E.R.: Approaching nonsmooth non-convex optimization problems through first order dynamical systems with hidden acceleration and Hessian driven damping terms. Set Valued Var. Anal. 26, 227–245 (2018)

  22. Boţ, R.I., Csetnek, E.R.: A forward-backward dynamical approach to the minimization of the sum of a nonsmooth convex with a smooth non-convex function. ESAIM COCV 24(2), 463–477 (2018)

  23. Boţ, R.I., Csetnek, E.R.: Newton-like dynamics associated to non-convex optimization problems. In: Hosseini, S., Mordukhovich, B., Uschmajew, A. (eds.) Nonsmooth Optimization and Its Applications, International Series of Numerical Mathematics, vol. 170, pp. 131–149. Birkhäuser, Cham (2019)

  24. Boţ, R.I., Csetnek, E.R., Hendrich, C.: Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256, 472–487 (2015)

  25. Boţ, R.I., Csetnek, E.R., László, S.C.: Approaching nonsmooth non-convex minimization through second-order proximal-gradient dynamical systems. J. Evol. Equ. 18(3), 1291–1318 (2018)

  26. Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward–backward algorithm for minimizing the sum of two non-convex functions. EURO J. Comput. Optim. 4(1), 3–25 (2016)

  27. Boţ, R.I., Csetnek, E.R., László, S.C.: A second order dynamical approach with variable damping to non-convex smooth minimization. Appl. Anal. 99(3), 361–378 (2020)

  28. Boţ, R.I., Nguyen, D.K.: The proximal alternating direction method of multipliers in the non-convex setting: convergence analysis and rates. arXiv:1801.01994

  29. Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166(3), 968–982 (2015)

  30. Chill, R.: On the Łojasiewicz–Simon gradient inequality. J. Funct. Anal. 201, 572–601 (2003)

  31. Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward–backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162(1), 107–132 (2014)

  32. Combettes, P.L., Glaudin, L.E.: Quasinonexpansive iterations on the affine hull of orbits: from Mann’s mean value algorithm to inertial methods. SIAM J. Optim. 27(4), 2356–2380 (2017)

  33. van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84(2), 497–540 (1996)

  34. Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka–Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165(3), 874–900 (2015)

  35. Garrigos, G., Rosasco, L., Villa, S.: Convergence of the Forward-Backward algorithm: beyond the worst-case with the help of geometry, https://arxiv.org/pdf/1703.09477.pdf

  36. Ghadimi, E., Feyzmahdavian, H.R., Johansson, M.: Global convergence of the heavy-ball method for convex optimization. In: 2015 European Control Conference (ECC). IEEE, pp. 310–315 (2015)

  37. Haraux, A., Jendoubi, M.: Convergence of solutions of second-order gradient-like systems with analytic nonlinearities. J. Differ. Equ. 144(2), 313–320 (1998)

  38. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier (Grenoble) 48(3), 769–783 (1998)

  39. Lessard, L., Recht, B., Packard, A.: Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26, 57–95 (2016)

  40. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18(5), 1–34 (2018)

  41. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique Paris, pp. 87–89 (1963)

  42. Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51(2), 311–325 (2015)

  43. Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983). (Russian)

  44. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Dordrecht (2004)

  45. Ochs, P.: Local convergence of the heavy-ball method and ipiano for non-convex optimization. J. Optim. Theory Appl. 177(1), 153–180 (2018)

  46. Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014)

  47. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. U.S.S.R. Comput. Math. Math. Phys. 4(5), 1–17 (1964)

  48. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, Fundamental Principles of Mathematical Sciences, vol. 317. Springer, Berlin (1998)

  49. Simon, L.: Asymptotics for a class of nonlinear evolution equations, with applications to geometric problems. Ann. Math. 118(3), 525–571 (1983)

  50. Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17, 1–43 (2016)

  51. Sun, T., Yin, P., Li, D., Huang, C., Guan, L., Jiang, H.: Non-ergodic convergence analysis of heavy-ball algorithms. In: The Thirty-Third AAAI Conference on Artificial Intelligence (2019)

  52. Zavriev, S.K., Kostyuk, F.V.: Heavy-ball method in non-convex optimization problems. Comput. Math. Model. 4, 336–341 (1993)

Acknowledgements

The author is thankful to two anonymous referees for their valuable remarks and suggestions, which led to an improvement of the quality of the paper.

Author information

Corresponding author

Correspondence to Szilárd Csaba László.

Additional information

This work was supported by a grant of the Ministry of Research and Innovation, CNCS–UEFISCDI, Project Number PN-III-P1-1.1-TE-2016-0266, and by a grant of the Ministry of Research and Innovation, CNCS–UEFISCDI, Project Number PN-III-P4-ID-PCE-2016-0190, within PNCDI III.

Appendix

1.1 Second order continuous dynamical systems that model Algorithm (2)

In what follows we emphasize the connections between Algorithm (2) and the continuous dynamical systems (3) and (4).

Consider (4) with the initial conditions \(x(t_0)=u_0,\,{\dot{x}}(t_0)=v_0,\,u_0,v_0\in {\mathbb {R}}^m\) and the governing second order differential equation

$$\begin{aligned} \ddot{x}(t)+\left( \gamma +\frac{\alpha }{t}\right) {\dot{x}}(t)+{\nabla }g(x(t))=0,\,\gamma >0,\,\alpha \in {\mathbb {R}}. \end{aligned}$$

We will use the time discretization presented in [6], that is, we take the fixed stepsize \(h> 0\) and consider \(\beta =1-\gamma h>0\), \(t_n = \frac{1}{\beta } nh\) and \(x_n = x(t_n).\) Then the implicit/explicit discretization of (4) leads to

$$\begin{aligned} \frac{1}{h^2}(x_{n+1}-2x_n+x_{n-1})+\left( \frac{\gamma }{h}+\frac{\alpha \beta }{nh^2}\right) (x_n-x_{n-1})+{\nabla }g(y_n)=0, \end{aligned}$$
(55)

where \(y_n\) is a linear combination of \(x_n\) and \(x_{n-1}\) and will be defined below.

Now, (55) can be rewritten as

$$\begin{aligned} x_{n+1}= x_n+\left( \beta -\frac{\alpha \beta }{ n}\right) (x_n-x_{n-1})-h^2{\nabla }g(y_n), \end{aligned}$$

which suggests choosing \(y_n\) of the form

$$\begin{aligned} y_n=x_n+\left( \beta -\frac{\alpha \beta }{ n}\right) (x_n-x_{n-1}). \end{aligned}$$

However, for practical purposes, it is convenient to work with the re-indexation \(n\rightarrowtail n+\alpha \), which yields the equivalent formulation

$$\begin{aligned} y_n=x_n+\frac{\beta n}{ n+\alpha }(x_n-x_{n-1}). \end{aligned}$$

Hence, by taking \(h^2=s\) we get

$$\begin{aligned} x_{n+1}= x_n+\frac{\beta n}{ n+\alpha }(x_n-x_{n-1})-s{\nabla }g(y_n), \end{aligned}$$

which is exactly Algorithm (2).
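
For illustration only, a minimal Python sketch of this iteration might look as follows. The gradient routine grad_g, the parameter values and the stopping rule are placeholders chosen here and are not prescribed by the paper; recall that \(\beta \in (0,1)\), and the step size \(s\) is assumed to be compatible with the Lipschitz constant of \({\nabla }g\).

```python
import numpy as np

def inertial_gradient(grad_g, x0, alpha=3.0, beta=0.9, s=0.01, n_iter=500):
    """Illustrative sketch of the inertial scheme (2):
        y_n     = x_n + beta*n/(n+alpha) * (x_n - x_{n-1}),
        x_{n+1} = y_n - s * grad_g(y_n).
    Parameter values are placeholders, not those used in the paper."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()                      # take x_0 = x_{-1} for simplicity (illustrative choice)
    for n in range(1, n_iter + 1):
        inertia = beta * n / (n + alpha)   # inertial coefficient of step n
        y = x + inertia * (x - x_prev)     # extrapolated point y_n
        x_prev, x = x, y - s * grad_g(y)   # gradient step taken at y_n
    return x
```

For instance, for \(g(x)=\frac{1}{2}\Vert x\Vert ^2\) one may call `inertial_gradient(lambda x: x, np.ones(2))`, which drives the iterates towards the unique critical point \(0\).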

Remark 25

Obviously, the very form \(\beta =1-\gamma h>0\) shows that \(\beta \in (0,1).\) We could not obtain Algorithm (2) from the continuous dynamical system (3) via a discretization similar to the one presented above. Nevertheless, we can show that (3) is the exact limit of Algorithm (2) in the sense of Su, Boyd and Candès [50].

In what follows we show that, by choosing appropriate values of \(\beta \), both the continuous second order dynamical system (3) and the continuous dynamical system (4) arise as exact limits of the numerical scheme (2).

To this end we take small step sizes in (2) and follow the same approach as Su, Boyd and Candès in [50] (see also [27] for similar approaches). For this purpose we rewrite (2) in the form

$$\begin{aligned} \frac{x_{n+1}-x_n}{\sqrt{s}}=\frac{\beta n}{n+\alpha }\cdot \frac{x_n-x_{n-1}}{\sqrt{s}}-\sqrt{s}{\nabla }g(y_n) \ \forall n \ge 1 \end{aligned}$$
(56)

and introduce the Ansatz \(x_n\approx x(n\sqrt{s})\) for some twice continuously differentiable function \(x : [0,+\infty ) \rightarrow {\mathbb {R}}^m\). We let \(n=\frac{t}{\sqrt{s}}\) and get \(x(t)\approx x_n,\,x(t+\sqrt{s})\approx x_{n+1},\,x(t-\sqrt{s})\approx x_{n-1}.\) Then, as the step size s goes to zero, from the Taylor expansion of x we obtain

$$\begin{aligned} \frac{x_{n+1}-x_n}{\sqrt{s}}={\dot{x}}(t)+\frac{1}{2}\ddot{x}(t)\sqrt{s}+o(\sqrt{s}) \end{aligned}$$

and

$$\begin{aligned} \frac{x_n-x_{n-1}}{\sqrt{s}}={\dot{x}}(t)-\frac{1}{2}\ddot{x}(t)\sqrt{s}+o(\sqrt{s}). \end{aligned}$$

Further, since

$$\begin{aligned} \sqrt{s}\Vert {\nabla }g(y_n)-{\nabla }g(x_n)\Vert \le \sqrt{s} L_g\Vert y_n-x_n\Vert =\sqrt{s} L_g\left| \frac{\beta n}{n+\alpha }\right| \Vert x_n-x_{n-1}\Vert =o(\sqrt{s}), \end{aligned}$$

it follows that \(\sqrt{s} {\nabla }g(y_n)=\sqrt{s}{\nabla }g(x_n)+ o(\sqrt{s})\). Consequently, (56) can be written as

$$\begin{aligned}&{\dot{x}}(t)+\frac{1}{2}\ddot{x}(t)\sqrt{s}+ o(\sqrt{s})\\&\quad = \frac{\beta t}{t+\alpha \sqrt{s}}\left( {\dot{x}}(t)-\frac{1}{2}\ddot{x}(t)\sqrt{s}+ o(\sqrt{s})\right) -\sqrt{s}{\nabla }g(x(t))+ o(\sqrt{s}) \end{aligned}$$

or, equivalently

$$\begin{aligned}&(t+\alpha \sqrt{s})\left( {\dot{x}}(t)+\frac{1}{2}\ddot{x}(t)\sqrt{s}+o(\sqrt{s})\right) \\&\quad = \beta t\left( {\dot{x}}(t)-\frac{1}{2}\ddot{x}(t)\sqrt{s}+o(\sqrt{s})\right) -\sqrt{s}(t+\alpha \sqrt{s}){\nabla }g(x(t))+o(\sqrt{s}). \end{aligned}$$

Hence,

$$\begin{aligned} \frac{1}{2}\left( \alpha \sqrt{s}+(1+\beta )t\right) \ddot{x}(t)\sqrt{s}+\left( (1-\beta )t+\alpha \sqrt{s}\right) {\dot{x}}(t)+\sqrt{s}(t+\alpha \sqrt{s}){\nabla }g(x(t))=o(\sqrt{s}). \end{aligned}$$
(57)

Now, if we take \(\beta =1-\gamma {s}<1\) in (57) for some \(\frac{1}{{s}}>\gamma >0\), we obtain

$$\begin{aligned} \frac{1}{2}\left( \alpha \sqrt{s}+(2-\gamma {s})t\right) \ddot{x}(t)\sqrt{s}+\left( \gamma {s}t+\alpha \sqrt{s}\right) {\dot{x}}(t)+\sqrt{s}(t+\alpha \sqrt{s}){\nabla }g(x(t))=o(\sqrt{s}). \end{aligned}$$

After dividing by \(\sqrt{s}\) and letting \(s\rightarrow 0\), we obtain

$$\begin{aligned} t\ddot{x}(t)+\alpha {\dot{x}}(t)+t{\nabla }g(x(t))=0, \end{aligned}$$

which after division by t gives (3), that is,

$$\begin{aligned} \ddot{x}(t)+\frac{\alpha }{t}{\dot{x}}(t)+{\nabla }g(x(t))=0. \end{aligned}$$
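
This computation can also be checked symbolically. The following sketch (using SymPy, with a one-dimensional placeholder G standing for \({\nabla }g\) and \(h=\sqrt{s}\)) expands the residual of (56) for \(\beta =1-\gamma {s}\) and recovers the above equation; it is only a sanity check under these assumptions and is not part of the paper.

```python
import sympy as sp

# h plays the role of sqrt(s); G stands for the gradient of g (one-dimensional case).
t, h, alpha, gamma = sp.symbols('t h alpha gamma', positive=True)
x = sp.Function('x')
G = sp.Function('G')

# beta*n/(n+alpha) with beta = 1 - gamma*h**2 and n = t/h simplifies to the coefficient below.
coeff = (1 - gamma*h**2) * t / (t + alpha*h)

# Residual of scheme (56) under the Ansatz x_n ~ x(n*h).
residual = (x(t + h) - x(t))/h - coeff*(x(t) - x(t - h))/h + h*G(x(t))

# Taylor expansion in h: the h^0 term cancels and the h^1 coefficient is the ODE (3).
expansion = residual.series(h, 0, 3).removeO()
print(sp.simplify(sp.expand(expansion).coeff(h, 1)))
# Expected output (schematically): Derivative(x(t), (t, 2)) + alpha*Derivative(x(t), t)/t + G(x(t))
```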

Similarly, by taking \(\beta =1-\gamma \sqrt{s}<1\) in (57), for some \(\frac{1}{\sqrt{s}}>\gamma >0\), we obtain

$$\begin{aligned} \frac{1}{2}\left( \alpha \sqrt{s}+(2-\gamma \sqrt{s})t\right) \ddot{x}(t)\sqrt{s}+\left( \gamma \sqrt{s}t+\alpha \sqrt{s}\right) {\dot{x}}(t)+\sqrt{s}(t+\alpha \sqrt{s}){\nabla }g(x(t))=o(\sqrt{s}). \end{aligned}$$

After dividing by \(\sqrt{s}\) and letting \(s\rightarrow 0\), we get

$$\begin{aligned} t\ddot{x}(t)+(\gamma t+\alpha ){\dot{x}}(t)+t{\nabla }g(x(t))=0, \end{aligned}$$

which after division by t gives (4), that is,

$$\begin{aligned} \ddot{x}(t)+\left( \gamma +\frac{\alpha }{t}\right) {\dot{x}}(t)+{\nabla }g(x(t))=0. \end{aligned}$$
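
Finally, as a purely illustrative companion to this limit, one may integrate (4) numerically and compare the resulting trajectory with the iterates of Algorithm (2) under the identification \(x_n\approx x(t_n)\). The sketch below uses a plain explicit Euler scheme with hypothetical parameter values; it is not the numerical method of the paper.

```python
import numpy as np

def integrate_ode4(grad_g, x0, v0, gamma=1.0, alpha=3.0, t0=1.0, T=20.0, dt=1e-3):
    """Explicit Euler integration of the dynamical system (4):
        x''(t) + (gamma + alpha/t) x'(t) + grad g(x(t)) = 0,
    started at t0 > 0 so that the coefficient alpha/t stays bounded.
    All parameter values are illustrative only."""
    x = np.asarray(x0, dtype=float)
    v = np.asarray(v0, dtype=float)
    t = t0
    while t < T:
        a = -(gamma + alpha/t)*v - grad_g(x)   # acceleration prescribed by (4)
        x = x + dt*v                           # update position
        v = v + dt*a                           # update velocity
        t += dt
    return x
```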

Cite this article

László, S.C. Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization. Math. Program. 190, 285–329 (2021). https://doi.org/10.1007/s10107-020-01534-w
