
First-order optimization algorithms via inertial systems with Hessian driven damping

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

In a Hilbert space setting, for convex optimization, we analyze the convergence rate of a class of first-order algorithms involving inertial features. They can be interpreted as discrete-time versions of inertial dynamics involving both viscous and Hessian-driven damping. The geometric damping driven by the Hessian enters the dynamics in the form \(\nabla ^2 f (x(t)) \dot{x} (t)\). Treating this term as the time derivative of \( \nabla f (x (t)) \) leads, upon discretization, to algorithms that are first-order in both time and space. In addition to the convergence properties attached to Nesterov-type accelerated gradient methods, the algorithms thus obtained are new and exhibit rapid convergence of the gradients towards zero. On the basis of a regularization technique using the Moreau envelope, we extend these methods to non-smooth convex functions with extended real values. The introduction of time scale factors makes it possible to further accelerate these algorithms. We also report numerical results on structured problems to support our theoretical findings.
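To fix ideas, the following minimal sketch (in Python/NumPy) illustrates the general mechanism described above: after discretization, the continuous-time Hessian-driven damping term \(\nabla^2 f(x(t))\,\dot{x}(t) = \frac{d}{dt}\nabla f(x(t))\) is replaced by the gradient difference \(\nabla f(x_k)-\nabla f(x_{k-1})\), yielding a purely first-order scheme. The step size, damping coefficients and test function below are illustrative assumptions, not the precise algorithm analyzed in the paper.

```python
import numpy as np

def inertial_hessian_damped_gradient(grad_f, x0, s, alpha=3.0, beta=1.0, n_iter=500):
    """Illustrative inertial gradient scheme with a gradient-difference correction
    mimicking Hessian-driven damping (a sketch, not the paper's exact algorithm)."""
    x_prev = x0.copy()
    x = x0.copy()
    g_prev = grad_f(x_prev)
    for k in range(1, n_iter + 1):
        g = grad_f(x)
        # Extrapolation with vanishing viscous damping (1 - alpha/k) plus a
        # gradient-difference term standing in for beta * (Hessian x velocity).
        y = x + (1.0 - alpha / k) * (x - x_prev) - beta * np.sqrt(s) * (g - g_prev)
        x_prev, g_prev = x, g
        x = y - s * grad_f(y)          # gradient step at the extrapolated point
    return x

if __name__ == "__main__":
    # Toy strongly convex quadratic f(x) = 0.5 <A x, x>, A symmetric positive definite.
    rng = np.random.default_rng(0)
    B = rng.standard_normal((20, 20))
    A = B.T @ B + np.eye(20)
    grad_f = lambda x: A @ x
    x_final = inertial_hessian_damped_gradient(grad_f, rng.standard_normal(20),
                                               s=1.0 / np.linalg.norm(A, 2))
    print("final gradient norm:", np.linalg.norm(grad_f(x_final)))
```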


Notes

  1. One can even consider the more general case \(b(t)=1+b/(hk), b > 0\) for which our discussion remains true under minor modifications. But we do not pursue this for the sake of simplicity.

References

  1. Álvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38(4), 1102–1119 (2000)

  2. Álvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 81(8), 747–779 (2002)

  3. Apidopoulos, V., Aujol, J.-F., Dossal, C.: Convergence rate of inertial Forward–Backward algorithm beyond Nesterov's rule. Math. Program. Ser. B 180, 137–156 (2020)

  4. Attouch, H., Cabot, A.: Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differ. Equ. 263, 5412–5458 (2017)

  5. Attouch, H., Cabot, A.: Convergence rates of inertial forward–backward algorithms. SIAM J. Optim. 28(1), 849–874 (2018)

  6. Attouch, H., Cabot, A., Chbani, Z., Riahi, H.: Rate of convergence of inertial gradient dynamics with time-dependent viscous damping coefficient. Evol. Equ. Control Theory 7(3), 353–371 (2018)

  7. Attouch, H., Chbani, Z., Riahi, H.: Fast proximal methods via time scaling of damped inertial dynamics. SIAM J. Optim. 29(3), 2227–2256 (2019)

  8. Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. Ser. B 168, 123–175 (2018)

  9. Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case \(\alpha \le 3\). ESAIM Control Optim. Calc. Var. 25, 2–35 (2019)

  10. Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov's accelerated forward–backward method is actually faster than \(1/k^2\). SIAM J. Optim. 26(3), 1824–1834 (2016)

  11. Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J. Optim. 24(1), 232–256 (2014)

  12. Attouch, H., Peypouquet, J., Redont, P.: Fast convex minimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)

  13. Attouch, H., Svaiter, B.F.: A continuous dynamical Newton-like approach to solving monotone inclusions. SIAM J. Control Optim. 49(2), 574–598 (2011). Global convergence of a closed-loop regularized Newton method for solving monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 157(3), 624–650 (2013)

  14. Aujol, J.-F., Dossal, Ch.: Stability of over-relaxations for the forward–backward algorithm, application to FISTA. SIAM J. Optim. 25(4), 2408–2433 (2015)

  15. Aujol, J.-F., Dossal, C.: Optimal rate of convergence of an ODE associated to the Fast Gradient Descent schemes for \(b>0\) (2017). https://hal.inria.fr/hal-01547251v2

  16. Bateman, H.: Higher Transcendental Functions, vol. 1. McGraw-Hill, New York (1953)

  17. Bauschke, H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer (2011)

  18. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  19. Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d'évolution. Lecture Notes 5. North-Holland (1972)

  20. Cabot, A., Engler, H., Gadat, S.: On the long time behavior of second order differential equations with asymptotically small dissipation. Trans. Am. Math. Soc. 361, 5983–6017 (2009)

  21. Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the fast iterative shrinkage thresholding algorithm. J. Optim. Theory Appl. 166, 968–982 (2015)

  22. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)

  23. Gelfand, I.M., Zejtlin, M.: Printszip nelokalnogo poiska v sistemah avtomatich. optimizatsii. Dokl. AN SSSR 137, 295–298 (1961) (in Russian)

  24. May, R.: Asymptotic for a second-order evolution equation with convex potential and vanishing damping term. Turk. J. Math. 41(3), 681–685 (2017)

  25. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Doklady 27, 372–376 (1983)

  26. Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. 152(1–2), 381–404 (2015)

  27. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. U.S.S.R. Comput. Math. Math. Phys. 4, 1–17 (1964)

  28. Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)

  29. Siegel, W.: Accelerated first-order methods: differential equations and Lyapunov functions. arXiv:1903.05671v1 [math.OC] (2019)

  30. Shi, B., Du, S.S., Jordan, M.I., Su, W.J.: Understanding the acceleration phenomenon via high-resolution differential equations. arXiv:submit/2440124 [cs.LG] 21 Oct 2018

  31. Su, W.J., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov's accelerated gradient method: theory and insights. NIPS'14 27, 2510–2518 (2014)

  32. Wilson, A.C., Recht, B., Jordan, M.I.: A Lyapunov analysis of momentum methods in optimization. arXiv:1611.02635 (2016)


Author information

Corresponding author

Correspondence to Jalal Fadili.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Auxiliary results

1.1 Extended descent lemma

Lemma 1

Let \(f: {{\mathcal {H}}}\rightarrow {\mathbb R}\) be a convex function whose gradient is L-Lipschitz continuous. Let \(s \in ]0,1/L]\). Then for all \((x,y) \in {{\mathcal {H}}}^2\), we have

$$\begin{aligned} f(y - s \nabla f (y)) \le f (x) + \left\langle \nabla f (y), y-x \right\rangle -\frac{s}{2} \Vert \nabla f (y) \Vert ^2 -\frac{s}{2} \Vert \nabla f (x)- \nabla f (y) \Vert ^2 . \end{aligned}$$
(28)

Proof

Denote \(y^+=y - s \nabla f (y)\). By the standard descent lemma applied to \(y^+\) and y, and since \(sL \le 1\), we have

$$\begin{aligned} f(y^+) \le f(y) - \frac{s}{2}\left( {2-Ls}\right) \Vert \nabla f (y) \Vert ^2 \le f(y) - \frac{s}{2} \Vert \nabla f (y) \Vert ^2. \end{aligned}$$
(29)

We now argue by duality between strong convexity and Lipschitz continuity of the gradient of a convex function. Indeed, using the Fenchel identity, we have

$$\begin{aligned} f(y) = \langle \nabla f(y),\,y \rangle - f^*(\nabla f(y)) . \end{aligned}$$

The L-Lipschitz continuity of the gradient of f is equivalent to the 1/L-strong convexity of its conjugate \(f^*\). This, together with the fact that \((\nabla f)^{-1}=\partial f^*\), gives, for all \((x,y) \in {{\mathcal {H}}}^2\),

$$\begin{aligned} f^*(\nabla f(y)) \ge f^*(\nabla f(x)) + \langle x,\,\nabla f(y)-\nabla f(x) \rangle + \frac{1}{2L}\left\| {\nabla f(x)-\nabla f(y)}\right\| ^2 . \end{aligned}$$

Inserting this inequality into the Fenchel identity above yields

$$\begin{aligned} f(y)&\le - f^*(\nabla f(x)) + \langle \nabla f(y),\,y \rangle - \langle x,\,\nabla f(y)-\nabla f(x) \rangle - \frac{1}{2L}\left\| {\nabla f(x)-\nabla f(y)}\right\| ^2 \\&= - f^*(\nabla f(x)) + \langle x,\,\nabla f(x) \rangle + \langle \nabla f(y),\,y-x \rangle - \frac{1}{2L}\left\| {\nabla f(x)-\nabla f(y)}\right\| ^2 \\&= f(x) + \langle \nabla f(y),\,y-x \rangle - \frac{1}{2L}\left\| {\nabla f(x)-\nabla f(y)}\right\| ^2 \\&\le f(x) + \langle \nabla f(y),\,y-x \rangle - \frac{s}{2}\left\| {\nabla f(x)-\nabla f(y)}\right\| ^2 . \end{aligned}$$

Inserting the last bound into (29) completes the proof. \(\square \)
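As a quick sanity check (an illustration only, not part of the original argument), inequality (28) can be verified numerically on a convex quadratic, whose gradient is L-Lipschitz with L the largest eigenvalue of the matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
B = rng.standard_normal((n, n))
A = B.T @ B                        # f(x) = 0.5 <A x, x> is convex
L = np.linalg.eigvalsh(A).max()    # Lipschitz constant of grad f
s = 1.0 / L                        # any s in ]0, 1/L] is allowed by Lemma 1

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    lhs = f(y - s * grad(y))
    rhs = (f(x) + grad(y) @ (y - x)
           - 0.5 * s * np.linalg.norm(grad(y)) ** 2
           - 0.5 * s * np.linalg.norm(grad(x) - grad(y)) ** 2)
    assert lhs <= rhs + 1e-10      # inequality (28)
print("Inequality (28) verified on all sampled pairs.")
```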

1.2 Proof of (27)

Proof

We have

$$\begin{aligned} {{\,\mathrm{prox}\,}}_{f}^{M}(x)&= {{\,\mathrm{argmin}\,}}_{z \in {\mathbb R}^n} \frac{1}{2}\left\| {z - x}\right\| _M^2 + f(z) \\&= {{\,\mathrm{argmin}\,}}_{z \in {\mathbb R}^n} \frac{1}{2s}\left\| {z - x}\right\| ^2 - \frac{1}{2}\left\| {A(z - x)}\right\| ^2 + \frac{1}{2}\left\| {y-A z}\right\| ^2 + g(z) . \end{aligned}$$

By the Pythagoras relation, we then get

$$\begin{aligned} {{\,\mathrm{prox}\,}}_{f}^M(x)&= {{\,\mathrm{argmin}\,}}_{z \in {\mathbb R}^n} \frac{1}{2s}\left\| {z - x}\right\| ^2 + \frac{1}{2}\left\| {y-A x}\right\| ^2 - \langle A(x-z),\,A x - y \rangle + g(z) \\&= {{\,\mathrm{argmin}\,}}_{z \in {\mathbb R}^n} \frac{1}{2s}\left\| {z - x}\right\| ^2 - \langle z - x,\,A^*\left( {y - A x}\right) \rangle + g(z) \\&= {{\,\mathrm{argmin}\,}}_{z \in {\mathbb R}^n} \frac{1}{2s}\left\| {z - \left( {x - s A^*\left( {A x - y}\right) }\right) }\right\| ^2 + g(z) \\&= {{\,\mathrm{prox}\,}}_{s g}\left( {x - s A^*\left( {A x - y}\right) }\right) . \end{aligned}$$

\(\square \)
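As an illustration of the identity just proved (with \(f(z)=\frac{1}{2}\Vert y-Az\Vert ^2+g(z)\) and \(M=\frac{1}{s}\mathrm{Id}-A^*A\) as in the computation above, and the purely illustrative choice \(g=\mu \Vert \cdot \Vert _1\)), one can check numerically that the forward–backward point given by (27) minimizes the M-metric proximal objective. The snippet below is only a sketch under these assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 15, 10
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
x = rng.standard_normal(n)
mu = 0.1                                  # weight of the l1 term g = mu * ||.||_1
s = 0.9 / np.linalg.norm(A, 2) ** 2       # keeps M = Id/s - A^T A positive definite
M = np.eye(n) / s - A.T @ A

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)   # prox of t*||.||_1

def metric_prox_objective(z):
    # 0.5 ||z - x||_M^2 + 0.5 ||y - A z||^2 + mu ||z||_1
    d = z - x
    return 0.5 * d @ M @ d + 0.5 * np.linalg.norm(y - A @ z) ** 2 + mu * np.linalg.norm(z, 1)

# Forward-backward point predicted by (27): prox_{s g}(x - s A^T (A x - y))
z_star = soft(x - s * A.T @ (A @ x - y), s * mu)

# Its objective value should not exceed the value at nearby random points.
vals = [metric_prox_objective(z_star + 1e-2 * rng.standard_normal(n)) for _ in range(2000)]
assert metric_prox_objective(z_star) <= min(vals)
print("z_star minimizes the M-metric proximal objective (numerically).")
```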

1.3 Closed-form solutions of \(\text {(DIN-AVD)}_{\alpha ,\beta ,b}\,\) for quadratic functions

We here provide the closed-form solutions to \(\text {(DIN-AVD)}_{\alpha ,\beta ,b}\,\) for the quadratic objective \(f(x) = \frac{1}{2}\langle Ax,\,x \rangle \) on \({\mathbb R}^n\), where A is a symmetric positive definite matrix. The case of a positive semidefinite matrix A can be treated similarly by restricting the analysis to \(\ker (A)^\perp \). Projecting \(\text {(DIN-AVD)}_{\alpha ,\beta ,b}\,\) onto the eigenspaces of A, one has to solve n independent one-dimensional ODEs of the form

$$\begin{aligned} \ddot{x}_i(t) + \left( {\frac{\alpha }{t}+\beta (t)\lambda _i}\right) \dot{x}_i(t) + \lambda _i b(t) x_i(t) = 0, \qquad i=1,\ldots ,n , \end{aligned}$$

where \(\lambda _i > 0\) is an eigenvalue of A. In the following, we drop the subscript i.

Case \(\varvec{\beta (t) \equiv \beta , b(t)=b+\gamma /t, \beta \ge 0, b > 0, \gamma \ge 0}\): The ODE reads

$$\begin{aligned} \ddot{x}(t) + \left( {\frac{\alpha }{t}+\beta \lambda }\right) \dot{x}(t) + \lambda \left( {b+\frac{\gamma }{t}}\right) x(t) = 0 . \end{aligned}$$
(30)
  • If \(\beta ^2\lambda ^2 \ne 4b\lambda \): set

    $$\begin{aligned} \xi = \sqrt{\beta ^2\lambda ^2 - 4b\lambda }, \, \kappa = \lambda \frac{\gamma -\alpha \beta /2}{\xi }, \, \sigma = (\alpha -1)/2 . \end{aligned}$$

    Using the relationship between the Whittaker functions and Kummer's confluent hypergeometric functions M and U (see [16]), the solution to (30) can be shown to take the form

    $$\begin{aligned} x(t) = \xi ^{\alpha /2} e^{-(\beta \lambda +\xi )t/2}\left[ {c_1 M(\alpha /2-\kappa ,\alpha ,\xi t) + c_2 U(\alpha /2-\kappa ,\alpha ,\xi t)}\right] , \end{aligned}$$

    where \(c_1\) and \(c_2\) are constants given by the initial conditions.

  • If \(\beta ^2\lambda ^2 = 4b\lambda \): set \(\zeta =2\sqrt{\lambda \left( {\gamma -\alpha \beta /2}\right) }\). The solution to (30) takes the form

    $$\begin{aligned} x(t) = t^{-\left( {\alpha -1}\right) /2}e^{-\beta \lambda t/2}\left[ {c_1 J_{(\alpha -1)/2}(\zeta \sqrt{t}) + c_2 Y_{(\alpha -1)/2}(\zeta \sqrt{t})}\right] , \end{aligned}$$

    where \(J_\nu \) and \(Y_\nu \) are the Bessel functions of the first and second kind.

When \(\beta > 0\), one can clearly see the exponential decrease forced by the Hessian. From the asymptotic expansions of M, U, \(J_{\nu }\) and \(Y_{\nu }\), straightforward computations give the behaviour of |x(t)| for large t as follows (a numerical illustration is sketched right after the list):

  • If \(\beta ^2\lambda ^2 > 4b\lambda \), we have

    $$\begin{aligned} |x(t)| = {{\mathcal {O}}}\left( {t^{-\frac{\alpha }{2}+|\kappa |} e^{-\frac{\beta \lambda -\xi }{2}t}}\right) = {{\mathcal {O}}}\left( {e^{-\frac{2b}{\beta }t - \left( {\frac{\alpha }{2}-|\kappa |}\right) \log (t)}}\right) . \end{aligned}$$
  • If \(\beta ^2\lambda ^2 < 4b\lambda \), whence \(\xi \in i {\mathbb R}^+_*\) and \(\kappa \in i {\mathbb R}\), we have

    $$\begin{aligned} |x(t)| = {{\mathcal {O}}}\left( {t^{-\frac{\alpha }{2}} e^{-\frac{\beta \lambda }{2}t}}\right) . \end{aligned}$$
  • If \(\beta ^2\lambda ^2 = 4b\lambda \), we have

    $$\begin{aligned} |x(t)| = {{\mathcal {O}}}\left( {t^{-\frac{2\alpha -1}{4}} e^{-\frac{\beta \lambda }{2}t}}\right) . \end{aligned}$$
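These rates can be observed numerically. The following sketch (using SciPy, with purely illustrative parameter values) integrates the scalar ODE (30) in the regime \(\beta ^2\lambda ^2 < 4b\lambda \) and checks that \(|x(t)|\) stays below a constant multiple of the predicted envelope \(t^{-\alpha /2}e^{-\beta \lambda t/2}\):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters for (30): x'' + (alpha/t + beta*lam) x' + lam*(b + gamma/t) x = 0
alpha, beta, b, gamma, lam = 3.0, 0.5, 2.0, 0.0, 1.0
assert beta ** 2 * lam ** 2 < 4 * b * lam        # regime of the second bullet point

def rhs(t, u):
    x, v = u
    return [v, -(alpha / t + beta * lam) * v - lam * (b + gamma / t) * x]

t0, t1 = 1.0, 40.0
sol = solve_ivp(rhs, (t0, t1), [1.0, 0.0], dense_output=True, rtol=1e-10, atol=1e-12)

t = np.linspace(10.0, t1, 7)
x_abs = np.abs(sol.sol(t)[0])
envelope = t ** (-alpha / 2) * np.exp(-beta * lam * t / 2)
print(x_abs / envelope)    # ratios remain bounded, consistent with the O(.) estimate
```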

Case \(\varvec{\beta (t) = t^{\beta }, b(t)=ct^{\beta -1}, \beta \ge 0, c > 0}\): The ODE now reads

$$\begin{aligned} \ddot{x}(t) + \left( {\frac{\alpha }{t}+t^\beta \lambda }\right) \dot{x}(t) + c\lambda t^{\beta -1} x(t) = 0 . \end{aligned}$$

Let us make the change of variable \(t :=\tau ^{\frac{1}{\beta +1}}\) and set \(y(\tau ) :=x\left( {\tau ^{\frac{1}{\beta +1}}}\right) \). By the standard chain rule for differentiation, it is straightforward to show that y obeys the ODE (the computation is spelled out after the displayed equation)

$$\begin{aligned} \ddot{y}(\tau ) + \left( {\frac{\alpha +\beta }{(1+\beta )\tau }+\frac{\lambda }{1+\beta }}\right) \dot{y}(\tau ) + \frac{c\lambda }{(1+\beta )^2\tau } y(\tau ) = 0 . \end{aligned}$$
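For completeness, here is the chain-rule computation behind this claim, spelled out only for the reader's convenience. Since \(\tau = t^{1+\beta }\), one has \(\frac{d}{d\tau } = \frac{t^{-\beta }}{1+\beta }\,\frac{d}{dt}\), hence

$$\begin{aligned} \dot{y}(\tau ) = \frac{t^{-\beta }}{1+\beta }\,\dot{x}(t), \qquad \ddot{y}(\tau ) = \frac{t^{-2\beta }}{(1+\beta )^2}\,\ddot{x}(t) - \frac{\beta \, t^{-2\beta -1}}{(1+\beta )^2}\,\dot{x}(t) . \end{aligned}$$

Substituting \(\ddot{x}(t)\) from the ODE above, expressing \(\dot{x}(t) = (1+\beta )\,t^{\beta }\,\dot{y}(\tau )\) and using \(t^{-(1+\beta )} = 1/\tau \) gives exactly the displayed equation for y.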

It is clear that this is a special case of (30). Since \(\beta \) and \(\lambda > 0\), set

$$\begin{aligned} \xi = \frac{\lambda }{1+\beta }, \, \kappa = -\frac{\alpha +\beta -c}{1+\beta }, \, \sigma = \frac{\alpha +\beta }{2(1+\beta )} - \frac{1}{2} . \end{aligned}$$

It follows from the first case above that

$$\begin{aligned} x(t) = \xi ^{\sigma +1/2} e^{-\frac{\lambda \tau }{1+\beta }}\left[ {c_1 M\left( {\sigma -\kappa +1/2,\frac{\alpha +\beta }{1+\beta },\xi \tau }\right) + c_2 U\left( {\sigma -\kappa +1/2,\frac{\alpha +\beta }{1+\beta },\xi \tau }\right) }\right] . \end{aligned}$$

Asymptotic estimates can also be derived similarly to above. We omit the details for the sake of brevity.

About this article

Cite this article

Attouch, H., Chbani, Z., Fadili, J. et al. First-order optimization algorithms via inertial systems with Hessian driven damping. Math. Program. 193, 113–155 (2022). https://doi.org/10.1007/s10107-020-01591-1

