First-order optimization algorithms via inertial systems with Hessian driven damping

Attouch, Hedy; Chbani, Zaki; Fadili, Jalal; Riahi, Hassan

doi:10.1007/s10107-020-01591-1

Full Length Paper
Series A
Published: 16 November 2020

Volume 193, pages 113–155, (2022)
Mathematical Programming

Hedy Attouch¹,
Zaki Chbani²,
Jalal Fadili³ &
…
Hassan Riahi²

Abstract

In a Hilbert space setting, for convex optimization, we analyze the convergence rate of a class of first-order algorithms involving inertial features. They can be interpreted as discrete time versions of inertial dynamics involving both viscous and Hessian-driven dampings. The geometrical damping driven by the Hessian intervenes in the dynamics in the form $\nabla ^2 f (x(t)) \dot{x} (t)$. By treating this term as the time derivative of $ \nabla f (x (t)) $, this gives, in discretized form, first-order algorithms in time and space. In addition to the convergence properties attached to Nesterov-type accelerated gradient methods, the algorithms thus obtained are new and show a rapid convergence towards zero of the gradients. On the basis of a regularization technique using the Moreau envelope, we extend these methods to non-smooth convex functions with extended real values. The introduction of time scale factors makes it possible to further accelerate these algorithms. We also report numerical results on structured problems to support our theoretical findings.

Notes

One can even consider the more general case $b(t)=1+b/(hk), b > 0$ for which our discussion remains true under minor modifications. But we do not pursue this for the sake of simplicity.

References

[1] Álvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38(4), 1102–1119 (2000)
[2] Álvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 81(8), 747–779 (2002)
[3] Apidopoulos, V., Aujol, J.-F., Dossal, C.: Convergence rate of inertial Forward–Backward algorithm beyond Nesterov’s rule. Math. Program. Ser. B. 180, 137–156 (2020)
[4] Attouch, H., Cabot, A.: Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differ. Equ. 263, 5412–5458 (2017)
[5] Attouch, H., Cabot, A.: Convergence rates of inertial forward–backward algorithms. SIAM J. Optim. 28(1), 849–874 (2018)
[6] Attouch, H., Cabot, A., Chbani, Z., Riahi, H.: Rate of convergence of inertial gradient dynamics with time-dependent viscous damping coefficient. Evol. Equ. Control Theory 7(3), 353–371 (2018)
[7] Attouch, H., Chbani, Z., Riahi, H.: Fast proximal methods via time scaling of damped inertial dynamics. SIAM J. Optim. 29(3), 2227–2256 (2019)
[8] Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. Ser. B. 168, 123–175 (2018)
[9] Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case $\alpha \le 3$. ESAIM Control Optim. Calc. Var. 25, 2–35 (2019)
[10] Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward–backward method is actually faster than $1/k^2$. SIAM J. Optim. 26(3), 1824–1834 (2016)
[11] Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J. Optim. 24(1), 232–256 (2014)
[12] Attouch, H., Peypouquet, J., Redont, P.: Fast convex minimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)
[13] Attouch, H., Svaiter, B. F.: A continuous dynamical Newton-Like approach to solving monotone inclusions. SIAM J. Control Optim. 49(2), 574–598 (2011). Global convergence of a closed-loop regularized Newton method for solving monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 157(3), 624–650 (2013)
[14] Aujol, J.-F., Dossal, Ch.: Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4), 2408–2433 (2015)
[15] Aujol, J.-F., Dossal, C.: Optimal rate of convergence of an ODE associated to the Fast Gradient Descent schemes for $b>0$ (2017). https://hal.inria.fr/hal-01547251v2
[16] Bateman, H.: Higher Transcendental Functions, vol. 1. McGraw-Hill, New York (1953)
[17] Bauschke, H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, Springer (2011)
[18] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
[19] Brézis, H.: Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution, Lecture Notes 5, North Holland, (1972)
[20] Cabot, A., Engler, H., Gadat, S.: On the long time behavior of second order differential equations with asymptotically small dissipation. Trans. Am. Math. Soc. 361, 5983–6017 (2009)
[21] Chambolle, A., Dossal, Ch.: On the convergence of the iterates of the fast iterative shrinkage thresholding algorithm. J. Optim. Theory Appl. 166, 968–982 (2015)
[22] Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
[23] Gelfand, I.M., Zejtlin, M.: Printszip nelokalnogo poiska v sistemah avtomatich, Optimizatsii, Dokl. AN SSSR, 137, 295–298 (1961) (in Russian)
[24] May, R.: Asymptotic for a second-order evolution equation with convex potential and vanishing damping term. Turk. J. Math. 41(3), 681–685 (2017)
[25] Nesterov, Y.: A method of solving a convex programming problem with convergence rate $O(1/k^2)$. Sov. Math. Doklady 27, 372–376 (1983)
[26] Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. 152(1–2), 381–404 (2015)
[27] Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. U.S.S.R. Comput. Math. Math. Phys. 4, 1–17 (1964)
[28] Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)
[29] Siegel, W.: Accelerated first-order methods: differential equations and Lyapunov functions. arXiv:1903.05671v1 [math.OC] (2019)
[30] Shi, B., Du, S.S., Jordan, M.I., Su, W.J.: Understanding the acceleration phenomenon via high-resolution differential equations. arXiv:submit/2440124 [cs.LG] 21 Oct 2018
[31] Su, W.J., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. NIPS’14 27, 2510–2518 (2014)
[32] Wilson, A.C., Recht, B., Jordan, M.I.: A Lyapunov analysis of momentum methods in optimization. arXiv:1611.02635 (2016)

Author information

Authors and Affiliations

IMAG, Univ. Montpellier, CNRS, Montpellier, France
Hedy Attouch
Faculty of Sciences Semlalia, Mathematics, Cadi Ayyad University, 40000, Marrakech, Morocco
Zaki Chbani & Hassan Riahi
Normandie Université-ENSICAEN, CNRS, GREYC, Caen, France
Jalal Fadili

Corresponding author

Correspondence to Jalal Fadili.

Auxiliary results

1.1 Extended descent lemma

Lemma 1

Let $f: {{\mathcal {H}}}\rightarrow {\mathbb R}$ be a convex function whose gradient is L-Lipschitz continuous. Let $s \in ]0,1/L]$. Then for all $(x,y) \in {{\mathcal {H}}}^2$, we have

$$\begin{aligned} f(y - s \nabla f (y)) \le f (x) + \left\langle \nabla f (y), y-x \right\rangle -\frac{s}{2} \Vert \nabla f (y) \Vert ^2 -\frac{s}{2} \Vert \nabla f (x)- \nabla f (y) \Vert ^2 . \end{aligned}$$

(28)

Proof

Denote $y^+=y - s \nabla f (y)$. By the standard descent lemma applied to $y^+$ and y, and since $sL \le 1$ we have

$$\begin{aligned} f(y^+) \le f(y) - \frac{s}{2}\left( {2-Ls}\right) \Vert \nabla f (y) \Vert ^2 \le f(y) - \frac{s}{2} \Vert \nabla f (y) \Vert ^2. \end{aligned}$$

(29)

We now argue by duality between strong convexity and Lipschitz continuity of the gradient of a convex function. Indeed, using Fenchel identity, we have

$$\begin{aligned} f(y) = \langle \nabla f(y),\,y \rangle - f^*(\nabla f(y)) . \end{aligned}$$

L-Lipschitz continuity of the gradient of f is equivalent to 1/L-strong convexity of its conjugate $f^*$. This together with the fact that $(\nabla f)^{-1}=\partial f^*$ gives for all $(x,y) \in {{\mathcal {H}}}^2$,

$$\begin{aligned} f^*(\nabla f(y)) \ge f^*(\nabla f(x)) + \langle x,\,\nabla f(y)-\nabla f(x) \rangle + \frac{1}{2L}\left\| {\nabla f(x)-\nabla f(y)}\right\| ^2 . \end{aligned}$$

Inserting this inequality into the Fenchel identity above yields

$$\begin{aligned} f(y)&\le - f^*(\nabla f(x)) + \langle \nabla f(y),\,y \rangle - \langle x,\,\nabla f(y)-\nabla f(x) \rangle - \frac{1}{2L}\left\| {\nabla f(x)-\nabla f(y)}\right\| ^2 \\&= - f^*(\nabla f(x)) + \langle x,\,\nabla f(x) \rangle + \langle \nabla f(y),\,y-x \rangle - \frac{1}{2L}\left\| {\nabla f(x)-\nabla f(y)}\right\| ^2 \\&= f(x) + \langle \nabla f(y),\,y-x \rangle - \frac{1}{2L}\left\| {\nabla f(x)-\nabla f(y)}\right\| ^2 \\&\le f(x) + \langle \nabla f(y),\,y-x \rangle - \frac{s}{2}\left\| {\nabla f(x)-\nabla f(y)}\right\| ^2 . \end{aligned}$$

Inserting the last bound into (29) completes the proof. $\square $
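
As a numerical sanity check of (28), the following minimal sketch evaluates both sides of the inequality at random points. The test function (a diagonal convex quadratic), the sampling ranges and all function names are assumptions made for illustration; they do not come from the paper.

```python
import random

# Hypothetical test problem (not from the paper): f(x) = 0.5 * sum_i q_i * x_i^2,
# whose gradient is L-Lipschitz continuous with L = max_i q_i.
Q_DIAG = [0.5, 2.0, 4.0]
L = max(Q_DIAG)


def f(x):
    return 0.5 * sum(q * xi * xi for q, xi in zip(Q_DIAG, x))


def grad_f(x):
    return [q * xi for q, xi in zip(Q_DIAG, x)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def descent_gap(x, y, s):
    """Right-hand side of (28) minus its left-hand side; Lemma 1 says this is >= 0."""
    gx, gy = grad_f(x), grad_f(y)
    y_plus = [yi - s * gi for yi, gi in zip(y, gy)]      # y - s * grad f(y)
    y_minus_x = [yi - xi for xi, yi in zip(x, y)]
    g_diff = [a - b for a, b in zip(gx, gy)]             # grad f(x) - grad f(y)
    rhs = (f(x) + dot(gy, y_minus_x)
           - 0.5 * s * dot(gy, gy)
           - 0.5 * s * dot(g_diff, g_diff))
    return rhs - f(y_plus)


def worst_gap(trials=2000, seed=0):
    """Smallest gap observed over random (x, y) and step sizes s in ]0, 1/L]."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in Q_DIAG]
        y = [rng.uniform(-5.0, 5.0) for _ in Q_DIAG]
        s = rng.uniform(1e-3, 1.0 / L)
        worst = min(worst, descent_gap(x, y, s))
    return worst


if __name__ == "__main__":
    print("smallest observed gap:", worst_gap())
```

A nonnegative smallest gap (up to rounding) is consistent with Lemma 1; it is of course not a proof, only a check on one function.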

1.2 Proof of (27)

Proof

We have

$$\begin{aligned} {{\,\mathrm{prox}\,}}_{f}^{M}(x)&= {{\,\mathrm{argmin}\,}}_{z \in {\mathbb R}^n} \frac{1}{2}\left\| {z - x}\right\| _M^2 + f(z) \\&= {{\,\mathrm{argmin}\,}}_{z \in {\mathbb R}^n} \frac{1}{2s}\left\| {z - x}\right\| ^2 - \frac{1}{2}\left\| {A(z - x)}\right\| ^2 + \frac{1}{2}\left\| {y-A z}\right\| ^2 + g(z) . \end{aligned}$$

By the Pythagoras relation, we then get

$$\begin{aligned} {{\,\mathrm{prox}\,}}_{f}^M(x)&= {{\,\mathrm{argmin}\,}}_{z \in {\mathbb R}^n} \frac{1}{2s}\left\| {z - x}\right\| ^2 + \frac{1}{2}\left\| {y-A x}\right\| ^2 - \langle A(x-z),\,A x - y \rangle + g(z) \\&= {{\,\mathrm{argmin}\,}}_{z \in {\mathbb R}^n} \frac{1}{2s}\left\| {z - x}\right\| ^2 - \langle z - x,\,A^*\left( {y - A x}\right) \rangle + g(z) \\&= {{\,\mathrm{argmin}\,}}_{z \in {\mathbb R}^n} \frac{1}{2s}\left\| {z - \left( {x - s A^*\left( {A x - y}\right) }\right) }\right\| ^2 + g(z) \\&= {{\,\mathrm{prox}\,}}_{s g}\left( {x - s A^*\left( {A x - y}\right) }\right) . \end{aligned}$$

$\square $
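
The identity just proved can be illustrated in one dimension. The sketch below assumes $g = \mu |\cdot |$ (so that ${{\,\mathrm{prox}\,}}_{sg}$ is soft-thresholding), a scalar $A = a$ with $s a^2 < 1$, and compares the closed form with a brute-force grid minimization of the metric proximal objective. The choice of g, the grid bounds and the function names are assumptions for illustration only.

```python
import math


def soft_threshold(v, t):
    """prox of t * |.| at v (soft-thresholding)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def prox_closed_form(x, a, y, s, mu):
    """prox_{s g}(x - s * A^*(A x - y)) for scalar A = a and g = mu * |.|."""
    return soft_threshold(x - s * a * (a * x - y), s * mu)


def prox_brute_force(x, a, y, s, mu, lo=-10.0, hi=10.0, n=200001):
    """Grid minimizer of 0.5*M*(z-x)^2 + 0.5*(y - a*z)^2 + mu*|z| with M = 1/s - a^2."""
    m = 1.0 / s - a * a
    if m <= 0.0:
        raise ValueError("need s * a^2 < 1 so that M is positive")
    best_z, best_val = lo, float("inf")
    for i in range(n):
        z = lo + (hi - lo) * i / (n - 1)
        val = 0.5 * m * (z - x) ** 2 + 0.5 * (y - a * z) ** 2 + mu * abs(z)
        if val < best_val:
            best_z, best_val = z, val
    return best_z


if __name__ == "__main__":
    args = (1.5, 0.8, 2.0, 1.0, 0.3)  # x, a, y, s, mu (arbitrary test values)
    print(prox_closed_form(*args), prox_brute_force(*args))
```

The two values should agree up to the grid resolution, which is what the identity predicts: a proximal step in the metric M reduces to an ordinary forward-backward step.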

1.3 Closed-form solutions of $\text {(DIN-AVD)}_{\alpha ,\beta ,b}\,$ for quadratic functions

We here provide the closed-form solutions to $\text {(DIN-AVD)}_{\alpha ,\beta ,b}\,$ for the quadratic objective $f: x \in {\mathbb R}^n \mapsto \frac{1}{2}\langle Ax,\,x \rangle $, where A is a symmetric positive definite matrix. The case of a positive semidefinite matrix A can be treated similarly by restricting the analysis to $\ker (A)^\perp $. Projecting $\text {(DIN-AVD)}_{\alpha ,\beta ,b}\,$ onto the eigenspaces of A, one has to solve n independent one-dimensional ODEs of the form

$$\begin{aligned} \ddot{x}_i(t) + \left( {\frac{\alpha }{t}+\beta (t)\lambda _i}\right) \dot{x}_i(t) + \lambda _i b(t) x_i(t) = 0, \qquad i=1,\ldots ,n . \end{aligned}$$

where $\lambda _i > 0$ is an eigenvalue of A. In the following, we drop the subscript i.

Case $\varvec{\beta (t) \equiv \beta , b(t)=b+\gamma /t, \beta \ge 0, b > 0, \gamma \ge 0}$: The ODE reads

$$\begin{aligned} \ddot{x}(t) + \left( {\frac{\alpha }{t}+\beta \lambda }\right) \dot{x}(t) + \lambda \left( {b+\frac{\gamma }{t}}\right) x(t) = 0 . \end{aligned}$$

(30)

If $\beta ^2\lambda ^2 \ne 4b\lambda $: set
$$\begin{aligned} \xi = \sqrt{\beta ^2\lambda ^2 - 4b\lambda }, \, \kappa = \lambda \frac{\gamma -\alpha \beta /2}{\xi }, \, \sigma = (\alpha -1)/2 . \end{aligned}$$
Using the relationship between the Whittaker functions and Kummer’s confluent hypergeometric functions M and U, see [16], the solution to (30) can be shown to take the form
$$\begin{aligned} x(t) = \xi ^{\alpha /2} e^{-(\beta \lambda +\xi )t/2}\left[ {c_1 M(\alpha /2-\kappa ,\alpha ,\xi t) + c_2 U(\alpha /2-\kappa ,\alpha ,\xi t)}\right] , \end{aligned}$$
where $c_1$ and $c_2$ are constants given by the initial conditions.
If $\beta ^2\lambda ^2 = 4b\lambda $: set $\zeta =2\sqrt{\lambda \left( {\gamma -\alpha \beta /2}\right) }$. The solution to (30) takes the form
$$\begin{aligned} x(t) = t^{-\left( {\alpha -1}\right) /2}e^{-\beta \lambda t/2}\left[ {c_1 J_{\alpha -1}(\zeta \sqrt{t}) + c_2 Y_{\alpha -1}(\zeta \sqrt{t})}\right] , \end{aligned}$$
where $J_\nu $ and $Y_\nu $ are the Bessel functions of the first and second kind.
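
The Kummer-type solution of (30) can be checked numerically. The sketch below covers only the real case $\beta ^2\lambda ^2 > 4b\lambda $ and only the M branch ($c_2 = 0$, with the constant prefactor dropped). It evaluates M by its power series, differentiates it with the standard identity $M'(a,c,z) = \frac{a}{c} M(a+1,c+1,z)$, and returns the residual of (30) relative to the size of its three terms. The function names and the parameter values are assumptions for illustration.

```python
import math


def kummer_m(a, c, z, tol=1e-16, max_terms=1000):
    """Kummer function M(a, c, z) via its power series (adequate for moderate |z|)."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * z / ((c + n) * (n + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def relative_residual(t, alpha, beta, lam, b, gamma):
    """Residual of ODE (30) for x(t) = exp(-(beta*lam + xi) t / 2) * M(alpha/2 - kappa, alpha, xi t)."""
    disc = beta ** 2 * lam ** 2 - 4.0 * b * lam
    if disc <= 0.0:
        raise ValueError("this sketch only handles beta^2 lam^2 > 4 b lam")
    xi = math.sqrt(disc)
    kappa = lam * (gamma - alpha * beta / 2.0) / xi
    a, c = alpha / 2.0 - kappa, alpha
    p = (beta * lam + xi) / 2.0
    z = xi * t
    m0 = kummer_m(a, c, z)
    m1 = a / c * kummer_m(a + 1.0, c + 1.0, z)                          # dM/dz
    m2 = a * (a + 1.0) / (c * (c + 1.0)) * kummer_m(a + 2.0, c + 2.0, z)  # d^2M/dz^2
    e = math.exp(-p * t)
    x = e * m0
    dx = e * (-p * m0 + xi * m1)
    ddx = e * (p * p * m0 - 2.0 * p * xi * m1 + xi * xi * m2)
    damping = (alpha / t + beta * lam) * dx
    spring = lam * (b + gamma / t) * x
    scale = abs(ddx) + abs(damping) + abs(spring)
    return abs(ddx + damping + spring) / scale


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 3.0):
        print(t, relative_residual(t, alpha=3.0, beta=1.0, lam=8.0, b=1.0, gamma=0.5))
```

A residual at the level of floating-point rounding indicates that the stated parameters $(\alpha /2-\kappa ,\alpha ,\xi t)$ are consistent with (30) for these values.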

When $\beta > 0$, one can clearly see the exponential decrease forced by the Hessian. From the asymptotic expansions of M, U, $J_{\nu }$ and $Y_{\nu }$ for large t, straightforward computations provide the behaviour of |x(t)| for large t as follows:

If $\beta ^2\lambda ^2 > 4b\lambda $, we have, since $\frac{\beta \lambda -\xi }{2} = \frac{2b\lambda }{\beta \lambda +\xi } \ge \frac{b}{\beta }$,
$$\begin{aligned} |x(t)| = {{\mathcal {O}}}\left( {t^{-\frac{\alpha }{2}+|\kappa |} e^{-\frac{\beta \lambda -\xi }{2}t}}\right) = {{\mathcal {O}}}\left( {e^{-\frac{b}{\beta }t - \left( {\frac{\alpha }{2}-|\kappa |}\right) \log (t)}}\right) . \end{aligned}$$
If $\beta ^2\lambda ^2 < 4b\lambda $, whence $\xi \in i {\mathbb R}^+_*$ and $\kappa \in i {\mathbb R}$, we have
$$\begin{aligned} |x(t)| = {{\mathcal {O}}}\left( {t^{-\frac{\alpha }{2}} e^{-\frac{\beta \lambda }{2}t}}\right) . \end{aligned}$$
If $\beta ^2\lambda ^2 = 4b\lambda $, we have
$$\begin{aligned} |x(t)| = {{\mathcal {O}}}\left( {t^{-\frac{2\alpha -1}{4}} e^{-\frac{\beta \lambda }{2}t}}\right) . \end{aligned}$$
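
To make the exponential rates in the three regimes concrete, the following small sketch computes the real part of $(\beta \lambda -\xi )/2$, which is the exponent appearing in the estimates above (it reduces to $\beta \lambda /2$ when $\xi $ is zero or purely imaginary). In the regime $\beta ^2\lambda ^2 > 4b\lambda $ this exponent equals $2b\lambda /(\beta \lambda +\xi )$ and therefore lies in $[b/\beta , 2b/\beta [$. The function name and the sample values are assumptions for illustration.

```python
import cmath


def exponential_rate(beta, lam, b):
    """Real part of (beta*lam - xi)/2 with xi = sqrt(beta^2 lam^2 - 4 b lam).

    This is the exponential decay rate of |x(t)| in all three regimes:
    xi is real when beta^2 lam^2 > 4 b lam, and zero or purely imaginary otherwise.
    """
    xi = cmath.sqrt(beta ** 2 * lam ** 2 - 4.0 * b * lam)
    return ((beta * lam - xi) / 2.0).real


if __name__ == "__main__":
    # (beta, lam, b): overdamped, underdamped and critical examples (arbitrary values).
    for beta, lam, b in [(1.0, 8.0, 1.0), (0.5, 2.0, 1.0), (1.0, 4.0, 1.0)]:
        print(beta, lam, b, exponential_rate(beta, lam, b))
```

For large eigenvalues $\lambda $ the rate approaches $b/\beta $ from above, so the Hessian-driven term yields an exponential decay that does not degrade as $\lambda $ grows.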

Case $\varvec{\beta (t) = t^{\beta }, b(t)=ct^{\beta -1}, \beta \ge 0, c > 0}$: The ODE reads now

$$\begin{aligned} \ddot{x}(t) + \left( {\frac{\alpha }{t}+t^\beta \lambda }\right) \dot{x}(t) + c\lambda t^{\beta -1} x(t) = 0 . \end{aligned}$$

Let us make the change of variable $t :=\tau ^{\frac{1}{\beta +1}}$. Let $y(\tau ) :=x\left( {\tau ^{\frac{1}{\beta +1}}}\right) $. By the chain rule, it is straightforward to show that y obeys the ODE

$$\begin{aligned} \ddot{y}(\tau ) + \left( {\frac{\alpha +\beta }{(1+\beta )\tau }+\frac{\lambda }{1+\beta }}\right) \dot{y}(\tau ) + \frac{c\lambda }{(1+\beta )^2\tau } y(\tau ) = 0 . \end{aligned}$$

It is clear that this is a special case of (30). Since $\beta \ge 0$ and $\lambda > 0$, set

$$\begin{aligned} \xi = \frac{\lambda }{1+\beta }, \, \kappa = -\frac{\alpha +\beta -2c}{2(1+\beta )}, \, \sigma = \frac{\alpha +\beta }{2(1+\beta )} - \frac{1}{2} . \end{aligned}$$

It follows from the first case above that, with $\tau = t^{\beta +1}$,

$$\begin{aligned} x(t) = \xi ^{\sigma +1/2} e^{-\frac{\lambda \tau }{1+\beta }}\left[ {c_1 M\left( {\sigma -\kappa +1/2,\frac{\alpha +\beta }{1+\beta },\xi \tau }\right) + c_2 U\left( {\sigma -\kappa +1/2,\frac{\alpha +\beta }{1+\beta },\xi \tau }\right) }\right] . \end{aligned}$$

Asymptotic estimates can also be derived similarly to above. We omit the details for the sake of brevity.
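
The time change can also be checked numerically. The sketch below takes $y(\tau ) = e^{-\xi \tau } M(a, \frac{\alpha +\beta }{1+\beta }, \xi \tau )$ with $\xi = \lambda /(1+\beta )$, maps it back through $\tau = t^{\beta +1}$ with the chain rule, and evaluates the residual of the original ODE in t. The first Kummer parameter is taken as $a = (\alpha +\beta -c)/(1+\beta )$, a value obtained here by substituting this ansatz directly into the ODE for y; this is an assumption of the sketch, as are the function names and the parameter values.

```python
import math


def kummer_m(a, c, z, tol=1e-16, max_terms=1000):
    """Kummer function M(a, c, z) via its power series (adequate for moderate |z|)."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * z / ((c + n) * (n + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def relative_residual(t, alpha, beta, lam, c_coef):
    """Residual of x'' + (alpha/t + lam t^beta) x' + c_coef lam t^(beta-1) x = 0."""
    xi = lam / (1.0 + beta)
    c_kummer = (alpha + beta) / (1.0 + beta)
    a = (alpha + beta - c_coef) / (1.0 + beta)   # assumption: derived from the tau-ODE
    tau = t ** (beta + 1.0)
    z = xi * tau
    m0 = kummer_m(a, c_kummer, z)
    m1 = a / c_kummer * kummer_m(a + 1.0, c_kummer + 1.0, z)
    m2 = (a * (a + 1.0) / (c_kummer * (c_kummer + 1.0))
          * kummer_m(a + 2.0, c_kummer + 2.0, z))
    e = math.exp(-z)
    # Derivatives of y(tau) = exp(-xi tau) * M(a, c_kummer, xi tau) with respect to tau.
    y = e * m0
    dy = xi * e * (m1 - m0)
    ddy = xi * xi * e * (m0 - 2.0 * m1 + m2)
    # Chain rule back to x(t) = y(t^(beta+1)).
    dx = (beta + 1.0) * t ** beta * dy
    ddx = ((beta + 1.0) ** 2 * t ** (2.0 * beta) * ddy
           + beta * (beta + 1.0) * t ** (beta - 1.0) * dy)
    damping = (alpha / t + lam * t ** beta) * dx
    spring = c_coef * lam * t ** (beta - 1.0) * y
    scale = abs(ddx) + abs(damping) + abs(spring)
    return abs(ddx + damping + spring) / scale


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 3.0):
        print(t, relative_residual(t, alpha=3.0, beta=1.0, lam=2.0, c_coef=1.5))
```

Only the M branch is checked; the U branch would require a separate evaluation routine.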

About this article

Cite this article

Attouch, H., Chbani, Z., Fadili, J. et al. First-order optimization algorithms via inertial systems with Hessian driven damping. Math. Program. 193, 113–155 (2022). https://doi.org/10.1007/s10107-020-01591-1

Received: 24 July 2019
Accepted: 03 November 2020
Published: 16 November 2020
Issue Date: May 2022
DOI: https://doi.org/10.1007/s10107-020-01591-1
