Abstract
In this paper, we study the acceleration of gradient methods for convex optimization problems with weak levels of convexity and smoothness. Starting from the universal fast gradient method, which was designed to be an optimal method for weakly smooth problems whose gradients are Hölder continuous, we modify its momentum appropriately so that it can also accommodate uniformly convex and weakly smooth problems. Unlike existing works, the fast gradient methods proposed in this paper do not use restarting techniques; instead, they use momentum terms that are suitably designed to reflect both the uniform convexity and the weak smoothness of the target energy function. Both theoretical and numerical results supporting the superiority of the proposed methods are presented.
Availability of data and materials
All data generated or analyzed during this study are included in this published article.
Code availability
The source code used to generate all data for the current study is available from the corresponding author upon request.
References
Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces. Springer, New York (2011)
Nemirovskii, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Comput. Math. Math. Phys. 25(2), 21–30 (1985)
Park, J.: Additive Schwarz methods for convex optimization as gradient methods. SIAM J. Numer. Anal. 58(3), 1495–1530 (2020)
Roulet, V., d’Aspremont, A.: Sharpness, restart, and acceleration. SIAM J. Optim. 30(1), 262–289 (2020)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)
Nesterov, Y.: Universal gradient methods for convex optimization problems. Math. Program. 152(1-2), 381–404 (2015)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
Nesterov, Y.: Lectures on convex optimization. Springer, Cham (2018)
Azé, D., Penot, J.-P.: Uniformly convex and uniformly smooth convex functions. In: Annales de la Faculté des Sciences de Toulouse: Mathématiques, vol. 4, pp. 705–730 (1995)
Xu, Z.-B., Roach, G.F.: Characteristic inequalities of uniformly convex and uniformly smooth Banach spaces. J. Math. Anal. Appl. 157(1), 189–210 (1991)
Ciarlet, P.G.: The finite element method for elliptic problems. SIAM, Philadelphia (2002)
Bermejo, R., Infante, J.-A.: A multigrid algorithm for the p-Laplacian. SIAM J. Sci. Comput. 21(5), 1774–1789 (2000)
Feng, W., Salgado, A.J., Wang, C., Wise, S.M.: Preconditioned steepest descent methods for some nonlinear elliptic equations involving p-Laplacian terms. J. Comput. Phys. 334, 45–67 (2017)
Huang, Y.Q., Li, R., Liu, W.: Preconditioned descent algorithms for p-Laplacian. J. Sci. Comput. 32(2), 343–371 (2007)
Zhou, G., Feng, C.: The steepest descent algorithm without line search for p-Laplacian. Appl. Math. Comput. 224, 36–45 (2013)
Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate O(1/k²). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods with inexact oracle: the strongly convex case. Technical report, CORE Discussion Paper (2013)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014)
Iouditski, A., Nesterov, Y.: Primal-dual subgradient methods for minimizing uniformly convex functions. arXiv:1401.1792 (2014)
Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145(1), 451–482 (2014)
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. 159(1), 81–107 (2016)
Kim, D., Fessler, J.A.: Generalizing the optimized gradient method for smooth convex minimization. SIAM J. Optim. 28(2), 1920–1950 (2018)
Calatroni, L., Chambolle, A.: Backtracking strategies for accelerated descent methods with smooth composite objectives. SIAM J. Optim. 29(3), 1772–1798 (2019)
O'Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)
Renegar, J., Grimmer, B.: A simple nearly optimal restart scheme for speeding up first-order methods. Found. Comput. Math. https://doi.org/10.1007/s10208-021-09502-2 (2021)
Nesterov, Y., Gasnikov, A., Guminov, S., Dvurechensky, P.: Primal–dual accelerated gradient methods with small-dimensional relaxation oracle. Optim. Methods Softw. 1–38 (2020)
Stonyakin, F., Tyurin, A., Gasnikov, A., Dvurechensky, P., Agafonov, A., Dvinskikh, D., Alkousa, M., Pasechnyuk, D., Artamonov, S., Piskunova, V.: Inexact model: a framework for optimization and variational inequalities. Optim. Methods Softw. 1–47 (2021)
Fercoq, O., Qu, Z.: Adaptive restart of accelerated gradient methods under local quadratic growth condition. IMA J. Numer. Anal. 39(4), 2069–2095 (2019)
Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 16(1), 1–3 (1966)
Love, E.R.: Some logarithm inequalities. Math. Gaz. 64(427), 55–57 (1980)
Park, J.: Pseudo-linear convergence of an additive Schwarz method for dual total variation minimization. Electron. Trans. Numer. Anal. 54, 176–197 (2021)
Acknowledgements
This work started with the help of Professor Donghwan Kim, through meetings on acceleration schemes for first-order methods. The author would like to thank him for his insightful comments and assistance.
Funding
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2019R1A6A1A10073887).
Ethics declarations
Competing interests
The author declares no competing interests.
Additional information
Communicated by: Stefan Volkwein
Appendices
Appendix A. Recurrence inequalities
This appendix presents several recurrence inequalities that are used throughout the paper, together with their proofs. Motivated by [20, Lemma 8] and [7, Theorem 3], we state the following lemma.
Lemma A.1
Let \(\{A_n\}_{n \geq 1}\) be an increasing sequence of positive real numbers that satisfies
for some 1 ≤ γ ≤ 2 and C > 0. Then we have
Proof
Take any n ≥ 1. Since 1/2 ≤ 1/γ ≤ 1, the following inequality holds:
It follows that
Now, we have
We get the desired result by applying the above inequality recursively. □
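Although the displayed inequalities of Lemma A.1 are not reproduced here, the recursive step at the end of the proof can be illustrated under the assumption that the final one-step estimate takes the form \(A_{n+1} \geq \rho A_n\) for some \(\rho = \rho(\gamma, C) > 1\): unrolling gives

\[ A_{n+1} \geq \rho A_n \geq \rho^2 A_{n-1} \geq \cdots \geq \rho^{n} A_1, \]

i.e., at least geometric growth of \(A_n\), which is the kind of growth that underlies linear convergence estimates.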
The next lemma is useful for proving the sublinear convergence rates of some fast gradient methods. We note that the proof of Lemma A.2 closely follows that of [15, Lemma 1].
Lemma A.2
Let \(\{A_n\}_{n \geq 1}\) be an increasing sequence of positive real numbers that satisfies
for some 0 ≤ γ < 1 and C > 0. Then we have
Proof
Take any n ≥ 1. Since \(A_{n+1} \geq A_n\), we get
or equivalently,
Writing \(A_n = B_n n^{\frac{1}{1-\gamma}}\), we have
The right-hand side of (A.1) is greater than or equal to 1 if and only if
Since the right-hand side of (A.2) is increasing in n, we deduce that a sufficient condition for \(B_{n+1} \geq B_n\) is that
Then it is straightforward by mathematical induction that
which implies the desired result. □
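In the spirit of Remark B.3 below, the exponent \(\frac{1}{1-\gamma}\) can be sanity-checked via a continuous analogue. As an illustrative assumption (the displayed recurrence of Lemma A.2 is not reproduced here), consider a model inequality \(y'(t) \geq C\, y(t)^{\gamma}\) with \(0 \leq \gamma < 1\) and \(y(0) = 0\). Then

\[ \frac{d}{dt}\, y^{1-\gamma} = (1-\gamma)\, y^{-\gamma} y' \geq (1-\gamma) C, \qquad \text{so} \qquad y(t) \geq \big( (1-\gamma) C t \big)^{\frac{1}{1-\gamma}}, \]

which matches the substitution \(A_n = B_n n^{\frac{1}{1-\gamma}}\) used in the proof above.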
Appendix B. Numerical verification of Claim 4.8
This appendix is devoted to a discussion of Claim 4.8. We prove a special case of Claim 4.8 and then present numerical evidence for the remaining cases. First, we consider the situation when p = 2, i.e., when the function f in (1.5) is strongly convex.
Proposition B.1
Claim 4.8 holds when p = 2.
Proof
If p = 2, then (4.1) and (4.8) reduce to (3.6) and (3.7), respectively. Therefore, the desired result can be obtained by the same argument as in the proof of Theorem 3.8. □
Proposition B.1 means that Claim 4.8 is indeed a generalization of Theorem 3.8 to the case p ≥ 2. In the remainder of this appendix, we assume that p > 2 ≥ q ≥ 1. We show that the following claim is a sufficient condition to ensure Claim 4.8.
Claim B.2
Let \(\{A_n\}_{n \geq 1}\) be an increasing sequence of positive real numbers that satisfies
for some p > 2 ≥ q ≥ 1 and C > 0. Then we have
where \(\widetilde {C}\) is a positive constant depending on p, q, C, and A1 only.
Remark B.3
Replacing \(A_n\), \(A_{n+1} - A_n\), and \(\sum_{j=1}^{n} \cdot\) in (B.1) by \(y(t)\), \(y'(t)\), and \(\int_{0}^{t} \cdot \, ds\), respectively, one can obtain the ordinary differential equation
which is a continuous analogue of (B.1). If we impose the initial condition y(0) = 0 on (B.2), then we can readily verify that (B.2) admits a solution
where \(\widehat {C}\) is an appropriate constant depending on p, q, and C. That is, the solution of (B.2) has the same growth rate as the conclusion of Claim B.2.
Although Claim B.2 has a rather complicated structure, it is in fact a generalization of (A.2). If we set p = 2 and 1 ≤ q < 2 in (B.1), then we get
which has the same form as (A.2); note that \(0 \leq \frac{4q-4}{3q-2} < 1\) if \(1 \leq q < 2\).
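As a consistency check on the exponents (assuming, in line with the substitution \(A_n = B_n n^{\frac{1}{1-\gamma}}\) in the proof of Lemma A.2, that its conclusion is the growth rate \(A_n \geq \widetilde{C} n^{\frac{1}{1-\gamma}}\)), applying Lemma A.2 with \(\gamma = \frac{4q-4}{3q-2}\) gives

\[ \frac{1}{1-\gamma} = \left( 1 - \frac{4q-4}{3q-2} \right)^{-1} = \frac{3q-2}{2-q}, \]

while setting \(p = 2\) in the exponent \(\frac{p(3q-2)}{2(p-q)}\) from Claim B.2 yields \(\frac{2(3q-2)}{2(2-q)} = \frac{3q-2}{2-q}\) as well; the two statements thus predict the same growth rate in this overlapping case.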
Proposition B.4
Assume that p > 2 ≥ q ≥ 1. Claim B.2 implies Claim 4.8.
Proof
The starting point of the proof is (4.1); (4.4) and (4.8) imply that
where κ was defined in (2.3). Similarly to (3.8), one can verify that \(A_1\) has a lower bound depending only on p, q, and L. Hence, if Claim B.2 holds, then \(A_n \geq \widetilde{C} n^{\frac{p(3q-2)}{2(p-q)}}\), where \(\widetilde{C}\) is a positive constant depending on p, q, L, μ, \(C_\epsilon\), and \(C_\delta\). It is straightforward to prove that
by invoking Theorem 4.4 and using the same argument as in the proof of Theorem 3.8. □
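To make the last step concrete, we sketch it under a standard assumption on the form of Theorem 4.4 (whose statement is not reproduced here): estimate-sequence analyses of this kind typically provide a bound \(f(x_n) - f^\ast \leq C_0 / A_n\) for some constant \(C_0 > 0\). Combining such a bound with the growth estimate above gives

\[ f(x_n) - f^\ast \leq \frac{C_0}{A_n} \leq \frac{C_0}{\widetilde{C}}\, n^{-\frac{p(3q-2)}{2(p-q)}}, \]

which is the type of convergence rate asserted in Claim 4.8.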
We verify Claim B.2 by numerical experiments. Figure 7 plots \(A_n\) and \(n^{\frac{p(3q-2)}{2(p-q)}}\) with respect to n in log-log scale, where \(A_n\) is generated by the recurrence relation
for various choices of p and q. In all cases, the slope of the graph of \(A_n\) is slightly greater than that of the graph of \(n^{\frac{p(3q-2)}{2(p-q)}}\). That is, the observed asymptotic growth rate of \(A_n\) is at least \(n^{\frac{p(3q-2)}{2(p-q)}}\), which is consistent with Claim B.2. Combined with Proposition B.4, these numerical results support the validity of Claim 4.8.
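For readers who wish to reproduce the slope comparison, the following is a minimal sketch in Python. Since the recurrence (B.1) is not reproduced in this version of the text, the update function recurrence_step below is a hypothetical placeholder to be filled in with the true update from (B.1); only the reference exponent \(\frac{p(3q-2)}{2(p-q)}\) is taken from the text.

```python
import numpy as np

def recurrence_step(A, partial_sum, p, q, C):
    """Hypothetical one-step update: returns A_{n+1} from A_n (and, if
    needed, the running partial sum of A_j). The true update is the one
    prescribed by the recurrence (B.1) in the paper."""
    raise NotImplementedError("insert the update prescribed by (B.1) here")

def empirical_exponent(p, q, C=1.0, A1=1.0, N=10**5):
    """Generate A_1, ..., A_N and estimate the slope of log A_n versus
    log n over the tail of the sequence, where the asymptotic regime holds."""
    A = np.empty(N)
    A[0] = A1
    S = A1  # running partial sum of A_j, in case the update needs it
    for n in range(1, N):
        A[n] = recurrence_step(A[n - 1], S, p, q, C)
        S += A[n]
    tail = np.arange(N // 2, N)
    slope = np.polyfit(np.log(tail + 1.0), np.log(A[tail]), 1)[0]
    return slope

p, q = 4.0, 1.5
print("theoretical exponent:", p * (3 * q - 2) / (2 * (p - q)))
# Once (B.1) is filled in, compare against the empirical slope:
# print("empirical exponent:", empirical_exponent(p, q))
```

Comparing the least-squares slope of \(\log A_n\) versus \(\log n\) over the tail of the sequence against the theoretical exponent mirrors the log-log comparison shown in Figure 7.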
Cite this article
Park, J. Fast gradient methods for uniformly convex and weakly smooth problems. Adv Comput Math 48, 34 (2022). https://doi.org/10.1007/s10444-022-09943-5