Abstract
In this paper, we study the acceleration of gradient methods for convex optimization problems with weak levels of convexity and smoothness. Starting from the universal fast gradient method, which was designed to be an optimal method for weakly smooth problems whose gradients are Hölder continuous, we modify its momentum appropriately so that it can also accommodate uniformly convex and weakly smooth problems. Unlike existing works, the fast gradient methods proposed in this paper do not use restarting techniques; instead, they use momentum terms that are suitably designed to reflect both the uniform convexity and the weak smoothness of the target energy function. Both theoretical and numerical results supporting the superiority of the proposed methods are presented.
Availability of data and materials
All data generated or analyzed during this study are included in this published article.
Code availability
The source code used to generate all data for the current study is available from the corresponding author upon request.
References
Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces. Springer, New York (2011)
Nemirovskii, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Comput. Math. Math. Phys. 25(2), 21–30 (1985)
Park, J.: Additive Schwarz methods for convex optimization as gradient methods. SIAM J. Numer. Anal. 58(3), 1495–1530 (2020)
Roulet, V., d’Aspremont, A.: Sharpness, restart, and acceleration. SIAM J. Optim. 30(1), 262–289 (2020)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)
Nesterov, Y.: Universal gradient methods for convex optimization problems. Math. Program. 152(1-2), 381–404 (2015)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
Nesterov, Y.: Lectures on convex optimization. Springer, Cham (2018)
Azé, D., Penot, J.-P.: Uniformly convex and uniformly smooth convex functions. In: Annales de la Faculté des Sciences de Toulouse: Mathématiques, vol. 4, pp. 705–730 (1995)
Xu, Z.-B., Roach, G.F.: Characteristic inequalities of uniformly convex and uniformly smooth Banach spaces. J. Math. Anal. Appl. 157(1), 189–210 (1991)
Ciarlet, P.G.: The finite element method for elliptic problems. SIAM, Philadelphia (2002)
Bermejo, R., Infante, J.-A.: A multigrid algorithm for the p-Laplacian. SIAM J. Sci. Comput. 21(5), 1774–1789 (2000)
Feng, W., Salgado, A.J., Wang, C., Wise, S.M.: Preconditioned steepest descent methods for some nonlinear elliptic equations involving p-Laplacian terms. J. Comput. Phys. 334, 45–67 (2017)
Huang, Y.Q., Li, R., Liu, W.: Preconditioned descent algorithms for p-Laplacian. J. Sci. Comput. 32(2), 343–371 (2007)
Zhou, G., Feng, C.: The steepest descent algorithm without line search for p-Laplacian. Appl. Math. Comput. 224, 36–45 (2013)
Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate O(1/k²). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods with inexact oracle: the strongly convex case. Technical report, CORE Discussion Paper (2013)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014)
Iouditski, A., Nesterov, Y.: Primal-dual subgradient methods for minimizing uniformly convex functions. arXiv:1401.1792 (2014)
Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145(1), 451–482 (2014)
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. 159(1), 81–107 (2016)
Kim, D., Fessler, J.A.: Generalizing the optimized gradient method for smooth convex minimization. SIAM J. Optim. 28(2), 1920–1950 (2018)
Calatroni, L., Chambolle, A.: Backtracking strategies for accelerated descent methods with smooth composite objectives. SIAM J. Optim. 29(3), 1772–1798 (2019)
O'Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)
Renegar, J., Grimmer, B.: A simple nearly optimal restart scheme for speeding up first-order methods. Found. Comput. Math. https://doi.org/10.1007/s10208-021-09502-2 (2021)
Nesterov, Y., Gasnikov, A., Guminov, S., Dvurechensky, P.: Primal–dual accelerated gradient methods with small-dimensional relaxation oracle. Optim. Methods Softw. 1–38 (2020)
Stonyakin, F., Tyurin, A., Gasnikov, A., Dvurechensky, P., Agafonov, A., Dvinskikh, D., Alkousa, M., Pasechnyuk, D., Artamonov, S., Piskunova, V.: Inexact model: a framework for optimization and variational inequalities. Optim. Methods Softw. 1–47 (2021)
Fercoq, O., Qu, Z.: Adaptive restart of accelerated gradient methods under local quadratic growth condition. IMA J. Numer. Anal. 39(4), 2069–2095 (2019)
Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 16(1), 1–3 (1966)
Love, E.R.: Some logarithm inequalities. Math. Gaz. 64(427), 55–57 (1980)
Park, J.: Pseudo-linear convergence of an additive Schwarz method for dual total variation minimization. Electron. Trans. Numer. Anal. 54, 176–197 (2021)
Acknowledgements
This work started with the help of Professor Donghwan Kim, through meetings on acceleration schemes for first-order methods. The author would like to thank him for his insightful comments and assistance.
Funding
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2019R1A6A1A10073887).
Ethics declarations
Competing interests
The author declares no competing interests.
Additional information
Communicated by: Stefan Volkwein
Appendices
Appendix A. Recurrence inequalities
This appendix presents several recurrence inequalities that are used throughout the paper, together with their proofs. Motivated by [20, Lemma 8] and [7, Theorem 3], we state the following lemma.
Lemma A.1
Let \(\{A_n\}_{n \geq 1}\) be an increasing sequence of positive real numbers that satisfies
for some 1 ≤ γ ≤ 2 and C > 0. Then we have
Proof
Take any n ≥ 1. Since 1/2 ≤ 1/γ ≤ 1, the following inequality holds:
It follows that
Now, we have
We get the desired result by applying the above inequality recursively. □
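Although the displayed inequalities of Lemma A.1 are not reproduced here, the recursive step at the end of the proof can be illustrated under the assumption that the final one-step estimate takes the form \(A_{n+1} \geq \rho A_n\) for some \(\rho = \rho(\gamma, C) > 1\): unrolling gives

\[ A_{n+1} \geq \rho A_n \geq \rho^2 A_{n-1} \geq \cdots \geq \rho^{n} A_1, \]

i.e., at least geometric growth of \(A_n\), which is the kind of growth that underlies linear convergence estimates.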
The next lemma is useful for proving the sublinear convergence rates of some fast gradient methods. We note that the proof of Lemma A.2 closely follows that of [15, Lemma 1].
Lemma A.2
Let \(\{A_n\}_{n \geq 1}\) be an increasing sequence of positive real numbers that satisfies
for some 0 ≤ γ < 1 and C > 0. Then we have
Proof
Take any n ≥ 1. Since \(A_{n+1} \geq A_n\), we get
or equivalently,
Writing \(A_n = B_n n^{\frac{1}{1-\gamma}}\), we have
The right-hand side of (A.1) is greater than or equal to 1 if and only if
Since the right-hand side of (A.2) is increasing in n, we deduce that a sufficient condition for \(B_{n+1} \geq B_n\) is that
Then it is straightforward by mathematical induction that
which implies the desired result. □
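In the spirit of Remark B.3 below, the exponent \(\frac{1}{1-\gamma}\) can be sanity-checked via a continuous analogue. As an illustrative assumption (the displayed recurrence of Lemma A.2 is not reproduced here), consider a model inequality \(y'(t) \geq C\, y(t)^{\gamma}\) with \(0 \leq \gamma < 1\) and \(y(0) = 0\). Then

\[ \frac{d}{dt}\, y^{1-\gamma} = (1-\gamma)\, y^{-\gamma} y' \geq (1-\gamma) C, \qquad \text{so} \qquad y(t) \geq \big( (1-\gamma) C t \big)^{\frac{1}{1-\gamma}}, \]

which matches the substitution \(A_n = B_n n^{\frac{1}{1-\gamma}}\) used in the proof above.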
Appendix B. Numerical verification of Claim 4.8
This appendix is devoted to a discussion of Claim 4.8. We prove a special case of Claim 4.8 and then present numerical evidence for the remaining cases. First, we consider the situation when p = 2, i.e., when the function f in (1.5) is strongly convex.
Proposition B.1
Claim 4.8 holds when p = 2.
Proof
If p = 2, then (4.1) and (4.8) reduce to (3.6) and (3.7), respectively. Therefore, the desired result can be obtained by the same argument as in the proof of Theorem 3.8. □
Proposition B.1 means that Claim 4.8 is indeed a generalization of Theorem 3.8 to the case p ≥ 2. In the remainder of this appendix, we assume that p > 2 ≥ q ≥ 1. We show that the following claim is a sufficient condition to ensure Claim 4.8.
Claim B.2
Let \(\{A_n\}_{n \geq 1}\) be an increasing sequence of positive real numbers that satisfies
for some p > 2 ≥ q ≥ 1 and C > 0. Then we have
where \(\widetilde {C}\) is a positive constant depending on p, q, C, and A1 only.
Remark B.3
Replacing \(A_n\), \(A_{n+1} - A_n\), and \(\sum_{j=1}^{n} \cdot\) in (B.1) by \(y(t)\), \(y'(t)\), and \(\int_{0}^{t} \cdot \, ds\), respectively, one can obtain the ordinary differential equation
which is a continuous analogue of (B.1). If we impose the initial condition y(0) = 0 on (B.2), then we can readily verify that (B.2) admits a solution
where \(\widehat {C}\) is an appropriate constant depending on p, q, and C. That is, the solution of (B.2) has the same growth rate as the conclusion of Claim B.2.
Although Claim B.2 has a rather complicated structure, it is in fact a generalization of (A.2). If we set p = 2 and 1 ≤ q < 2 in (B.1), then we get
which has the same form as (A.2); note that \(0 \leq \frac{4q-4}{3q-2} < 1\) if \(1 \leq q < 2\).
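As a consistency check on the exponents (assuming, in line with the substitution \(A_n = B_n n^{\frac{1}{1-\gamma}}\) in the proof of Lemma A.2, that its conclusion is the growth rate \(A_n \geq \widetilde{C} n^{\frac{1}{1-\gamma}}\)), applying Lemma A.2 with \(\gamma = \frac{4q-4}{3q-2}\) gives

\[ \frac{1}{1-\gamma} = \left( 1 - \frac{4q-4}{3q-2} \right)^{-1} = \frac{3q-2}{2-q}, \]

while setting \(p = 2\) in the exponent \(\frac{p(3q-2)}{2(p-q)}\) from Claim B.2 yields \(\frac{2(3q-2)}{2(2-q)} = \frac{3q-2}{2-q}\) as well; the two statements thus predict the same growth rate in this overlapping case.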
Proposition B.4
Assume that p > 2 ≥ q ≥ 1. Claim B.2 implies Claim 4.8.
Proof
The starting point of the proof is (4.1); (4.4) and (4.8) imply that
where κ was defined in (2.3). Similarly to (3.8), one can verify that \(A_1\) has a lower bound depending only on p, q, and L. Hence, if Claim B.2 holds, then \(A_n \geq \widetilde{C} n^{\frac{p(3q-2)}{2(p-q)}}\), where \(\widetilde{C}\) is a positive constant depending on p, q, L, μ, \(C_\epsilon\), and \(C_\delta\). It is straightforward to prove that
by invoking Theorem 4.4 and using the same argument as in the proof of Theorem 3.8. □
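To make the last step concrete, we sketch it under a standard assumption on the form of Theorem 4.4 (whose statement is not reproduced here): estimate-sequence analyses of this kind typically provide a bound \(f(x_n) - f^\ast \leq C_0 / A_n\) for some constant \(C_0 > 0\). Combining such a bound with the growth estimate above gives

\[ f(x_n) - f^\ast \leq \frac{C_0}{A_n} \leq \frac{C_0}{\widetilde{C}}\, n^{-\frac{p(3q-2)}{2(p-q)}}, \]

which is the type of convergence rate asserted in Claim 4.8.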
We verify Claim B.2 by numerical experiments. Figure 7 plots \(A_n\) and \(n^{\frac{p(3q-2)}{2(p-q)}}\) with respect to n in log-log scale, where \(A_n\) is generated by the recurrence relation
for various choices of p and q. In all cases, the slope of the graph of \(A_n\) is slightly greater than that of the graph of \(n^{\frac{p(3q-2)}{2(p-q)}}\). That is, the observed asymptotic growth rate of \(A_n\) is at least \(n^{\frac{p(3q-2)}{2(p-q)}}\), which is consistent with Claim B.2. Combined with Proposition B.4, these numerical results support the validity of Claim 4.8.
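For readers who wish to reproduce the slope comparison, the following is a minimal sketch in Python. Since the recurrence (B.1) is not reproduced in this version of the text, the update function recurrence_step below is a hypothetical placeholder to be filled in with the true update from (B.1); only the reference exponent \(\frac{p(3q-2)}{2(p-q)}\) is taken from the text.

```python
import numpy as np

def recurrence_step(A, partial_sum, p, q, C):
    """Hypothetical one-step update: returns A_{n+1} from A_n (and, if
    needed, the running partial sum of A_j). The true update is the one
    prescribed by the recurrence (B.1) in the paper."""
    raise NotImplementedError("insert the update prescribed by (B.1) here")

def empirical_exponent(p, q, C=1.0, A1=1.0, N=10**5):
    """Generate A_1, ..., A_N and estimate the slope of log A_n versus
    log n over the tail of the sequence, where the asymptotic regime holds."""
    A = np.empty(N)
    A[0] = A1
    S = A1  # running partial sum of A_j, in case the update needs it
    for n in range(1, N):
        A[n] = recurrence_step(A[n - 1], S, p, q, C)
        S += A[n]
    tail = np.arange(N // 2, N)
    slope = np.polyfit(np.log(tail + 1.0), np.log(A[tail]), 1)[0]
    return slope

p, q = 4.0, 1.5
print("theoretical exponent:", p * (3 * q - 2) / (2 * (p - q)))
# Once (B.1) is filled in, compare against the empirical slope:
# print("empirical exponent:", empirical_exponent(p, q))
```

Comparing the least-squares slope of \(\log A_n\) versus \(\log n\) over the tail of the sequence against the theoretical exponent mirrors the log-log comparison shown in Figure 7.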
Cite this article
Park, J. Fast gradient methods for uniformly convex and weakly smooth problems. Adv Comput Math 48, 34 (2022). https://doi.org/10.1007/s10444-022-09943-5