Abstract
An envelope called an accelerated meta-algorithm is proposed. Based on this envelope, accelerated methods for solving convex unconstrained minimization problems in various formulations can be obtained from nonaccelerated versions in a unified manner. Quasi-optimal algorithms for minimizing smooth functions with Lipschitz continuous derivatives of arbitrary order and for solving smooth minimax problems are given as applications. The proposed envelope is more general than existing ones; moreover, it yields better convergence estimates and, for a number of problem formulations, better practical efficiency.
Notes
Here and below, by the proximal envelope, we mean a proximal algorithm. The word “envelope” implies that every iteration of the proximal algorithm involves an internal (auxiliary) optimization problem, which cannot generally be solved analytically and has to be solved numerically. Therefore, the external proximal method can be understood as an “envelope” for the method used to solve the internal problem.
Strictly speaking, it is not a method (algorithm) but rather an envelope (in the sense defined above). In this paper, we call it an accelerated meta-algorithm. The first word indicates that the purpose of the proposed envelope is to accelerate the baseline method used to solve the internal problem. However, in contrast to a standard (accelerated) envelope, in the one proposed in this paper the auxiliary problem can be solved analytically in a number of important cases, so it is more appropriate to regard it as an ordinary algorithm rather than an envelope. Accordingly, we chose the more neutral term meta-algorithm.
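To make the "envelope" terminology concrete, the following sketch (in Python, with hypothetical names; it is an illustration rather than the implementation used in the paper, and it shows only a non-accelerated outer loop) demonstrates how every outer proximal iteration delegates the regularized auxiliary problem to an arbitrary inner method.

import numpy as np

def proximal_envelope(F_grad, inner_solver, x0, lam=1.0, n_outer=50):
    # Non-accelerated proximal envelope (illustrative sketch only).
    # Each outer iteration poses the auxiliary problem
    #     min_x  F(x) + 1/(2*lam) * ||x - x_k||^2
    # and delegates it to `inner_solver`, which may be any baseline method.
    x = np.asarray(x0, dtype=float)
    for _ in range(n_outer):
        x_k = x.copy()

        def aux_grad(y):
            # Gradient of the regularized auxiliary objective at y.
            return F_grad(y) + (y - x_k) / lam

        # The inner method returns an (approximate) minimizer of the auxiliary problem.
        x = inner_solver(aux_grad, x)
    return x

def inner_gradient_descent(grad, y0, step=0.01, n_steps=200):
    # A baseline inner method: plain gradient descent on the auxiliary problem.
    y = y0.copy()
    for _ in range(n_steps):
        y = y - step * grad(y)
    return y

# Toy usage: F(x) = 0.5 * ||A x - b||^2 (smooth convex quadratic).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x_hat = proximal_envelope(lambda x: A.T @ (A @ x - b), inner_gradient_descent, np.zeros(5))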
REFERENCES
A. V. Gasnikov, Modern Numerical Optimization Methods: Universal Gradient Descent (Mosk. Fiz.-Tekh. Inst., Moscow, 2018) [in Russian].
Yu. Nesterov, Lectures on Convex Optimization (Springer, Berlin, 2018).
G. Lan, Lectures on Optimization: Methods for Machine Learning. https://pwp.gatech.edu/guanghui-lan/publications/
H. Lin, J. Mairal, and Z. Harchaoui, “Catalyst acceleration for first-order convex optimization: From theory to practice,” J. Mach. Learn. Res. 18 (1), 7854–7907 (2017).
N. Doikov and Yu. Nesterov, “Contracting proximal methods for smooth convex optimization.” arXiv:1912.0797
A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, C. A. Uribe, B. Jiang, H. Wang, S. Zhang, S. Bubeck, and Q. Jiang, “Near optimal methods for minimizing convex functions with Lipschitz p-th derivatives,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 1392–1393.
R. D. C. Monteiro and B. F. Svaiter, “An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods,” SIAM J. Optim. 23 (2), 1092–1125 (2013).
Yu. Nesterov, “Inexact accelerated high-order proximal-point methods,” CORE Discussion Paper 2020/8 (2020).
M. Alkousa, D. Dvinskikh, F. Stonyakin, and A. Gasnikov, “Accelerated methods for composite non-bilinear saddle point problem.” arXiv:1906.03620
A. Ivanova, A. Gasnikov, P. Dvurechensky, D. Dvinskikh, A. Tyurin, E. Vorontsova, and D. Pasechnyuk, “Oracle complexity separation in convex optimization.” arXiv:2002.02706
D. Kamzolov, A. Gasnikov, and P. Dvurechensky, “On the optimal combination of tensor optimization methods.” arXiv:2002.01004
T. Lin, C. Jin, and M. Jordan, “Near-optimal algorithms for minimax optimization.” arXiv:2002.02417
A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, and C. A. Uribe, “Optimal tensor methods in smooth convex and uniformly convex optimization,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 1374–1391.
S. Bubeck, Q. Jiang, Y. T. Lee, Y. Li, and A. Sidford, “Near-optimal method for highly smooth convex optimization,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 492–507.
B. Jiang, H. Wang, and S. Zhang, “An optimal high-order tensor method for convex optimization,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 1799–1801.
A. Ivanova, D. Grishchenko, A. Gasnikov, and E. Shulgin, “Adaptive catalyst for smooth convex optimization.” arXiv:1911.11271
Yu. Nesterov, “Implementable tensor methods in unconstrained convex optimization,” Math. Program. (2019). https://doi.org/10.1007/s10107-019-01449-1
D. Kamzolov and A. Gasnikov, “Near-optimal hyperfast second-order method for convex optimization and its sliding.” arXiv:2002.09050
G. N. Grapiglia and Yu. Nesterov, “On inexact solution of auxiliary problems in tensor methods for convex optimization.” arXiv:1907.13023
P. Dvurechensky, A. Gasnikov, and A. Tiurin, “Randomized similar triangles method: A unifying framework for accelerated randomized optimization methods (coordinate descent, directional search, derivative-free method).” arXiv:1707.08486
GitHub https://github.com/dmivilensky/composite-accelerated-method
V. Spokoiny and M. Panov, “Accuracy of Gaussian approximation in nonparametric Bernstein–von Mises theorem.” arXiv:1910.06028
Yu. Nesterov and S. U. Stich, “Efficiency of the accelerated coordinate descent method on structured optimization problems,” SIAM J. Optim. 27 (1), 110–123 (2017).
D. Dvinskikh, A. Tyurin, A. Gasnikov, and S. Omelchenko, “Accelerated and nonaccelerated stochastic gradient descent with model conception.” arXiv:2001.03443
A. Lucchi and J. Kohler, “A stochastic tensor method for non-convex optimization.” arXiv:1911.10367
M. Baes, “Estimate sequence methods: Extensions and approximations” (Inst. Operat. Res., ETH, Zürich, Switzerland, 2009).
Funding
The work by A. Gasnikov (Section 2) was supported by the Russian Foundation for Basic Research (project no. 18-31-20005 mol_a_ved), Kamzolov’s work (Section 3) was supported by the Russian Foundation for Basic Research (project no. 19-31-90170 Aspiranty), and Dvurechensky’s work (Section 3) was supported by the Russian Foundation for Basic Research (project no. 18-29-03071 mk). Dvinskikh and Matyukhin acknowledge the support of the Ministry of Science and Higher Education of the Russian Federation, state assignment no. 075-00337-20-03, project no. 0714-2020-0005.
Additional information
Translated by I. Ruzanova
Appendices
APPENDIX 1
Below, Theorem 1 is proved by adapting the argument of [14] to take into account the addition of a composite term. The following result is based on Theorem 2.1 in [14].
Theorem 3. Let \((y_k)_{k \geqslant 1}\) be a sequence of points in \(\mathbb{R}^d\) and \((\lambda_k)_{k \geqslant 1}\) be a sequence in \(\mathbb{R}_+\). Define \((a_k)_{k \geqslant 1}\) such that \(\lambda_k A_k = a_k^2\) and \(A_k = \sum\nolimits_{i = 1}^k a_i\). For any \(k \geqslant 0\), define
If, for some \(\sigma \in [0,1]\),
then, for any \(x \in {{\mathbb{R}}^{d}}\), we have the inequalities
and
To prove this theorem, we introduce additional lemmas based on Lemmas 2.2–2.5 and 3.1 from [14]; Lemmas 2.6 and 3.3 can be used without modifications.
Lemma 1. Given \(\psi_0(x) = \tfrac{1}{2}\left\| x - x_0 \right\|^2\), define \(\psi_k(x) = \psi_{k - 1}(x) + a_k \Omega_1(F, y_k, x)\) by induction. Then \(x_k = x_0 - \sum\nolimits_{i = 1}^k a_i (\nabla f(y_i) + g'(y_i))\) is a minimizer of the function \(\psi_k\); moreover, \(\psi_k(x) \leqslant A_k F(x) + \tfrac{1}{2}\left\| x - x_0 \right\|^2\), where \(A_k = \sum\nolimits_{i = 1}^k a_i\).
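For completeness, here is the short computation behind the minimizer claim, under the assumption (the exact definition of \(\Omega_1\) is given in the main text and is not restated here) that \(\Omega_1(F, y, x)\) is linear in \(x\) with slope \(\nabla f(y) + g'(y)\):
\[
\nabla \psi_k(x) = (x - x_0) + \sum_{i = 1}^{k} a_i \bigl(\nabla f(y_i) + g'(y_i)\bigr),
\]
so \(\nabla \psi_k(x) = 0\) exactly at \(x = x_0 - \sum_{i = 1}^{k} a_i (\nabla f(y_i) + g'(y_i)) = x_k\); since \(\psi_k\) is a \(1\)-strongly convex quadratic, \(x_k\) is its unique minimizer.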
Lemma 2. Let \(z_k\) be such that
Then, for any \(x\), we have
Proof. Lemma 1 implies that
Lemma 3. For any \(x\), it is true that
Proof. First, simple computations yield
and
thus,
Now we want \(A_{k + 1} F(z_{k + 1}) - A_k F(z_k)\) to be a lower bound for inequality (12) evaluated at \(x = x_{k + 1}\). Using \(\Omega_1(F, y_{k + 1}, z_k) \leqslant F(z_k)\) yields
which completes the proof.
Lemma 4. Let \(\lambda_{k + 1} := \tfrac{a_{k + 1}^2}{A_{k + 1}}\) and \(\tilde{x}_k := \tfrac{a_{k + 1}}{A_{k + 1}} x_k + \tfrac{A_k}{A_{k + 1}} y_k\). Then
Additionally applying inequality (11) yields
Proof. Using Lemma 3 with \(z_k = y_k\) and \(x = x_{k + 1}\), we obtain (for \(\tilde{x} := \tfrac{a_{k + 1}}{A_{k + 1}} x + \tfrac{A_k}{A_{k + 1}} y_k\))
whence
The value of the minimum is easy to compute.
The first inequality in Theorem 3 is proved by combining Lemmas 4 and 2 with Lemma 2.5 from [14]. The second inequality in Theorem 3 follows from Lemmas 4 and 1.
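Note also that the relation \(\lambda_{k + 1} A_{k + 1} = a_{k + 1}^2\) from Theorem 3, combined with \(A_{k + 1} = A_k + a_{k + 1}\), determines \(a_{k + 1}\) explicitly as the positive root of a quadratic equation (this elementary computation is included here only for reference):
\[
a_{k + 1}^2 - \lambda_{k + 1} a_{k + 1} - \lambda_{k + 1} A_k = 0
\quad \Longrightarrow \quad
a_{k + 1} = \frac{\lambda_{k + 1} + \sqrt{\lambda_{k + 1}^2 + 4 \lambda_{k + 1} A_k}}{2},
\]
so that \(a_{k + 1} > 0\) and the sequence \(A_k\) is strictly increasing.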
The following lemma shows that the minimization of the \(p\)th-order Taylor expansion in (4) can be represented as an implicit gradient step with a large stepsize.
Lemma 5. Inequality (11) holds with \(\sigma = 1/2\) for (4), which implies that
Proof. The optimality condition yields
It follows that
Using the Taylor series, we find that the gradient of the function satisfies
thus,
where we used (14) in the second inequality and set \(\eta := \lambda_{k + 1} \tfrac{L_p \left\| y_{k + 1} - \tilde{x}_k \right\|^{p - 1}}{(p - 1)!}\) in the last equality. The final result is obtained by assuming that \(1/2 \leqslant \eta \leqslant p/(p + 1)\) in (13).
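The passage from the optimality condition to the bound on \(\eta\) relies on the standard consequence of the Lipschitz continuity of the \(p\)th derivative (see, e.g., [24]); in notation adopted here only for illustration, with \(\Omega_p(f, \tilde{x}_k, \cdot)\) denoting the \(p\)th-order Taylor polynomial of \(f\) at \(\tilde{x}_k\), it reads
\[
\left\| \nabla f(y) - \nabla_y \Omega_p(f, \tilde{x}_k, y) \right\| \;\leqslant\; \frac{L_p}{p!} \left\| y - \tilde{x}_k \right\|^{p},
\]
which, applied at \(y = y_{k + 1}\), is the type of estimate used above to relate \(\nabla f(y_{k + 1})\) to the gradient of the minimized Taylor model.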
Finally, replacing \(\left\| x^* \right\|\) by \(\left\| x_0 - x^* \right\|\) in Lemma 3.3 and applying Lemma 3.4 from [14], we complete the proof of Theorem 1.
APPENDIX 2
Proof of Theorem 2. Since \(F\) is an \(r\)-uniformly convex function, we obtain
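Recall that \(r\)-uniform convexity of \(F\) with some parameter \(\sigma_r > 0\) (the symbol \(\sigma_r\) is used here only for illustration; the paper's notation for this constant is fixed in the main text) means that, for all \(x\) and \(y\),
\[
F(y) \;\geqslant\; F(x) + \langle \nabla F(x), y - x \rangle + \frac{\sigma_r}{r} \left\| y - x \right\|^{r};
\]
in particular, taking \(x = x^*\) to be a minimizer gives \(\frac{\sigma_r}{r} \left\| y - x^* \right\|^r \leqslant F(y) - F(x^*)\), and an inequality of this kind is what is typically invoked at this step.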
Now the total number of steps in method 1 can be estimated as follows: