Abstract
An envelope called an accelerated meta-algorithm is proposed. Based on this envelope, accelerated methods for solving convex unconstrained minimization problems in various formulations can be obtained from nonaccelerated versions in a unified manner. Quasi-optimal algorithms for minimizing smooth functions with Lipschitz continuous derivatives of arbitrary order and for solving smooth minimax problems are given as applications. The proposed envelope is more general than existing ones; moreover, it yields better convergence estimates and, for a number of problem formulations, better practical efficiency.
Notes
Here and below, by the proximal envelope, we mean a proximal algorithm. The word “envelope” implies that every iteration of the proximal algorithm involves an internal (auxiliary) optimization problem, which cannot generally be solved analytically and has to be solved numerically. Therefore, the external proximal method can be understood as an “envelope” for the method used to solve the internal problem.
Strictly speaking, it is not a method (algorithm) but rather an envelope (in the sense defined above). In this paper, we call it an accelerated meta-algorithm. The first word indicates that the purpose of the proposed envelope is to accelerate the baseline method used to solve the internal problem. However, in contrast to a standard (accelerated) envelope, in the one proposed in this paper the auxiliary problem can be solved analytically in a number of important cases, so it is more appropriate to regard it as an ordinary algorithm rather than an envelope. Accordingly, we chose the more neutral term meta-algorithm.
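To make the "envelope" terminology concrete, the following sketch (in Python, with hypothetical names; it is an illustration rather than the implementation used in the paper, and it shows only a non-accelerated outer loop) demonstrates how every outer proximal iteration delegates the regularized auxiliary problem to an arbitrary inner method.

import numpy as np

def proximal_envelope(F_grad, inner_solver, x0, lam=1.0, n_outer=50):
    # Non-accelerated proximal envelope (illustrative sketch only).
    # Each outer iteration poses the auxiliary problem
    #     min_x  F(x) + 1/(2*lam) * ||x - x_k||^2
    # and delegates it to `inner_solver`, which may be any baseline method.
    x = np.asarray(x0, dtype=float)
    for _ in range(n_outer):
        x_k = x.copy()

        def aux_grad(y):
            # Gradient of the regularized auxiliary objective at y.
            return F_grad(y) + (y - x_k) / lam

        # The inner method returns an (approximate) minimizer of the auxiliary problem.
        x = inner_solver(aux_grad, x)
    return x

def inner_gradient_descent(grad, y0, step=0.01, n_steps=200):
    # A baseline inner method: plain gradient descent on the auxiliary problem.
    y = y0.copy()
    for _ in range(n_steps):
        y = y - step * grad(y)
    return y

# Toy usage: F(x) = 0.5 * ||A x - b||^2 (smooth convex quadratic).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x_hat = proximal_envelope(lambda x: A.T @ (A @ x - b), inner_gradient_descent, np.zeros(5))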
REFERENCES
A. V. Gasnikov, Modern Numerical Optimization Methods: Universal Gradient Descent (Mosk. Fiz.-Tekh. Inst., Moscow, 2018) [in Russian].
Yu. Nesterov, Lectures on Convex Optimization (Springer, Berlin, 2018).
G. Lan, Lectures on Optimization: Methods for Machine Learning. https://pwp.gatech.edu/guanghui-lan/publications/
H. Lin, J. Mairal, and Z. Harchaoui, “Catalyst acceleration for first-order convex optimization: From theory to practice,” J. Mach. Learn. Res. 18 (1), 7854–7907 (2017).
N. Doikov and Yu. Nesterov, “Contracting proximal methods for smooth convex optimization.” arXiv:1912.0797
A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, C. A. Uribe, B. Jiang, H. Wang, S. Zhang, S. Bubeck, and Q. Jiang, “Near optimal methods for minimizing convex functions with Lipschitz p-th derivatives,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 1392–1393.
R. D. C. Monteiro and B. F. Svaiter, “An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods,” SIAM J. Optim. 23 (2), 1092–1125 (2013).
Yu. Nesterov, “Inexact accelerated high-order proximal-point methods,” CORE Discussion Paper 2020/8 (2020).
M. Alkousa, D. Dvinskikh, F. Stonyakin, and A. Gasnikov, “Accelerated methods for composite non-bilinear saddle point problem.” arXiv:1906.03620
A. Ivanova, A. Gasnikov, P. Dvurechensky, D. Dvinskikh, A. Tyurin, E. Vorontsova, and D. Pasechnyuk, “Oracle complexity separation in convex optimization.” arXiv:2002.02706
D. Kamzolov, A. Gasnikov, and P. Dvurechensky, “On the optimal combination of tensor optimization methods.” arXiv:2002.01004
T. Lin, C. Jin, and M. Jordan, “Near-optimal algorithms for minimax optimization.” arXiv:2002.02417
A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, and C. A. Uribe, “Optimal tensor methods in smooth convex and uniformly convex optimization,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 1374–1391.
S. Bubeck, Q. Jiang, Y. T. Lee, Y. Li, and A. Sidford, “Near-optimal method for highly smooth convex optimization,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 492–507.
B. Jiang, H. Wang, and S. Zhang, “An optimal high-order tensor method for convex optimization,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 1799–1801.
A. Ivanova, D. Grishchenko, A. Gasnikov, and E. Shulgin, “Adaptive catalyst for smooth convex optimization.” arXiv:1911.11271
Yu. Nesterov, “Implementable tensor methods in unconstrained convex optimization,” Math. Program. (2019). https://doi.org/10.1007/s10107-019-01449-1
D. Kamzolov and A. Gasnikov, “Near-optimal hyperfast second-order method for convex optimization and its sliding.” arXiv:2002.09050
G. N. Grapiglia and Yu. Nesterov, “On inexact solution of auxiliary problems in tensor methods for convex optimization.” arXiv:1907.13023
P. Dvurechensky, A. Gasnikov, and A. Tiurin, “Randomized similar triangles method: A unifying framework for accelerated randomized optimization methods (coordinate descent, directional search, derivative-free method).” arXiv:1707.08486
GitHub https://github.com/dmivilensky/composite-accelerated-method
V. Spokoiny and M. Panov, “Accuracy of Gaussian approximation in nonparametric Bernstein–von Mises theorem.” arXiv:1910.06028
Yu. Nesterov and S. U. Stich, “Efficiency of the accelerated coordinate descent method on structured optimization problems,” SIAM J. Optim. 27 (1), 110–123 (2017).
D. Dvinskikh, A. Tyurin, A. Gasnikov, and S. Omelchenko, “Accelerated and nonaccelerated stochastic gradient descent with model conception.” arXiv:2001.03443
A. Lucchi and J. Kohler, “A stochastic tensor method for non-convex optimization.” arXiv:1911.10367
M. Baes, “Estimate sequence methods: Extensions and approximations” (Inst. Operat. Res., ETH, Zürich, Switzerland, 2009).
Funding
The work by A. Gasnikov (Section 2) was supported by the Russian Foundation for Basic Research (project no. 18-31-20005 mol_a_ved), Kamzolov’s work (Section 3) was supported by the Russian Foundation for Basic Research (project no. 19-31-90170 Aspiranty), and Dvurechensky’s work (Section 3) was supported by the Russian Foundation for Basic Research (project no. 18-29-03071 mk). Dvinskikh and Matyukhin acknowledge the support of the Ministry of Science and Higher Education of the Russian Federation, state assignment no. 075-00337-20-03, project no. 0714-2020-0005.
Additional information
Translated by I. Ruzanova
Appendices
APPENDIX 1
Below, Theorem 1 is proved by adapting the argument of [14] to take into account the addition of a composite term. The following result is based on Theorem 2.1 in [14].
Theorem 3. Let \((y_k)_{k \geqslant 1}\) be a sequence of points in \(\mathbb{R}^d\) and \((\lambda_k)_{k \geqslant 1}\) be a sequence in \(\mathbb{R}_+\). Define \((a_k)_{k \geqslant 1}\) such that \(\lambda_k A_k = a_k^2\) and \(A_k = \sum\nolimits_{i = 1}^k a_i\). For any \(k \geqslant 0\), define
If, for some \(\sigma \in [0,1]\),
then, for any \(x \in {{\mathbb{R}}^{d}}\), we have the inequalities
and
To prove this theorem, we introduce additional lemmas based on Lemmas 2.2–2.5 and 3.1 from [14]; Lemmas 2.6 and 3.3 can be used without modifications.
Lemma 1. Given \(\psi_0(x) = \tfrac{1}{2}\left\| x - x_0 \right\|^2\), define \(\psi_k(x) = \psi_{k - 1}(x) + a_k \Omega_1(F, y_k, x)\) by induction. Then \(x_k = x_0 - \sum\nolimits_{i = 1}^k a_i (\nabla f(y_i) + g'(y_i))\) is a minimizer of the function \(\psi_k\); moreover, \(\psi_k(x) \leqslant A_k F(x) + \tfrac{1}{2}\left\| x - x_0 \right\|^2\), where \(A_k = \sum\nolimits_{i = 1}^k a_i\).
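For completeness, here is the short computation behind the minimizer claim, under the assumption (the exact definition of \(\Omega_1\) is given in the main text and is not restated here) that \(\Omega_1(F, y, x)\) is linear in \(x\) with slope \(\nabla f(y) + g'(y)\):
\[
\nabla \psi_k(x) = (x - x_0) + \sum_{i = 1}^{k} a_i \bigl(\nabla f(y_i) + g'(y_i)\bigr),
\]
so \(\nabla \psi_k(x) = 0\) exactly at \(x = x_0 - \sum_{i = 1}^{k} a_i (\nabla f(y_i) + g'(y_i)) = x_k\); since \(\psi_k\) is a \(1\)-strongly convex quadratic, \(x_k\) is its unique minimizer.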
Lemma 2. Let \(z_k\) be such that
Then, for any \(x\), we have
Proof. Lemma 1 implies that
Lemma 3. For any \(x\), it is true that
Proof. First, simple computations yield
and
thus,
Now we want \(A_{k + 1} F(z_{k + 1}) - A_k F(z_k)\) to be a lower bound for inequality (12) evaluated at \(x = x_{k + 1}\). Using \(\Omega_1(F, y_{k + 1}, z_k) \leqslant F(z_k)\) yields
which completes the proof.
Lemma 4. Let \(\lambda_{k + 1} := \tfrac{a_{k + 1}^2}{A_{k + 1}}\) and \(\tilde{x}_k := \tfrac{a_{k + 1}}{A_{k + 1}} x_k + \tfrac{A_k}{A_{k + 1}} y_k\). Then
Additionally applying inequality (11) yields
Proof. Using Lemma 3 with \(z_k = y_k\) and \(x = x_{k + 1}\), we obtain (for \(\tilde{x} := \tfrac{a_{k + 1}}{A_{k + 1}} x + \tfrac{A_k}{A_{k + 1}} y_k\))
whence
The value of the minimum is easy to compute.
The first inequality in Theorem 3 is proved by combining Lemmas 4 and 2 with Lemma 2.5 from [14]. The second inequality in Theorem 3 follows from Lemmas 4 and 1.
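Note also that the relation \(\lambda_{k + 1} A_{k + 1} = a_{k + 1}^2\) from Theorem 3, combined with \(A_{k + 1} = A_k + a_{k + 1}\), determines \(a_{k + 1}\) explicitly as the positive root of a quadratic equation (this elementary computation is included here only for reference):
\[
a_{k + 1}^2 - \lambda_{k + 1} a_{k + 1} - \lambda_{k + 1} A_k = 0
\quad \Longrightarrow \quad
a_{k + 1} = \frac{\lambda_{k + 1} + \sqrt{\lambda_{k + 1}^2 + 4 \lambda_{k + 1} A_k}}{2},
\]
so that \(a_{k + 1} > 0\) and the sequence \(A_k\) is strictly increasing.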
The following lemma shows that the minimization of the \(p\)th-order Taylor expansion in (4) can be represented as an implicit gradient step with a large stepsize.
Lemma 5. Inequality (11) holds with \(\sigma = 1/2\) for (4), which implies that
Proof. The optimality condition yields
It follows that
Using the Taylor series, we find that the gradient of the function satisfies
thus,
where we used (14) in the second inequality and set \(\eta := \lambda_{k + 1} \tfrac{L_p \left\| y_{k + 1} - \tilde{x}_k \right\|^{p - 1}}{(p - 1)!}\) in the last equality. The final result is obtained by assuming that \(1/2 \leqslant \eta \leqslant p/(p + 1)\) in (13).
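The passage from the optimality condition to the bound on \(\eta\) relies on the standard consequence of the Lipschitz continuity of the \(p\)th derivative (see, e.g., [24]); in notation adopted here only for illustration, with \(\Omega_p(f, \tilde{x}_k, \cdot)\) denoting the \(p\)th-order Taylor polynomial of \(f\) at \(\tilde{x}_k\), it reads
\[
\left\| \nabla f(y) - \nabla_y \Omega_p(f, \tilde{x}_k, y) \right\| \;\leqslant\; \frac{L_p}{p!} \left\| y - \tilde{x}_k \right\|^{p},
\]
which, applied at \(y = y_{k + 1}\), is the type of estimate used above to relate \(\nabla f(y_{k + 1})\) to the gradient of the minimized Taylor model.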
Finally, replacing \(\left\| x^* \right\|\) by \(\left\| x_0 - x^* \right\|\) in Lemma 3.3 and applying Lemma 3.4 from [14], we complete the proof of Theorem 1.
APPENDIX 2
Proof of Theorem 2. Since \(F\) is an \(r\)-uniformly convex function, we obtain
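Recall that \(r\)-uniform convexity of \(F\) with some parameter \(\sigma_r > 0\) (the symbol \(\sigma_r\) is used here only for illustration; the paper's notation for this constant is fixed in the main text) means that, for all \(x\) and \(y\),
\[
F(y) \;\geqslant\; F(x) + \langle \nabla F(x), y - x \rangle + \frac{\sigma_r}{r} \left\| y - x \right\|^{r};
\]
in particular, taking \(x = x^*\) to be a minimizer gives \(\frac{\sigma_r}{r} \left\| y - x^* \right\|^r \leqslant F(y) - F(x^*)\), and an inequality of this kind is what is typically invoked at this step.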
Now the total number of steps in method 1 can be estimated as follows: