
Accelerated Meta-Algorithm for Convex Optimization Problems

  • OPTIMAL CONTROL
Computational Mathematics and Mathematical Physics

Abstract

An envelope called an accelerated meta-algorithm is proposed. Based on this envelope, accelerated methods for solving unconstrained convex minimization problems in various formulations can be obtained from nonaccelerated versions in a unified manner. As applications, quasi-optimal algorithms are given for minimizing smooth functions with Lipschitz continuous derivatives of arbitrary order and for solving smooth minimax problems. The proposed envelope is more general than existing ones; moreover, it yields better convergence estimates and, for a number of problem formulations, better practical efficiency.

Notes

  1. Here and below, by the proximal envelope, we mean a proximal algorithm. The word “envelope” implies that every iteration of the proximal algorithm involves an internal (auxiliary) optimization problem, which cannot generally be solved analytically and has to be solved numerically. Therefore, the external proximal method can be understood as an “envelope” for the method used to solve the internal problem.

  2. Strictly speaking, it is not a method (algorithm) but rather an envelope (in the sense defined above). In this paper, we call it an accelerated meta-algorithm. The first word indicates that the goal of the developed envelope is to accelerate the baseline method (the one used to solve the internal problem). However, in contrast to a standard (accelerated) envelope, in the one proposed in this paper the auxiliary problem is solved analytically in a number of important cases, so it is more appropriate to regard it as a usual algorithm rather than as an envelope. Accordingly, we chose the more neutral term meta-algorithm. A schematic illustration of the envelope structure is given after these notes.
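
To illustrate the envelope structure described in notes 1 and 2, here is a minimal Python sketch (not from the paper): an outer proximal loop whose every iteration hands the auxiliary regularized problem to an arbitrary inner (baseline) solver. All names (proximal_envelope, gradient_inner_solver), the stepsizes, and the test objective are hypothetical illustration choices, not the paper's algorithm.

import numpy as np

def proximal_envelope(grad_f, x0, lam, inner_solver, n_outer=10):
    # Outer proximal loop ("envelope"): every iteration forms the auxiliary
    # problem  min_y f(y) + (1/(2*lam)) * ||y - x_k||^2  and passes it to an
    # inner (baseline) method, which solves it only approximately.
    x = np.asarray(x0, dtype=float)
    for _ in range(n_outer):
        center = x.copy()
        grad_aux = lambda y, c=center: grad_f(y) + (y - c) / lam  # gradient of the auxiliary objective
        x = inner_solver(grad_aux, x)
    return x

def gradient_inner_solver(grad_aux, y0, step=0.01, n_inner=200):
    # Non-accelerated baseline method used inside the envelope.
    y = y0.copy()
    for _ in range(n_inner):
        y = y - step * grad_aux(y)
    return y

# Usage on a least-squares objective f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
grad_f = lambda x: A.T @ (A @ x - b)
x_hat = proximal_envelope(grad_f, np.zeros(5), lam=1.0, inner_solver=gradient_inner_solver)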

REFERENCES

  1. A. V. Gasnikov, Modern Numerical Optimization Methods: Universal Gradient Descent (Mosk. Fiz.-Tekh. Inst., Moscow, 2018) [in Russian].

  2. Yu. Nesterov, Lectures on Convex Optimization (Springer, Berlin, 2018).

  3. G. Lan, Lectures on Optimization: Methods for Machine Learning. https://pwp.gatech.edu/guanghui-lan/publications/

  4. H. Lin, J. Mairal, and Z. Harchaoui, “Catalyst acceleration for first-order convex optimization: From theory to practice,” J. Mach. Learn. Res. 18 (1), 7854–7907 (2017).

  5. N. Doikov and Yu. Nesterov, “Contracting proximal methods for smooth convex optimization.” arXiv:1912.0797

  6. A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, C. A. Uribe, B. Jiang, H. Wang, S. Zhang, S. Bubeck, and Q. Jiang, “Near optimal methods for minimizing convex functions with Lipschitz p-th derivatives,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 1392–1393.

  7. R. D. C. Monteiro and B. F. Svaiter, “An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods,” SIAM J. Optim. 23 (2), 1092–1125 (2013).

  8. Yu. Nesterov, “Inexact accelerated high-order proximal-point methods,” CORE Discussion Paper 2020/8 (2020).

  9. M. Alkousa, D. Dvinskikh, F. Stonyakin, and A. Gasnikov, “Accelerated methods for composite non-bilinear saddle point problem.” arXiv:1906.03620

  10. A. Ivanova, A. Gasnikov, P. Dvurechensky, D. Dvinskikh, A. Tyurin, E. Vorontsova, and D. Pasechnyuk, “Oracle complexity separation in convex optimization.” arXiv:2002.02706

  11. D. Kamzolov, A. Gasnikov, and P. Dvurechensky, “On the optimal combination of tensor optimization methods.” arXiv:2002.01004

  12. T. Lin, C. Jin, and M. Jordan, “Near-optimal algorithms for minimax optimization.” arXiv:2002.02417

  13. A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, and C. A. Uribe, “Optimal tensor methods in smooth convex and uniformly convex optimization,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 1374–1391.

  14. S. Bubeck, Q. Jiang, Y. T. Lee, Y. Li, and A. Sidford, “Near-optimal method for highly smooth convex optimization,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 492–507.

  15. B. Jiang, H. Wang, and S. Zhang, “An optimal high-order tensor method for convex optimization,” Proceedings of the 32nd Conference on Learning Theory (2019), pp. 1799–1801.

  16. A. Ivanova, D. Grishchenko, A. Gasnikov, and E. Shulgin, “Adaptive catalyst for smooth convex optimization.” arXiv:1911.11271

  17. Yu. Nesterov, “Implementable tensor methods in unconstrained convex optimization,” Math. Program. (2019). https://doi.org/10.1007/s10107-019-01449-1

  18. D. Kamzolov and A. Gasnikov, “Near-optimal hyperfast second-order method for convex optimization and its sliding.” arXiv:2002.09050

  19. G. N. Grapiglia and Yu. Nesterov, “On inexact solution of auxiliary problems in tensor methods for convex optimization.” arXiv:1907.13023

  20. P. Dvurechensky, A. Gasnikov, and A. Tiurin, “Randomized similar triangles method: A unifying framework for accelerated randomized optimization methods (coordinate descent, directional search, derivative-free method).” arXiv:1707.08486

  21. GitHub https://github.com/dmivilensky/composite-accelerated-method

  22. V. Spokoiny and M. Panov, “Accuracy of Gaussian approximation in nonparametric Bernstein–von Mises theorem.” arXiv:1910.06028 (2019)

  23. Yu. Nesterov and S. U. Stich, “Efficiency of the accelerated coordinate descent method on structured optimization problems,” SIAM J. Optim. 27 (1), 110–123 (2017).

  24. D. Dvinskikh, A. Tyurin, A. Gasnikov, and S. Omelchenko, “Accelerated and nonaccelerated stochastic gradient descent with model conception.” arXiv:2001.03443

  25. A. Lucchi and J. Kohler, “A stochastic tensor method for non-convex optimization.” arXiv:1911.10367

  26. M. Baes, “Estimate sequence methods: Extensions and approximations” (Inst. Operat. Res., ETH, Zürich, Switzerland, 2009).

Funding

The work by A. Gasnikov (Section 2) was supported by the Russian Foundation for Basic Research (project no. 18-31-20005 mol_a_ved), Kamzolov’s work (Section 3) was supported by the Russian Foundation for Basic Research (project no. 19-31-90170 Aspiranty), and Dvurechensky’s work (Section 3) was supported by the Russian Foundation for Basic Research (project no. 18-29-03071 mk). Dvinskikh and Matyukhin acknowledge the support of the Ministry of Science and Higher Education of the Russian Federation, state assignment no. 075-00337-20-03, project no. 0714-2020-0005.

Author information

Corresponding author

Correspondence to D. I. Kamzolov.

Additional information

Translated by I. Ruzanova

Appendices

APPENDIX 1

Below, Theorem 1 is proved by relying on the proof in [14] and taking into account the addition of a composite function. The following result is based on Theorem 2.1 from [14].

Theorem 3. Let \((y_k)_{k \geqslant 1}\) be a sequence of points in \(\mathbb{R}^d\) and \((\lambda_k)_{k \geqslant 1}\) be a sequence in \(\mathbb{R}_+\). Define \((a_k)_{k \geqslant 1}\) such that \(\lambda_k A_k = a_k^2\) and \(A_k = \sum\nolimits_{i = 1}^k a_i\). For any \(k \geqslant 0\), define

$$x_k = x_0 - \sum\limits_{i = 1}^k a_i\left( \nabla f(y_i) + g'(y_i) \right)\quad \text{and}\quad \tilde{x}_k := \frac{a_{k+1}}{A_{k+1}}\,x_k + \frac{A_k}{A_{k+1}}\,y_k.$$

If, for some \(\sigma \in [0,1]\),

$$\left\| y_{k+1} - \left( \tilde{x}_k - \lambda_{k+1}\left( \nabla f(y_{k+1}) + g'(y_{k+1}) \right) \right) \right\| \leqslant \sigma \left\| y_{k+1} - \tilde{x}_k \right\|,$$
(11)

then, for any \(x \in {{\mathbb{R}}^{d}}\), we have the inequalities

$$F(y_k) - F(x) \leqslant \frac{2\left\| x - x_0 \right\|^2}{\left( \sum\limits_{i = 1}^k \sqrt{\lambda_i} \right)^2}$$

and

$$\sum\limits_{i = 1}^k \frac{A_i}{\lambda_i}\left\| y_i - \tilde{x}_{i-1} \right\|^2 \leqslant \frac{\left\| x_0 - x_* \right\|^2}{1 - \sigma^2}.$$
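
Before turning to the proof, the construction of Theorem 3 can be made concrete by the following Python sketch (not from the paper; all names and parameter values are hypothetical, and the composite term is dropped, i.e. \(g' \equiv 0\)): the stepsize \(\lambda_{k+1}\) is kept constant, \(a_{k+1}\) is obtained from \(\lambda_{k+1}A_{k+1} = a_{k+1}^2\) with \(A_{k+1} = A_k + a_{k+1}\), and \(y_{k+1}\) is produced by a simple fixed-point solver for the implicit equation \(y = \tilde{x}_k - \lambda_{k+1}\nabla f(y)\), whose exact solution satisfies (11) with \(\sigma = 0\).

import numpy as np

def solve_implicit_step(grad_f, x_tilde, lam, n_iter=50):
    # Approximately solve y = x_tilde - lam * grad_f(y) by fixed-point iteration
    # (contracting when lam * L < 1); any routine producing y that satisfies
    # condition (11) could be plugged in here instead.
    y = x_tilde.copy()
    for _ in range(n_iter):
        y = x_tilde - lam * grad_f(y)
    return y

def accelerated_meta_algorithm(grad_f, x0, lam, n_steps=100):
    # Sketch of the scheme of Theorem 3 (composite term omitted).
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    A = 0.0
    for _ in range(n_steps):
        a = (lam + np.sqrt(lam**2 + 4.0 * lam * A)) / 2.0  # root of a^2 = lam * (A + a)
        A_next = A + a
        x_tilde = (a / A_next) * x + (A / A_next) * y      # tilde{x}_k
        y = solve_implicit_step(grad_f, x_tilde, lam)      # y_{k+1}, cf. (11)
        x = x - a * grad_f(y)                              # x_{k+1}
        A = A_next
    return y

# Usage on a quadratic f(x) = 0.5 * ||B x - c||^2.
rng = np.random.default_rng(1)
B, c = rng.standard_normal((30, 10)), rng.standard_normal(30)
grad_f = lambda x: B.T @ (B @ x - c)
y_out = accelerated_meta_algorithm(grad_f, np.zeros(10), lam=0.01)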

To prove this theorem, we introduce additional lemmas based on Lemmas 2.2–2.5 and 3.1 from [14]; Lemmas 2.6 and 3.3 can be used without modifications.

Lemma 1. Given \(\psi_0(x) = \tfrac{1}{2}\left\| x - x_0 \right\|^2\), define \(\psi_k(x) = \psi_{k-1}(x) + a_k\Omega_1(F, y_k, x)\) by induction. Then \(x_k = x_0 - \sum\nolimits_{i = 1}^k a_i(\nabla f(y_i) + g'(y_i))\) is a minimizer of the function \(\psi_k\); moreover, \(\psi_k(x) \leqslant A_k F(x) + \tfrac{1}{2}\left\| x - x_0 \right\|^2\), where \(A_k = \sum\nolimits_{i = 1}^k a_i\).
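
For completeness, the statement of Lemma 1 can be verified directly, assuming (as the subsequent lemmas suggest) that \(\Omega_1(F, y, x)\) denotes the linearization \(F(y) + \langle \nabla f(y) + g'(y), x - y \rangle\), which is affine in \(x\) and, by convexity of \(F\), does not exceed \(F(x)\). Indeed,

$$\nabla \psi_k(x) = x - x_0 + \sum\limits_{i = 1}^k a_i\left( \nabla f(y_i) + g'(y_i) \right) = 0 \quad \Longleftrightarrow \quad x = x_k,$$

and, using \(\Omega_1(F, y_i, x) \leqslant F(x)\),

$$\psi_k(x) \leqslant \frac{1}{2}\left\| x - x_0 \right\|^2 + \sum\limits_{i = 1}^k a_i F(x) = \frac{1}{2}\left\| x - x_0 \right\|^2 + A_k F(x).$$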

Lemma 2. Let \({{z}_{k}}\) be such that

$${{\psi }_{k}}({{x}_{k}}) - {{A}_{k}}F({{z}_{k}}) \geqslant 0.$$

Then, for any \(x\), we have

$$F({{z}_{k}}) \leqslant F(x) + \frac{{{{{\left\| {x - {{x}_{0}}} \right\|}}^{2}}}}{{2{{A}_{k}}}}.$$

Proof. Lemma 1 implies that

$${{A}_{k}}F({{z}_{k}}) \leqslant {{\psi }_{k}}({{x}_{k}}) \leqslant {{\psi }_{k}}(x) \leqslant {{A}_{k}}F(x) + \frac{1}{2}{{\left\| {x - {{x}_{0}}} \right\|}^{2}}.$$

Lemma 3. For any \(x\), it is true that

$$\begin{gathered} {{\psi }_{{k + 1}}}(x) - {{A}_{{k + 1}}}F({{y}_{{k + 1}}}) - ({{\psi }_{k}}({{x}_{k}}) - {{A}_{k}}F({{z}_{k}})) \\ \geqslant {{A}_{{k + 1}}}(\nabla f({{y}_{{k + 1}}}) + g{\kern 1pt} '({{y}_{{k + 1}}}))\left( {\frac{{{{a}_{{k + 1}}}}}{{{{A}_{{k + 1}}}}}x + \frac{{{{A}_{k}}}}{{{{A}_{{k + 1}}}}}{{z}_{k}} - {{y}_{{k + 1}}}} \right) + \frac{1}{2}{{\left\| {x - {{x}_{k}}} \right\|}^{2}}. \\ \end{gathered} $$

Proof. First, simple computations yield

$${{\psi }_{k}}(x) = {{\psi }_{k}}({{x}_{k}}) + \frac{1}{2}{{\left\| {x - {{x}_{k}}} \right\|}^{2}}$$

and

$$\psi_{k+1}(x) = \psi_k(x_k) + \frac{1}{2}\left\| x - x_k \right\|^2 + a_{k+1}\Omega_1(F, y_{k+1}, x);$$

thus,

$${{\psi }_{{k + 1}}}(x) - {{\psi }_{k}}({{x}_{k}}) = {{a}_{{k + 1}}}{{\Omega }_{1}}(F,{{y}_{{k + 1}}},x) + \frac{1}{2}{{\left\| {x - {{x}_{k}}} \right\|}^{2}}.$$
(12)

Now we want \(A_{k+1}F(z_{k+1}) - A_k F(z_k)\) to be a lower bound for the right-hand side of (12) when \(x = x_{k+1}\). Using \(\Omega_1(F, y_{k+1}, z_k) \leqslant F(z_k)\) yields

$$\begin{gathered} {{a}_{{k + 1}}}{{\Omega }_{1}}(F,{{y}_{{k + 1}}},x) = {{A}_{{k + 1}}}{{\Omega }_{1}}(F,{{y}_{{k + 1}}},x) - {{A}_{k}}{{\Omega }_{1}}(F,{{y}_{{k + 1}}},x) = {{A}_{{k + 1}}}{{\Omega }_{1}}(F,{{y}_{{k + 1}}},x) - {{A}_{k}}\nabla F({{y}_{{k + 1}}})(x - {{z}_{k}}) \\ - \;{{A}_{k}}{{\Omega }_{1}}(F,{{y}_{{k + 1}}},{{z}_{k}}) = {{A}_{{k + 1}}}{{\Omega }_{1}}\left( {F,{{y}_{{k + 1}}},x - \frac{{{{A}_{k}}}}{{{{A}_{{k + 1}}}}}(x - {{z}_{k}})} \right) - {{A}_{k}}{{\Omega }_{1}}(F,{{y}_{{k + 1}}},{{z}_{k}}) \geqslant {{A}_{{k + 1}}}F({{y}_{{k + 1}}}) - {{A}_{k}}F({{z}_{k}}) \\ \, + {{A}_{{k + 1}}}(\nabla f({{y}_{{k + 1}}}) + g'({{y}_{{k + 1}}}))\left( {\frac{{{{a}_{{k + 1}}}}}{{{{A}_{{k + 1}}}}}x + \frac{{{{A}_{k}}}}{{{{A}_{{k + 1}}}}}{{z}_{k}} - {{y}_{{k + 1}}}} \right), \\ \end{gathered} $$

which completes the proof.
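
The last equality in the chain above uses only the affinity of \(\Omega_1(F, y_{k+1}, \cdot)\) in its last argument (again under the above assumption that \(\Omega_1\) is the linearization of \(F\)); explicitly,

$$A_{k+1}\Omega_1(F, y_{k+1}, x) - A_k\left\langle \nabla f(y_{k+1}) + g'(y_{k+1}), x - z_k \right\rangle = A_{k+1}\Omega_1\left( F, y_{k+1}, x - \frac{A_k}{A_{k+1}}(x - z_k) \right).$$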

Lemma 4. Let \(\lambda_{k+1} := \tfrac{a_{k+1}^2}{A_{k+1}}\) and \(\tilde{x}_k := \tfrac{a_{k+1}}{A_{k+1}}x_k + \tfrac{A_k}{A_{k+1}}y_k\). Then

$$\begin{gathered} \psi_{k+1}(x_{k+1}) - A_{k+1}F(y_{k+1}) - (\psi_k(x_k) - A_k F(y_k)) \\ \geqslant \frac{A_{k+1}}{2\lambda_{k+1}}\left( \left\| y_{k+1} - \tilde{x}_k \right\|^2 - \left\| y_{k+1} - \left( \tilde{x}_k - \lambda_{k+1}\left( \nabla f(y_{k+1}) + g'(y_{k+1}) \right) \right) \right\|^2 \right). \\ \end{gathered}$$

Additionally applying inequality (11) yields

$${{\psi }_{k}}({{x}_{k}}) - {{A}_{k}}F({{y}_{k}}) \geqslant \frac{{1 - {{\sigma }^{2}}}}{2}\sum\limits_{i = 1}^k \,\frac{{{{A}_{i}}}}{{{{\lambda }_{i}}}}{{\left\| {{{y}_{i}} - {{{\tilde {x}}}_{{i - 1}}}} \right\|}^{2}}.$$

Proof. Using Lemma 3 with \({{z}_{k}} = {{y}_{k}}\) and \(x = {{x}_{{k + 1}}}\), we obtain (for \(\tilde {x}: = \tfrac{{{{a}_{{k + 1}}}}}{{{{A}_{{k + 1}}}}}x + \tfrac{{{{A}_{k}}}}{{{{A}_{{k + 1}}}}}{{y}_{k}}\))

$$\begin{gathered} (\nabla f(y_{k+1}) + g'(y_{k+1}))\left( \frac{a_{k+1}}{A_{k+1}}x + \frac{A_k}{A_{k+1}}y_k - y_{k+1} \right) + \frac{1}{2A_{k+1}}\left\| x - x_k \right\|^2 = (\nabla f(y_{k+1}) + g'(y_{k+1}))(\tilde{x} - y_{k+1}) \\ + \;\frac{1}{2A_{k+1}}\left\| \frac{A_{k+1}}{a_{k+1}}\left( \tilde{x} - \frac{A_k}{A_{k+1}}y_k \right) - x_k \right\|^2 = (\nabla f(y_{k+1}) + g'(y_{k+1}))(\tilde{x} - y_{k+1}) + \frac{A_{k+1}}{2a_{k+1}^2}\left\| \tilde{x} - \left( \frac{a_{k+1}}{A_{k+1}}x_k + \frac{A_k}{A_{k+1}}y_k \right) \right\|^2, \\ \end{gathered}$$

whence

$$\begin{gathered} {{\psi }_{{k + 1}}}({{x}_{{k + 1}}}) - {{A}_{{k + 1}}}F({{y}_{{k + 1}}}) - ({{\psi }_{k}}({{x}_{k}}) - {{A}_{k}}F({{y}_{k}})) \\ \geqslant {{A}_{{k + 1}}}\mathop {\min}\limits_{x \in {{\mathbb{R}}^{d}}} \left\{ {(\nabla f({{y}_{{k + 1}}}) + g{\kern 1pt} '({{y}_{{k + 1}}}))(x - {{y}_{{k + 1}}}) + \frac{1}{{2{{\lambda }_{{k + 1}}}}}{{{\left\| {x - {{{\tilde {x}}}_{k}}} \right\|}}^{2}}} \right\}. \\ \end{gathered} $$

The value of the minimum is easy to compute.
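
For completeness, this minimum is attained at \(x = \tilde{x}_k - \lambda_{k+1}(\nabla f(y_{k+1}) + g'(y_{k+1}))\), and its value is

$$\min\limits_{x \in \mathbb{R}^d}\left\{ (\nabla f(y_{k+1}) + g'(y_{k+1}))(x - y_{k+1}) + \frac{1}{2\lambda_{k+1}}\left\| x - \tilde{x}_k \right\|^2 \right\} = \frac{1}{2\lambda_{k+1}}\left( \left\| y_{k+1} - \tilde{x}_k \right\|^2 - \left\| y_{k+1} - \left( \tilde{x}_k - \lambda_{k+1}(\nabla f(y_{k+1}) + g'(y_{k+1})) \right) \right\|^2 \right),$$

which, multiplied by \(A_{k+1}\), is exactly the right-hand side in the statement of Lemma 4.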

The first inequality in Theorem 3 is proved by combining Lemmas 4 and 2 with Lemma 2.5 from [14]. The second inequality in Theorem 3 follows from Lemmas 4 and 1.

The following lemma shows that the minimization of the Taylor series of order \(p\) in (4) can be represented as an implicit gradient step with a large stepsize.

Lemma 5. Inequality (11) holds with \(\sigma = 1/2\) for (4), provided that

$$\frac{1}{2} \leqslant {{\lambda }_{{k + 1}}}\frac{{{{L}_{p}}{{{\left\| {{{y}_{{k + 1}}} - {{{\tilde {x}}}_{k}}} \right\|}}^{{p - 1}}}}}{{(p - 1)!}} \leqslant \frac{p}{{p + 1}}.$$
(13)

Proof. The optimality condition yields

$${{\nabla }_{y}}{{f}_{p}}({{y}_{{k + 1}}},{{\tilde {x}}_{k}}) + \frac{{{{L}_{p}}(p + 1)}}{{p!}}({{y}_{{k + 1}}} - {{\tilde {x}}_{k}}){{\left\| {{{y}_{{k + 1}}} - {{{\tilde {x}}}_{k}}} \right\|}^{{p - 1}}} + g{\kern 1pt} '({{y}_{{k + 1}}}) = 0.$$
(14)

It follows that

$$\begin{gathered} {{y}_{{k + 1}}} - ({{{\tilde {x}}}_{k}} - {{\lambda }_{{k + 1}}}(\nabla f({{y}_{{k + 1}}}) + g{\kern 1pt} '({{y}_{{k + 1}}}))) = {{\lambda }_{{k + 1}}}(\nabla f({{y}_{{k + 1}}}) + g{\kern 1pt} '({{y}_{{k + 1}}})) \\ \, - \frac{{p!}}{{{{L}_{p}}(p + 1){{{\left\| {{{y}_{{k + 1}}} - {{{\tilde {x}}}_{k}}} \right\|}}^{{p - 1}}}}}({{\nabla }_{y}}{{f}_{p}}({{y}_{{k + 1}}},{{{\tilde {x}}}_{k}}) + g{\kern 1pt} '({{y}_{{k + 1}}})). \\ \end{gathered} $$

Using the Taylor series, we find that the gradient of the function satisfies

$$\left\| {\nabla f(y) - {{\nabla }_{y}}{{f}_{p}}(y,x)} \right\| \leqslant \frac{{{{L}_{p}}}}{{p!}}{{\left\| {y - x} \right\|}^{p}};$$

thus,

$$\begin{gathered} \left\| {{{y}_{{k + 1}}} - ({{{\tilde {x}}}_{k}} - {{\lambda }_{{k + 1}}}(\nabla f({{y}_{{k + 1}}}) + g{\kern 1pt} '({{y}_{{k + 1}}})))} \right\| \leqslant {{\lambda }_{{k + 1}}}\frac{{{{L}_{p}}}}{{p!}}{{\left\| {{{y}_{{k + 1}}} - {{{\tilde {x}}}_{k}}} \right\|}^{p}} \\ + \;\left| {{{\lambda }_{{k + 1}}} - \frac{{p!}}{{{{L}_{p}}(p + 1){{{\left\| {{{y}_{{k + 1}}} - {{{\tilde {x}}}_{k}}} \right\|}}^{{p - 1}}}}}} \right|\left\| {{{\nabla }_{y}}{{f}_{p}}({{y}_{{k + 1}}},{{{\tilde {x}}}_{k}}) + g{\kern 1pt} '({{y}_{{k + 1}}})} \right\| \\ \leqslant \left\| {{{y}_{{k + 1}}} - {{{\tilde {x}}}_{k}}} \right\|\left( {{{\lambda }_{{k + 1}}}\frac{{{{L}_{p}}}}{{p!}}{{{\left\| {{{y}_{{k + 1}}} - {{{\tilde {x}}}_{k}}} \right\|}}^{{p - 1}}} + \left| {{{\lambda }_{{k + 1}}}\frac{{{{L}_{p}}(p + 1){{{\left\| {{{y}_{{k + 1}}} - {{{\tilde {x}}}_{k}}} \right\|}}^{{p - 1}}}}}{{p!}} - 1{\kern 1pt} } \right|} \right) \\ \, = \left\| {{{y}_{{k + 1}}} - {{{\tilde {x}}}_{k}}} \right\|\left( {\frac{\eta }{p} + \left| {\eta \frac{{p + 1}}{p} - 1} \right|} \right), \\ \end{gathered} $$

where we used (14) in the second inequality and set \(\eta : = {{\lambda }_{{k + 1}}}\tfrac{{{{L}_{p}}{{{\left\| {{{y}_{{k + 1}}} - {{{\tilde {x}}}_{k}}} \right\|}}^{{p - 1}}}}}{{(p - 1)!}}\) in the last equality. The final result is obtained by assuming that \(1{\text{/}}2 \leqslant \eta \leqslant p{\text{/}}(p + 1)\) in (13).
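
Indeed, for \(\eta \in [1/2,\, p/(p+1)]\) we have \(\eta(p+1)/p \leqslant 1\), so

$$\frac{\eta}{p} + \left| \eta\frac{p + 1}{p} - 1 \right| = \frac{\eta}{p} + 1 - \eta\frac{p + 1}{p} = 1 - \eta \leqslant \frac{1}{2},$$

which gives inequality (11) with \(\sigma = 1/2\).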

Finally, replacing \(\left\| {x\text{*}} \right\|\) by \(\left\| {{{x}_{0}} - x\text{*}} \right\|\) in Lemma 3.3 and applying Lemma 3.4 from [14], we complete the proof of Theorem 1.

APPENDIX 2

Proof of Theorem 2. Since \(F\) is an \(r\)-uniformly convex function, we obtain

$${{R}_{{k + 1}}} = \left\| {{{z}_{{k + 1}}} - {{x}_{{{*}}}}} \right\| \leqslant {{\left( {\frac{{r(F({{z}_{{k + 1}}}) - F({{x}_{{{*}}}}))}}{{{{\sigma }_{r}}}}} \right)}^{{1/r}}}\;\mathop \leqslant \limits^{(5)} \;{{\left( {\frac{{r\left( {\frac{{{{c}_{p}}{{L}_{p}}R_{k}^{{p + 1}}}}{{N_{k}^{{\tfrac{{3p + 1}}{2}}}}}} \right)}}{{{{\sigma }_{r}}}}} \right)}^{{1/r}}} = {{\left( {\frac{{r{{c}_{p}}{{L}_{p}}R_{k}^{{p + 1}}}}{{{{\sigma }_{r}}N_{k}^{{\tfrac{{3p + 1}}{2}}}}}} \right)}^{{1/r}}}\;\mathop \leqslant \limits^{(9)} \;{{\left( {\frac{{R_{k}^{{p + 1}}}}{{{{2}^{r}}R_{k}^{{p + 1 - r}}}}} \right)}^{{1/r}}} = \frac{{{{R}_{k}}}}{2}.$$

Now the total number of steps in method 1 can be estimated as follows:

$$\begin{gathered} \sum\limits_{k = 0}^K \,{{N}_{k}} \leqslant \sum\limits_{k = 0}^K \,{{\left( {\frac{{r{{c}_{p}}{{L}_{p}}{{2}^{r}}}}{{{{\sigma }_{r}}}}R_{k}^{{p + 1 - r}}} \right)}^{{\tfrac{2}{{3p + 1}}}}} + K = \sum\limits_{k = 0}^K \,{{\left( {\frac{{r{{c}_{p}}{{L}_{p}}{{2}^{r}}}}{{{{\sigma }_{r}}}}{{{({{R}_{0}}{{2}^{{ - k}}})}}^{{p + 1 - r}}}} \right)}^{{\tfrac{2}{{3p + 1}}}}} + K \\ \, = {{\left( {\frac{{r{{c}_{p}}{{L}_{p}}{{2}^{r}}R_{0}^{{p + 1 - r}}}}{{{{\sigma }_{r}}}}} \right)}^{{\tfrac{2}{{3p + 1}}}}}\sum\limits_{k = 0}^K \,{{2}^{{\tfrac{{ - 2(p + 1 - r)k}}{{3p + 1}}}}} + K. \\ \end{gathered} $$
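
Since \(R_{k+1} \leqslant R_k/2\) implies \(R_k \leqslant R_0 2^{-k}\) by induction, and assuming \(r < p + 1\) (so that the exponent of 2 below is negative), the remaining geometric sum is bounded by a constant independent of \(K\):

$$\sum\limits_{k = 0}^K 2^{\tfrac{-2(p + 1 - r)k}{3p + 1}} \leqslant \sum\limits_{k = 0}^{\infty} 2^{\tfrac{-2(p + 1 - r)k}{3p + 1}} = \frac{1}{1 - 2^{-2(p + 1 - r)/(3p + 1)}},$$

so the total number of steps is of order \(\left( \frac{r c_p L_p 2^r R_0^{p + 1 - r}}{\sigma_r} \right)^{\tfrac{2}{3p + 1}} + K\) up to this constant factor.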

Cite this article

Gasnikov, A.V., Dvinskikh, D.M., Dvurechensky, P.E. et al. Accelerated Meta-Algorithm for Convex Optimization Problems. Comput. Math. and Math. Phys. 61, 17–28 (2021). https://doi.org/10.1134/S096554252101005X
