A unified convergence rate analysis of the accelerated smoothed gap reduction algorithm


Abstract

In this paper, we develop a unified convergence analysis framework for the Accelerated Smoothed GAp ReDuction algorithm (ASGARD) introduced in Tran-Dinh et al. (SIAM J Optim 28(1):96–134, 2018). Unlike Tran-Dinh et al. (SIAM J Optim 28(1):96–134, 2018), the new analysis covers three settings in a single algorithm: general convexity, strong convexity, and strong convexity combined with smoothness. Moreover, we establish convergence guarantees for three criteria: (i) the gap function, (ii) the primal objective residual, and (iii) the dual objective residual. Our convergence rates are optimal (up to a constant factor) in all cases. While the convergence rate on the primal objective residual for the general convex case was established in Tran-Dinh et al. (SIAM J Optim 28(1):96–134, 2018), we prove additional convergence rates on the gap function and the dual objective residual. The analysis for the last two settings is completely new. Our results provide a complete picture of the convergence guarantees of ASGARD. Finally, we present four different numerical experiments on a representative optimization model to verify our algorithm and to compare it with Nesterov's well-known smoothing algorithm.

References

  1. Bauschke, H.H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)

  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  3. Belloni, A., Chernozhukov, V., Wang, L.: Square-root LASSO: pivotal recovery of sparse signals via conic programming. Biometrika 94(4), 791–806 (2011)

  4. Boţ, R.I., Böhm, A.: Variable smoothing for convex optimization problems using stochastic gradients. J. Sci. Comput. 85(2), 1–29 (2020)

  5. Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  6. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program. 159(1–2), 253–287 (2016)

  7. Chen, Y., Lan, G., Ouyang, Y.: Optimal primal–dual methods for a class of saddle-point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)

  8. Condat, L.: A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 158, 460–479 (2013)

  9. Davis, D.: Convergence rate analysis of primal–dual splitting schemes. SIAM J. Optim. 25(3), 1912–1943 (2015)

  10. Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Set-Valued Var. Anal. 25(4), 829–858 (2017)

  11. Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal–dual algorithms for TV-minimization. SIAM J. Imaging Sci. 3(4), 1015–1046 (2010)

  12. Goldstein, T., Esser, E., Baraniuk, R.: Adaptive primal–dual hybrid gradient methods for saddle point problems. Technical report, pp. 1–26 (2013). arXiv:1305.0546

  13. Grant, M.: Disciplined Convex Programming. Ph.D. thesis, Stanford University (2004)

  14. He, B.S., Yuan, X.M.: On the \({O}(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)

  15. Nemirovskii, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley Interscience, London (1983)

  16. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Volume 87 of Applied Optimization. Kluwer Academic Publishers, London (2004)

  17. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

  18. Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. 140(1), 125–161 (2013)

  19. O’Connor, D., Vandenberghe, L.: Primal–dual decomposition by operator splitting and applications to image deblurring. SIAM J. Imaging Sci. 7(3), 1724–1754 (2014)

  20. Ouyang, Y., Xu, Y.: Lower complexity bounds of first-order methods for convex–concave bilinear saddle-point problems. Math. Program. 185, 1–35 (2019)

  21. Sabach, S., Teboulle, M.: Faster Lagrangian-based methods in convex optimization (2020). arXiv preprint arXiv:2010.14314

  22. Tran-Dinh, Q., Alacaoglu, A., Fercoq, O., Cevher, V.: An adaptive primal–dual framework for nonsmooth convex minimization. Math. Program. Comput. 12, 451–491 (2020)

  23. Tran-Dinh, Q., Fercoq, O., Cevher, V.: A smooth primal–dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28(1), 96–134 (2018)

  24. Tran-Dinh, Q., Savorgnan, C., Diehl, M.: Combining Lagrangian decomposition and excessive gap smoothing technique for solving large-scale separable convex optimization problems. Comput. Optim. Appl. 55(1), 75–111 (2013)

  25. Tran-Dinh, Q., Zhu, Y.: Non-stationary first-order primal–dual algorithms with faster convergence rates. SIAM J. Optim. 30(4), 2866–2896 (2020)

  26. Tseng, P.: On accelerated proximal gradient methods for convex–concave optimization. SIAM J. Optim. (2008)

  27. Valkonen, T.: Inertial, corrected, primal–dual proximal splitting. SIAM J. Optim. 30(2), 1391–1420 (2020)

  28. Vu, B.C.: A variable metric extension of the forward–backward–forward algorithm for monotone operators. Numer. Funct. Anal. Optim. 34(9), 1050–1065 (2013)

  29. Zhu, Y., Liu, D., Tran-Dinh, Q.: Primal–dual algorithms for a class of nonlinear compositional convex optimization problems, pp. 1–26 (2020). arXiv preprint arXiv:2006.09263

  30. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B 67(2), 301–320 (2005)


Acknowledgements

This work is partly supported by the Office of Naval Research under Grant No. ONR-N00014-20-1-2088 (2020–2023) and by Nafosted, Vietnam, under Grant No. 101.01-2020.06 (2020–2022).

Author information

Correspondence to Quoc Tran-Dinh.


Appendices

Appendix 1: Technical lemmas

We need the following technical lemmas for our convergence analysis in the main text.

Lemma 4

([23, Lemma 10]) Given \(\beta > 0\), \(\dot{y}\in \mathbb {R}^n\), and a proper, closed, and convex function \(g : \mathbb {R}^n \rightarrow \mathbb {R}\cup \{+\infty \}\) with its Fenchel conjugate \(g^{*}\), we define

$$\begin{aligned} g_{\beta }(u, \dot{y}) := \max _{y\in \mathbb {R}^n}\left\{ \langle u, y\rangle - g^{*}(y) - \tfrac{\beta }{2}\Vert y - \dot{y}\Vert ^2\right\} . \end{aligned}$$
(29)

Let \(y^{*}_{\beta }(u, \dot{y})\) be the unique solution of (29). Then, the following statements hold:

(a):

\(g_{\beta }(\cdot ,\dot{y})\) is convex w.r.t. u on \(\mathrm {dom}\left( g\right) \) and \(\frac{1}{\beta + \mu _{g^{*}}}\)-smooth w.r.t. u on \(\mathrm {dom}\left( g\right) \), where \(\nabla _u{g_{\beta }}(u,\dot{y}) = \text {prox}_{g^{*}/\beta }(\dot{y} + \frac{1}{\beta }u)\). Moreover, for any \(u, \hat{u} \in \mathrm {dom}\left( g\right) \), we have

$$\begin{aligned} g_{\beta }(\hat{u},\dot{y}) + \langle \nabla {g}_{\beta }(\hat{u},\dot{y}), u - \hat{u}\rangle \le g_{\beta }(u,\dot{y}) - \frac{\beta + \mu _{g^{*}}}{2}\Vert \nabla _u{g_{\beta }}(\hat{u},\dot{y}) - \nabla _u{g_{\beta }}(u,\dot{y})\Vert ^2. \end{aligned}$$
(30)
(b):

For any \(\beta > 0\), \(\dot{y}\in \mathbb {R}^n\), and \(u\in \mathrm {dom}\left( g\right) \), we have

$$\begin{aligned} \begin{array}{lcl} g_{\beta }(u,\dot{y}) \le g(u) \le g_{\beta }(u,\dot{y}) + \frac{\beta }{2} [ D_{g}(\dot{y})]^2, \ \text {where} \ D_{g}(\dot{y}) := \sup _{y\in \partial {g(u)}} \left\| y - \dot{y}\right\| . \end{array} \end{aligned}$$
(31)
(c):

For \(u\in \mathrm {dom}\left( g\right) \) and \(\dot{y}\in \mathbb {R}^n\), \(g_{\beta }(u,\dot{y})\) is convex in \(\beta \), and for all \(\hat{\beta } \ge \beta > 0\), we have

$$\begin{aligned} g_{\beta }(u,\dot{y}) \le g_{\hat{\beta }}(u,\dot{y}) + \big (\tfrac{\hat{\beta } - \beta }{2}\big )\Vert \nabla _u{g_{\beta }}(u,\dot{y}) - \dot{y} \Vert ^2. \end{aligned}$$
(32)
(d):

For any \(\beta > 0\), and \(u, \hat{u}\in \mathrm {dom}\left( g\right) \), we have

$$\begin{aligned} \begin{array}{lcl} g_{\beta }(u,\dot{y}) + \langle \nabla _u{g_{\beta }}(u, \dot{y}), \hat{u} - u\rangle\le & {} \ell _{\beta }(\hat{u}, \dot{y}) - \frac{\beta }{2}\Vert \nabla _u{g_{\beta }}(u,\dot{y}) - \dot{y}\Vert ^2, \end{array} \end{aligned}$$
(33)

where \(\ell _{\beta }(\hat{u}, \dot{y}) := \langle \hat{u}, \nabla _u{g_{\beta }}(u,\dot{y}) \rangle - g^{*}(\nabla _u{g_{\beta }}(u,\dot{y})) \le g(\hat{u}) - \frac{\mu _{g^{*}}}{2}\Vert \nabla _u{g_{\beta }}(u,\dot{y}) - \nabla {g}(\hat{u})\Vert ^2\) for any \(\nabla {g}(\hat{u}) \in \partial {g}(\hat{u})\).
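
The objects in Lemma 4 are easy to instantiate numerically. The sketch below is our own illustration, assuming the particular choice \(g(u) = \Vert u\Vert _1\) (this choice is not part of the lemma): then \(g^{*}\) is the indicator of the \(\ell _{\infty }\) unit ball, \(\mu _{g^{*}} = 0\), \(\mathrm {prox}_{g^{*}/\beta }\) is the projection onto \([-1,1]^n\), and the bounds (31) can be checked with \(D_{g}(0) \le \sqrt{n}\).

```python
import numpy as np

def smoothed_l1(u, y_dot, beta):
    """Smoothed function (29) for g(u) = ||u||_1: g* is the indicator of the
    l_inf unit ball, so prox_{g*/beta} is the projection onto [-1, 1]^n and
    the maximizer y* equals nabla_u g_beta(u, y_dot) by Lemma 4(a)."""
    y_star = np.clip(y_dot + u / beta, -1.0, 1.0)
    g_beta = u @ y_star - 0.5 * beta * np.sum((y_star - y_dot) ** 2)
    return g_beta, y_star

# Sanity check of (31): g_beta(u, 0) <= g(u) <= g_beta(u, 0) + (beta/2) * n,
# since every subgradient of ||.||_1 lies in [-1, 1]^n, i.e. D_g(0) <= sqrt(n).
rng = np.random.default_rng(0)
n, beta = 5, 0.1
u, y_dot = rng.standard_normal(n), np.zeros(n)
g_beta, _ = smoothed_l1(u, y_dot, beta)
g = np.sum(np.abs(u))
assert g_beta <= g + 1e-12 and g <= g_beta + 0.5 * beta * n + 1e-12
```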

Lemma 5

The following statements hold.

\(\mathrm {(a)}\):

Let \(\left\{ \tau _k\right\} \subset (0, 1]\) be computed by \(\tau _{k+1} := \frac{\tau _k}{2}\big [ (\tau _k^2 + 4)^{1/2} - \tau _k\big ]\) for some \(\tau _0 \in (0, 1]\). Then, we have

$$\begin{aligned}&\tau _k^2 = (1-\tau _k)\tau _{k-1}^2, \quad \frac{1}{k + 1/\tau _0} \le \tau _k < \frac{2}{k + 2/\tau _0}, \\&\quad \text {and}\quad \frac{1}{1 + \tau _{k-2}} \le 1 - \tau _k \le \frac{1}{1+\tau _{k-1}}. \end{aligned}$$

Moreover, we also have

$$\begin{aligned} \begin{array}{ll} &{} \varTheta _{l,k} := \displaystyle \prod _{i=l}^k(1-\tau _i) = \dfrac{\tau _k^2}{\tau _{l-1}^2} \quad \text {for}\ 1\le l\le k, \qquad \\ &{}\varTheta _{0,k} = \dfrac{(1-\tau _0)\tau _k^2}{\tau _0^2} \le \dfrac{4(1-\tau _0)}{(\tau _0k+2)^2}, \vspace{1ex}\\ \text {and}\quad &{}\dfrac{\tau _{l+1}^2}{\tau _{k+2}^2} \le \varGamma _{l,k} := \displaystyle \prod _{i=l}^k(1+\tau _i) \le \dfrac{\tau _l^2}{\tau _{k+1}^2} \quad \text {for} \ 0 \le l \le k. \end{array} \end{aligned}$$

If we update \(\beta _k := \frac{\beta _{k-1}}{1+\tau _k}\) for a given \(\beta _0 > 0\), then

$$\begin{aligned} \frac{4\beta _0\tau _0^2}{\tau _1^2[\tau _0(k+1) + 2]^2} \le \frac{\beta _0\tau _{k+1}^2}{\tau _1^2} \le \beta _k = \frac{\beta _0}{\varGamma _{1,k}}\le \frac{\beta _0\tau _{k+2}^2}{\tau _2^2} \le \frac{4\beta _0\tau _0^2}{\tau _2^2[\tau _0(k+2) + 2]^2}. \end{aligned}$$
\(\mathrm {(b)}\):

Let \(\left\{ \tau _k\right\} \subset (0, 1]\) be computed by solving \(\tau _k^3 + \tau _k^2 + \tau _{k-1}^2\tau _k - \tau _{k-1}^2 = 0\) for all \(k\ge 1\) and \(\tau _0 := 1\). Then, we have \(\frac{1}{k+1} \le \tau _k \le \frac{2}{k+2}\) and \(\varTheta _{1,k} := \prod _{i=1}^k(1-\tau _i) \le \frac{1}{k+1}\). Moreover, if we update \(\beta _k := \frac{\beta _{k-1}}{1+\tau _k}\), then \(\beta _k \le \frac{2\beta _0}{k+2}\).

Proof

The first two relations of (a) have been proved, e.g., in [24]. Let us prove the last inequality of (a). Note that \(\frac{1}{1+\tau _{k-2}} \le 1-\tau _k\) is equivalent to \(\tau _{k-2}(1-\tau _k) \ge \tau _k\). Using \(1- \tau _k = \frac{\tau _k^2}{\tau _{k-1}^2}\), this becomes \(\tau _k\tau _{k-2} \ge \tau _{k-1}^2\). Utilizing \(\tau _k = \frac{\tau _{k-1}}{2}\big [(\tau _{k-1}^2 + 4)^{1/2} - \tau _{k-1}\big ]\), this condition is equivalent to \(\tau _{k-2}^2 \ge \tau _{k-1}^2(1 + \tau _{k-2})\). However, since \(\tau _{k-1}^2 = (1-\tau _{k-1})\tau _{k-2}^2\), the last condition becomes \(1 \ge (1-\tau _{k-1})(1+\tau _{k-2})\), or equivalently, \(\tau _{k-1} \le \tau _{k-2}\), which automatically holds.

To prove \(1-\tau _k \le \frac{1}{1 + \tau _{k-1}}\), we write it as \(\tau _{k-1}(1-\tau _k) \le \tau _{k}\). Using again \(\tau _k^2 = (1-\tau _k)\tau _{k-1}^2\), the last inequality is equivalent to \(\tau _k \le \tau _{k-1}\), which automatically holds. The last statement of (a) is a consequence of \(1-\tau _k = \frac{\tau _k^2}{\tau _{k-1}^2}\) and the previous relations.

(b) We consider the function \(\varphi (\tau ) := \tau ^3 + \tau ^2 + \tau _{k-1}^2\tau - \tau _{k-1}^2\). Clearly, \(\varphi (0) = -\tau _{k-1}^2 < 0\) and \(\varphi (1) = 2 > 0\). Moreover, \(\varphi '(\tau ) = 3\tau ^2 + 2\tau + \tau _{k-1}^2 > 0\) for \(\tau \in [0, 1]\). Hence, the cubic equation \(\varphi (\tau ) = 0\) has a unique solution \(\tau _k \in (0, 1)\). Therefore, \(\{\tau _k\}_{k\ge 0}\) is well-defined.

Next, since \(\tau _k^3 + \tau _k^2 + \tau _k\tau _{k-1}^2 - \tau _{k-1}^2 = 0\) is equivalent to \(\tau _{k-1}^2(1-\tau _k) = \tau _k^2(1+\tau _k)\), we have \(\tau _{k-1}^2(1-\tau _k) = \tau _k^2(1+\tau _k) \le \frac{\tau _k^2}{1-\tau _k}\). This inequality yields \(\tau _k \ge \frac{\tau _{k-1}}{1 + \tau _{k-1}}\). By induction and \(\tau _0 = 1\), it follows that \(\tau _k \ge \frac{1}{k+1}\): indeed, if \(\tau _{k-1} \ge \frac{1}{k}\), then \(\tau _k \ge \frac{1/k}{1 + 1/k} = \frac{1}{k+1}\). On the other hand, \(\tau _{k-1}^2(1-\tau _k) = \tau _k^2(1+\tau _k) \ge \tau _k^2\). From this inequality, with a similar argument as in the proof of statement (a), we can also show that \(\tau _k \le \frac{2}{k+2}\). Hence, we have \(\frac{1}{k+1} \le \tau _k \le \frac{2}{k+2}\) for all \(k\ge 0\).

Finally, since \(\tau _k \ge \frac{1}{k+1}\), we have \(\prod _{i=1}^k(1-\tau _i) \le \prod _{i=1}^k\left( 1 - \frac{1}{i+1}\right) = \frac{1}{k+1}\). Similarly, \(\prod _{i=1}^k(1+\tau _i) \ge \prod _{i=1}^k\left( 1 + \frac{1}{i+1}\right) = \frac{k+2}{2}\). Since \(\beta _k = \frac{\beta _{k-1}}{1+\tau _k}\), we have \(\beta _k = \beta _0\prod _{i=1}^k\frac{1}{1+\tau _i} \le \frac{2\beta _0}{k+2}\). \(\square \)
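
Both parameter rules in Lemma 5 can be reproduced numerically. The following sketch is our own illustration (the horizon \(K\), \(\tau _0\), and \(\beta _0\) are arbitrary): it generates the sequence of part (a) in closed form, solves the cubic equation of part (b) by bisection exactly as the proof suggests, and checks the stated bounds.

```python
import numpy as np

def tau_sequence_a(tau0, K):
    """Lemma 5(a): tau_{k+1} = (tau_k / 2) * [sqrt(tau_k^2 + 4) - tau_k]."""
    taus = [tau0]
    for _ in range(K):
        t = taus[-1]
        taus.append(0.5 * t * (np.sqrt(t ** 2 + 4.0) - t))
    return np.array(taus)

def next_tau_b(t_prev, tol=1e-14):
    """Lemma 5(b): unique root in (0, 1) of phi(t) = t^3 + t^2 + t_prev^2*t - t_prev^2,
    found by bisection (phi(0) < 0 < phi(1) and phi is increasing on [0, 1])."""
    phi = lambda t: t ** 3 + t ** 2 + t_prev ** 2 * t - t_prev ** 2
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

K, tau0, beta0 = 200, 1.0, 1.0
k = np.arange(K + 1)

# Rule (a): check 1/(k + 1/tau0) <= tau_k <= 2/(k + 2/tau0).
ta = tau_sequence_a(tau0, K)
assert np.all(1.0 / (k + 1.0 / tau0) <= ta + 1e-12)
assert np.all(ta <= 2.0 / (k + 2.0 / tau0) + 1e-12)

# Rule (b): check 1/(k+1) <= tau_k <= 2/(k+2) and beta_k <= 2*beta0/(k+2).
tb = [1.0]
for _ in range(K):
    tb.append(next_tau_b(tb[-1]))
tb = np.array(tb)
assert np.all(1.0 / (k + 1.0) <= tb + 1e-12) and np.all(tb <= 2.0 / (k + 2.0) + 1e-12)
beta = beta0 * np.cumprod(1.0 / (1.0 + tb[1:]))   # beta_k = beta_{k-1} / (1 + tau_k)
assert np.all(beta <= 2.0 * beta0 / (k[1:] + 2.0) + 1e-12)
```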

Lemma 6

([29, Lemma 4] and [23]) The following statements hold.

\(\mathrm {(a)}\):

For any \(u, v, w\in \mathbb {R}^p\) and \(t_1, t_2 \in \mathbb {R}\) such that \(t_1 + t_2 \ne 0\), we have

$$\begin{aligned} t_1\Vert u - w\Vert ^2 + t_2\Vert v - w\Vert ^2 = (t_1 + t_2)\Vert w - \tfrac{1}{t_1+t_2}(t_1u + t_2v)\Vert ^2 + \tfrac{t_1t_2}{t_1+t_2}\Vert u-v\Vert ^2. \end{aligned}$$
\(\mathrm {(b)}\):

For any \(\tau \in (0, 1)\), \(\hat{\beta }, \beta > 0\), \(w, z\in \mathbb {R}^p\), we have

$$\begin{aligned} \begin{array}{lcl} &{}&{}\beta (1-\tau ) \Vert w - z\Vert ^2 + \beta \tau \Vert w\Vert ^2 - (1-\tau )(\hat{\beta } - \beta )\Vert z\Vert ^2 = \beta \Vert w - (1-\tau )z\Vert ^2 \vspace{1ex}\\ &{}&{} \quad + {~} (1-\tau )\big [\tau \beta - (\hat{\beta } - \beta ) \big ]\Vert z\Vert ^2. \end{array} \end{aligned}$$
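
Both statements of Lemma 6 are purely algebraic identities, so they can be sanity-checked with random data; the snippet below is such a check (the particular values of \(t_1, t_2, \tau , \beta , \hat{\beta }\) are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
u, v, w, z = (rng.standard_normal(p) for _ in range(4))
t1, t2 = 0.7, -0.2                      # any t1, t2 with t1 + t2 != 0
tau, beta, beta_hat = 0.3, 0.8, 1.1

# Lemma 6(a)
lhs_a = t1 * np.sum((u - w) ** 2) + t2 * np.sum((v - w) ** 2)
rhs_a = ((t1 + t2) * np.sum((w - (t1 * u + t2 * v) / (t1 + t2)) ** 2)
         + (t1 * t2 / (t1 + t2)) * np.sum((u - v) ** 2))
assert np.isclose(lhs_a, rhs_a)

# Lemma 6(b)
lhs_b = (beta * (1 - tau) * np.sum((w - z) ** 2) + beta * tau * np.sum(w ** 2)
         - (1 - tau) * (beta_hat - beta) * np.sum(z ** 2))
rhs_b = (beta * np.sum((w - (1 - tau) * z) ** 2)
         + (1 - tau) * (tau * beta - (beta_hat - beta)) * np.sum(z ** 2))
assert np.isclose(lhs_b, rhs_b)
```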

The following lemma is a key step to address the strongly convex case of f in (1).

Lemma 7

Given \(L_k > 0\), \(\mu _f > 0\), and \(\tau _k \in (0, 1)\), let \(m_k := \frac{L_k + \mu _f}{L_{k-1} + \mu _f}\) and \(a_k := \frac{L_k}{L_{k-1} + \mu _f}\). Assume that the following two conditions hold:

$$\begin{aligned} \left\{ \begin{array}{llcl} &{}(1-\tau _k) \big [ \tau _{k-1}^2 + m_k\tau _k \big ]&{}\ge &{} a_k\tau _k \vspace{1ex}\\ &{}m_k\tau _k\tau _{k-1}^2 + m_k^2\tau _k^2 &{}\ge &{} a_k\tau _{k-1}^2. \end{array}\right. \end{aligned}$$
(34)

Let \(\left\{ x^k\right\} \) be a given sequence in \(\mathbb {R}^p\). We define \(\hat{x}^k := x^k + \frac{1}{\omega _k}(x^k - x^{k-1})\), where \(\omega _k\) is chosen such that

$$\begin{aligned} \max \left\{ \frac{\tau _{k-1} + \sqrt{\tau _{k-1}^2 + 4a_k}}{2(1-\tau _{k-1})}, \frac{a_k\tau _k}{(1-\tau _k)(1-\tau _{k-1})\tau _{k-1}}\right\} \le \omega _k \le \frac{\tau _{k-1}^2 + m_k\tau _k}{\tau _{k-1}(1-\tau _{k-1})}. \end{aligned}$$
(35)

Then, \(\omega _k\) is well-defined, and for any \(x \in \mathbb {R}^p\), we have

$$\begin{aligned} \begin{array}{ll} &{}L_k\tau _k^2\Vert \frac{1}{\tau _k}[\hat{x}^k - (1-\tau _k)x^k] - x\Vert ^2 - \mu _f \tau _k(1-\tau _k)\Vert x^k - x\Vert ^2 \vspace{1ex}\\ &{}\qquad \le {~} (1-\tau _k) \left( L_{k-1} + \mu _f\right) \tau _{k-1}^2\Vert \frac{1}{\tau _{k-1}}[x^k - (1-\tau _{k-1})x^{k-1}] - x\Vert ^2. \end{array} \end{aligned}$$
(36)

Proof

Firstly, from the definition \(\hat{x}^k := x^k + \frac{1}{\omega _k}(x^k - x^{k-1})\) of \(\hat{x}^k\), we have \(\omega _k(\hat{x}^k - x^k) = x^k - x^{k-1}\). Hence, we can show that

$$\begin{aligned} \begin{array}{lcl} \tau _{k-1}^2\Vert \frac{1}{\tau _{k-1}}[x^k - (1-\tau _{k-1})x^{k-1}] - x\Vert ^2 &{}= &{} \Vert (1-\tau _{k-1})(x^k - x^{k-1}) + \tau _{k-1}(x^k - x)\Vert ^2 \vspace{1ex}\\ &{}= &{} \Vert (1-\tau _{k-1})\omega _k(\hat{x}^k - x^{k}) + \tau _{k-1}(x^k - x)\Vert ^2 \vspace{1ex}\\ &{}= &{} \omega _k^2(1-\tau _{k-1})^2\Vert \hat{x}^k - x^k\Vert ^2 + \tau _{k-1}^2\Vert x^k - x\Vert ^2\vspace{1ex}\\ &{}&{} + {~} 2\omega _k(1-\tau _{k-1})\tau _{k-1}\langle \hat{x}^k - x^k, x^k - x\rangle . \end{array} \end{aligned}$$

Similarly, we also have

$$\begin{aligned} \begin{array}{lcl} \tau _k^2\Vert \frac{1}{\tau _k}[\hat{x}^k - (1-\tau _k)x^k] - x\Vert ^2= & {} \Vert \hat{x}^k - x^k\Vert ^2 + \tau _k^2\Vert x^k - x\Vert ^2 + 2\tau _k\langle \hat{x}^k - x^k, x^k - x\rangle . \end{array} \end{aligned}$$

Using the last two expressions, (36) can be equivalently rewritten as

$$\begin{aligned} \begin{array}{lcl} \mathcal {T}_{[1]} &{}:= &{} 2\left[ \left( L_{k-1} + \mu _f\right) (1-\tau _k)(1-\tau _{k-1})\tau _{k-1}\omega _k - L_k\tau _k \right] \langle \hat{x}^k - x^k, x - x^k\rangle \vspace{1ex}\\ &{}\le &{} \left[ \left( L_{k-1} + \mu _f \right) (1-\tau _k)(1-\tau _{k-1})^2\omega _k^2 - L_{k}\right] \Vert \hat{x}^k - x^k\Vert ^2 \vspace{1ex}\\ &{}&{} + {~} \left[ \left( L_{k-1} + \mu _f\right) (1-\tau _k)\tau _{k-1}^2 - L_k\tau _k^2 + \mu _f\tau _k(1-\tau _k)\right] \Vert x^k - x\Vert ^2. \end{array} \end{aligned}$$

Now, let us denote

$$\begin{aligned} \left\{ \begin{array}{lcl} c_1 &{}:= &{} \left( L_{k-1} + \mu _f\right) (1-\tau _k)(1-\tau _{k-1})\tau _{k-1}\omega _k - L_k\tau _k\vspace{1ex}\\ c_2 &{}:= &{} \left( L_{k-1} + \mu _f\right) (1-\tau _k)(1-\tau _{k-1})^2\omega _k^2 - L_k \vspace{1ex}\\ c_3 &{}:= &{} \left( L_{k-1} + \mu _f\right) (1-\tau _k)\tau _{k-1}^2 - L_k\tau _k^2 + \mu _f(1-\tau _k)\tau _k. \end{array}\right. \end{aligned}$$

Then, (36) is equivalent to

$$\begin{aligned} 2c_1\langle \hat{x}^k - x^k, x - x^k\rangle \le c_2\Vert \hat{x}^k - x^k\Vert ^2 + c_3\Vert x - x^k\Vert ^2. \end{aligned}$$
(37)

Secondly, we need to guarantee that \(c_1 \ge 0\). This condition holds if we choose \(\omega _k\) such that

$$\begin{aligned} \omega _k \ge \frac{a_k\tau _k}{(1-\tau _k)(1-\tau _{k-1})\tau _{k-1}}. \end{aligned}$$
(38)

Thirdly, we also need to guarantee \(c_2 \ge c_1\), which is equivalent to

$$\begin{aligned} c_2 - c_1 = \left( L_{k-1} + \mu _f\right) (1-\tau _k)(1-\tau _{k-1})\left[ (1-\tau _{k-1})\omega _k^2 - \tau _{k-1}\omega _k \right] - L_k(1-\tau _k) \ge 0. \end{aligned}$$

This condition holds if

$$\begin{aligned} \omega _k \ge \frac{\tau _{k-1} + \sqrt{\tau _{k-1}^2 + 4a_k}}{2(1-\tau _{k-1})}. \end{aligned}$$
(39)

Similarly, we need to guarantee \(c_3 \ge c_1\), which is equivalent to

$$\begin{aligned} c_3 - c_1 = \left( L_{k-1} + \mu _f\right) (1-\tau _k)\left[ \tau _{k-1}^2 - (1-\tau _{k-1})\tau _{k-1}\omega _k \right] + (L_k + \mu _f)\tau _k(1-\tau _k) \ge 0. \end{aligned}$$

This condition holds if

$$\begin{aligned} \omega _k \le \frac{\tau _{k-1}^2 + m_k\tau _k}{\tau _{k-1}(1-\tau _{k-1})}. \end{aligned}$$
(40)

Combining (38), (39), and (40), we obtain

$$\begin{aligned} \max \left\{ \frac{\tau _{k-1} + \sqrt{\tau _{k-1}^2 + 4a_k}}{2(1-\tau _{k-1})}, \frac{a_k\tau _k}{(1-\tau _k)(1-\tau _{k-1})\tau _{k-1}}\right\} \le \omega _k \le \frac{\tau _{k-1}^2 + m_k\tau _k}{\tau _{k-1}(1-\tau _{k-1})}, \end{aligned}$$

which is exactly (35). Here, under the condition (34), the left-hand side of the last expression is less than or equal to the right-hand side. Therefore, \(\omega _k\) is well-defined.

Finally, under the choice of \(\omega _k\) as in (35), we have \(c_2 \ge c_1 \ge 0\) and \(c_3\ge c_1 \ge 0\). Hence, (37) holds, which is also equivalent to (36). \(\square \)
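
The admissible range (35) for \(\omega _k\) and the estimate (36) can also be examined numerically. The sketch below uses placeholder values for \(L_{k-1}, L_k, \mu _f, \tau _{k-1}, \tau _k\) (they are illustrative only and are not produced by the algorithm); it verifies that (34) holds, that the interval in (35) is nonempty, and that (36) holds at random points.

```python
import numpy as np

def omega_interval(tau_prev, tau_k, L_prev, L_k, mu_f):
    """Quantities of Lemma 7: m_k, a_k, condition (34), and the interval (35)."""
    m_k = (L_k + mu_f) / (L_prev + mu_f)
    a_k = L_k / (L_prev + mu_f)
    cond34 = ((1 - tau_k) * (tau_prev ** 2 + m_k * tau_k) >= a_k * tau_k and
              m_k * tau_k * tau_prev ** 2 + m_k ** 2 * tau_k ** 2 >= a_k * tau_prev ** 2)
    lower = max((tau_prev + np.sqrt(tau_prev ** 2 + 4 * a_k)) / (2 * (1 - tau_prev)),
                a_k * tau_k / ((1 - tau_k) * (1 - tau_prev) * tau_prev))
    upper = (tau_prev ** 2 + m_k * tau_k) / (tau_prev * (1 - tau_prev))
    return m_k, a_k, cond34, lower, upper

# Placeholder values (illustrative only).
L_prev, L_k, mu_f = 4.0, 4.0, 1.0
tau_prev, tau_k = 0.5, 0.45
m_k, a_k, cond34, lower, upper = omega_interval(tau_prev, tau_k, L_prev, L_k, mu_f)
assert cond34 and lower <= upper          # (34) holds and (35) admits a valid omega_k
omega_k = upper                           # the choice used later in the proof of Lemma 3

# Numerical check of (36) with hat_x^k = x^k + (x^k - x^{k-1}) / omega_k.
rng = np.random.default_rng(2)
p = 3
x_prev, x_k, x = (rng.standard_normal(p) for _ in range(3))
hat_x = x_k + (x_k - x_prev) / omega_k
lhs = (L_k * tau_k ** 2 * np.sum(((hat_x - (1 - tau_k) * x_k) / tau_k - x) ** 2)
       - mu_f * tau_k * (1 - tau_k) * np.sum((x_k - x) ** 2))
rhs = ((1 - tau_k) * (L_prev + mu_f) * tau_prev ** 2
       * np.sum(((x_k - (1 - tau_prev) * x_prev) / tau_prev - x) ** 2))
assert lhs <= rhs + 1e-10
```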

Appendix 2: Technical proof of Lemmas 2 and 3 in Sect. 3

This section provides the full proof of Lemmas 2 and 3 in the main text.

1.1 The proof of Lemma 2: key estimate of the primal–dual step (9)

Proof

From the first line of (9) and Lemma 4(a), we have \(y^{k+1} = \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y})\). Now, from the second line of (9), we also have

$$\begin{aligned} 0 \in \partial {f}(x^{k+1}) + L_k(x^{k+1} - \hat{x}^k) + K^{\top }\nabla _u{g_{\beta _k}}(K\hat{x}^k, \dot{y}). \end{aligned}$$

Combining this inclusion and the \(\mu _f\)-convexity of f, for any \(x\in \mathrm {dom}\left( f\right) \), we get

$$\begin{aligned} \begin{array}{lcl} f(x^{k+1}) &{} \le &{} f(x) + \langle \nabla _u{g_{\beta _k}}(K\hat{x}^k, \dot{y}), K(x - x^{k+1})\rangle + L_k\langle x^{k+1} - \hat{x}^k, x - x^{k+1}\rangle \vspace{1ex}\\ &{}&{} - {~} \frac{\mu _f}{2}\Vert x^{k+1} - x\Vert ^2. \end{array} \end{aligned}$$

Since \(g_{\beta }(\cdot , \dot{y})\) is \(\frac{1}{\beta + \mu _{g^{*}}}\)-smooth by Lemma 4(a), for any \(x\in \mathrm {dom}\left( f\right) \), we have

$$\begin{aligned} \begin{array}{lcl} g_{\beta _k}(Kx^{k+1}, \dot{y}) &{} \le &{} g_{\beta _k}(K\hat{x}^k, \dot{y}) + \langle \nabla _u{g}_{\beta _k}(K\hat{x}^k, \dot{y}), K(x^{k+1} - \hat{x}^k)\rangle \vspace{1ex}\\ &{}&{} + {~} \frac{1}{2(\beta _k + \mu _{g^{*}})}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 \vspace{1ex}\\ &{} = &{} g_{\beta _k}(K\hat{x}^k, \dot{y}) + \langle \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}), K(x - \hat{x}^k)\rangle \vspace{1ex}\\ &{}&{} - {~} \langle \nabla _u{g}_{\beta _k}(K\hat{x}^k, \dot{y}), K(x - x^{k+1})\rangle \vspace{1ex}\\ &{}&{} + {~} \frac{1}{2(\mu _{g^{*}} + \beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2. \end{array} \end{aligned}$$

Now, combining the last two estimates, we get

$$\begin{aligned} \begin{array}{lcl} f(x^{k+1}) + g_{\beta _k}(Kx^{k+1}, \dot{y}) &{} \le &{} f(x) + g_{\beta _k}(K\hat{x}^k, \dot{y}) + \langle \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}), K(x - \hat{x}^k)\rangle \vspace{1ex}\\ &{}&{} + {~} L_k\langle x^{k+1} - \hat{x}^k, x - \hat{x}^k\rangle - L_k\Vert x^{k+1} - \hat{x}^k\Vert ^2 \vspace{1ex}\\ &{}&{} + {~} \frac{1}{2(\mu _{g^{*}} + \beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 - \frac{\mu _f}{2}\Vert x - x^{k+1}\Vert ^2. \end{array} \end{aligned}$$
(41)

Using Lemma 4(a) again, we have

$$\begin{aligned} \begin{array}{lcl} \ell _{\beta _k}(x^k, \dot{y}) &{}:= &{} g_{\beta _k}(K\hat{x}^k, \dot{y}) + \langle \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}), K(x^k - \hat{x}^k)\rangle \vspace{1ex}\\ &{}\le &{} g_{\beta _k}(Kx^k, \dot{y}) - \frac{\beta _k+\mu _{g^{*}}}{2}\Vert \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}) - \nabla _ug_{\beta _k}(Kx^k, \dot{y})\Vert ^2. \end{array} \end{aligned}$$
(42)

Substituting \(x := x^k\) into (41) and multiplying the result by \(1-\tau _k\), then adding it to (41) multiplied by \(\tau _k\), and finally using (42), we can derive

$$\begin{aligned} \begin{array}{lcl} F_{\beta _k}(x^{k+1}, \dot{y}) &{}:= &{} f(x^{k+1}) + g_{\beta _k}(Kx^{k+1}, \dot{y}) \vspace{1ex}\\ &{} \le &{} (1 - \tau _k)[ f(x^k) + g_{\beta _k}(Kx^k, \dot{y})] + \tau _k\left[ f(x) + \ell _{\beta _k}(x, \dot{y}) \right] \vspace{1ex}\\ &{}&{} - {~} L_k\Vert x^{k+1} - \hat{x}^k\Vert ^2 + \frac{1}{2(\mu _{g^{*}}+\beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 \vspace{1ex}\\ &{}&{} + {~} L_k\langle x^{k+1} - \hat{x}^k, \tau _kx - \hat{x}^k + (1-\tau _k)x^k\rangle \vspace{1ex}\\ &{}&{} - {~} \frac{\mu _f}{2}\left[ (1-\tau _k)\Vert x^{k+1} - x^k\Vert ^2 + \tau _k\Vert x - x^{k+1}\Vert ^2\right] \vspace{1ex}\\ &{}&{} - {~} \frac{(1-\tau _k)(\beta _k+\mu _{g^{*}})}{2}\Vert \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}) - \nabla _ug_{\beta _k}(Kx^k, \dot{y})\Vert ^2. \end{array} \end{aligned}$$
(43)

From Lemma 6(a), we can easily show that

$$\begin{aligned} \begin{array}{lcl} (1-\tau _k)\Vert x^{k+1} - x^k\Vert ^2 + \tau _k\Vert x^{k+1} - x\Vert ^2 &{} = &{} \tau _k^2\Vert \tfrac{1}{\tau _k}[ x^{k+1} - (1-\tau _k)x^k] - x\Vert ^2 \vspace{1ex}\\ &{}&{} + {~} \tau _k(1-\tau _k)\Vert x - x^k\Vert ^2. \end{array} \end{aligned}$$

We also have the following elementary relation

$$\begin{aligned} \begin{array}{lcl} \langle x^{k+1} - \hat{x}^k, \tau _k x - [\hat{x}^k - (1-\tau _k)x^k]\rangle &{}= &{} \frac{\tau _k^2}{2}\Vert \tfrac{1}{\tau _k}[\hat{x}^k - (1-\tau _k)x^k] - x\Vert ^2 + \frac{1}{2}\Vert x^{k+1} - \hat{x}^k\Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{\tau _k^2}{2}\Vert \tfrac{1}{\tau _k}[x^{k+1} - (1-\tau _k)x^k] - x\Vert ^2. \end{array} \end{aligned}$$

Substituting the two last expressions into (43), we obtain

$$\begin{aligned} \begin{array}{lcl} F_{\beta _k}(x^{k+1}, \dot{y}) &{} \le &{} (1 - \tau _k) F_{\beta _k}(x^k, \dot{y}) + \tau _k \left[ f(x) + \ell _{\beta _k}(x, \dot{y}) \right] \vspace{1ex}\\ &{}&{} + {~} \frac{L_k\tau _k^2}{2} \Vert \tfrac{1}{\tau _k}[\hat{x}^k - (1-\tau _k)x^k] - x\Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{\tau _k^2}{2}\left( L_k + \mu _f \right) \Vert \tfrac{1}{\tau _k}[x^{k+1} - (1-\tau _k)x^k] - x\Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{(1-\tau _k)(\mu _{g^{*}} + \beta _k)}{2}\Vert \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}) - \nabla _ug_{\beta _k}(Kx^k, \dot{y})\Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{L_k}{2}\Vert x^{k+1} - \hat{x}^k\Vert ^2 + \frac{1}{2(\mu _{g^{*}}+\beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{\mu _f(1-\tau _k)\tau _k}{2}\Vert x - x^k\Vert ^2. \end{array} \end{aligned}$$
(44)

On the one hand, by (32) of Lemma 4, we have

$$\begin{aligned} F_{\beta _k}(x^k, \dot{y}) \le F_{\beta _{k-1}}(x^k, \dot{y}) + \frac{(\beta _{k-1} - \beta _k)}{2}\Vert \nabla _ug_{\beta _{k}}(Kx^k, \dot{y}) - \dot{y}\Vert ^2. \end{aligned}$$

On the other hand, by (33) of Lemma 4, we get

$$\begin{aligned} f(x) + \ell _{\beta _k}(x, \dot{y}) \le \mathcal {L}(x, y^{k+1}) - \frac{\beta _k}{2}\Vert \nabla _ug_{\beta _{k}}(K\hat{x}^k, \dot{y}) - \dot{y}\Vert ^2, \end{aligned}$$

where \(\mathcal {L}(x, y^{k+1}) := f(x) + \langle Kx, y^{k+1}\rangle - g^{*}(y^{k+1})\) is the Lagrange function in (1).

Now, substituting the last two inequalities into (44), and using Lemma 6(b) with \(w := \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}) - \dot{y}\) and \(z := \nabla _ug_{\beta _k}(Kx^k, \dot{y}) - \dot{y}\), we arrive at

$$\begin{aligned} \begin{array}{lcl} F_{\beta _k}(x^{k+1}, \dot{y}) &{} \le &{} (1 - \tau _k) F_{\beta _{k-1}}(x^k, \dot{y}) + \tau _k\mathcal {L}(x, y^{k+1}) + \frac{L_k\tau _k^2}{2} \Vert \tfrac{1}{\tau _k}[\hat{x}^k - (1-\tau _k)x^k] - x\Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{\tau _k^2}{2}\left( L_k + \mu _f \right) \Vert \tfrac{1}{\tau _k}[x^{k+1} - (1-\tau _k)x^k] - x\Vert ^2 - \frac{\mu _f(1-\tau _k)\tau _k}{2}\Vert x - x^k\Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{L_k}{2}\Vert x^{k+1} - \hat{x}^k\Vert ^2 + \frac{1}{2(\mu _{g^{*}}+\beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{(1-\tau _k)}{2}\left[ \tau _k\beta _k - (\beta _{k-1} - \beta _k)\right] \Vert \nabla _ug_{\beta _k}(Kx^k, \dot{y}) - \dot{y}\Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{\beta _k}{2}\Vert \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}) - \dot{y} - (1-\tau _k)\left[ \nabla _ug_{\beta _k}(Kx^k, \dot{y}) - \dot{y} \right] \Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{(1-\tau _k)\mu _{g^{*}}}{2}\Vert \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}) - \nabla _ug_{\beta _k}(Kx^k, \dot{y})\Vert ^2. \end{array} \end{aligned}$$

By dropping the last two nonpositive terms in the last inequality, we obtain (10). \(\square \)

1.2 The proof of Lemma 3: recursive estimate of the Lyapunov function

Proof

First, from the last line \(\tilde{y}^{k+1} = (1-\tau _k)\tilde{y}^k + \tau _ky^{k+1}\) of (11), and the \(\mu _{g^{*}}\)-convexity of \(g^{*}\), we have

$$\begin{aligned} \begin{array}{lcl} \mathcal {L}(x, \tilde{y}^{k+1}) &{} := &{} f(x) + \langle Kx, \tilde{y}^{k+1}\rangle - g^{*}(\tilde{y}^{k+1}) \vspace{1ex}\\ &{} \ge &{} (1-\tau _k)\mathcal {L}(x, \tilde{y}^k) + \tau _k\mathcal {L}(x, y^{k+1}) + \frac{\mu _{g^{*}}\tau _k(1-\tau _k)}{2}\Vert y^{k+1} - \tilde{y}^k\Vert ^2. \end{array} \end{aligned}$$

Hence, \(\tau _k\mathcal {L}(x, y^{k+1}) \le \mathcal {L}(x, \tilde{y}^{k+1}) - (1-\tau _k)\mathcal {L}(x, \tilde{y}^k) - \frac{\mu _{g^{*}}\tau _k(1-\tau _k)}{2}\Vert y^{k+1} - \tilde{y}^k\Vert ^2\). Substituting this estimate into (10) and dropping the term \(- \frac{\mu _{g^{*}}\tau _k(1-\tau _k)}{2}\Vert y^{k+1} - \tilde{y}^k\Vert ^2\), we can derive

$$\begin{aligned} \begin{array}{lcl} F_{\beta _k}(x^{k+1},\dot{y}) &{} \le &{} (1 - \tau _k) F_{\beta _{k-1}}(x^k, \dot{y}) + \mathcal {L}(x, \tilde{y}^{k+1}) - (1-\tau _k)\mathcal {L}(x, \tilde{y}^k) \vspace{1ex}\\ &{}&{} + {~} \frac{L_k\tau _k^2}{2} \big \Vert \tfrac{1}{\tau _k}[\hat{x}^k - (1-\tau _k)x^k] - x \big \Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{\tau _k^2}{2}\left( L_k + \mu _f\right) \big \Vert \tfrac{1}{\tau _k}[x^{k+1} - (1-\tau _k)x^k] - x \big \Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{L_k}{2}\Vert x^{k+1} - \hat{x}^k\Vert ^2 + \frac{1}{2(\mu _{g^{*}} + \beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 \vspace{1ex}\\ &{}&{} - {~} \frac{\mu _f\tau _k(1-\tau _k)}{2}\Vert x^k - x\Vert ^2. \end{array} \end{aligned}$$
(45)

Now, it is straightforward to verify that condition (14) is equivalent to condition (34) of Lemma 7. In addition, we choose \(\eta _k = \frac{1}{\omega _k}\) in our update (13), where \(\omega _k := \frac{\tau _{k-1}^2 + m_k\tau _k}{\tau _{k-1}(1-\tau _{k-1})}\) is the upper bound in (35). Hence, (35) automatically holds. Using (36), we have

$$\begin{aligned} \begin{array}{lcl} \mathcal {T}_{[2]} &{}:= &{} \frac{L_k\tau _k^2}{2} \big \Vert \tfrac{1}{\tau _k}[\hat{x}^k - (1-\tau _k)x^k] - x \big \Vert ^2 - \frac{\mu _f\tau _k(1-\tau _k)}{2}\Vert x^k - x\Vert ^2 \vspace{1ex}\\ &{}\le &{} \frac{\tau _{k-1}^2}{2} (1-\tau _k)\left( L_{k-1} + \mu _f\right) \big \Vert \tfrac{1}{\tau _{k-1}}[x^{k} - (1-\tau _{k-1})x^{k-1}] - x \big \Vert ^2. \end{array} \end{aligned}$$

Moreover, \(\frac{1}{2(\mu _{g^{*}} + \beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 \le \frac{\Vert K\Vert ^2}{2(\mu _{g^{*}} + \beta _k)}\Vert x^{k+1} - \hat{x}^k\Vert ^2 = \frac{L_k}{2}\Vert x^{k+1} - \hat{x}^k\Vert ^2\) due to the definition of \(L_k\) in (13). Substituting these two estimates into (45), and utilizing the definition (12) of \(\mathcal {V}_k\), we obtain (15). \(\square \)
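
For completeness, the following sketch assembles one possible implementation of the iteration analysed in this appendix from the quantities appearing in the proofs of Lemmas 2 and 3: the dual step \(y^{k+1} = \mathrm {prox}_{g^{*}/\beta _k}(\dot{y} + K\hat{x}^k/\beta _k)\), the primal step \(x^{k+1} = \mathrm {prox}_{f/L_k}(\hat{x}^k - K^{\top }y^{k+1}/L_k)\), the averaging step \(\tilde{y}^{k+1} = (1-\tau _k)\tilde{y}^k + \tau _ky^{k+1}\), and the updates \(\beta _k = \beta _{k-1}/(1+\tau _k)\), \(L_k = \Vert K\Vert ^2/(\mu _{g^{*}} + \beta _k)\), and \(\eta _k = 1/\omega _k\). It is not a verbatim transcription of (9), (11), and (13) in the main text (which are not reproduced here); the model problem \(f(x) = \frac{1}{2}\Vert x - b\Vert ^2\), \(g = \Vert \cdot \Vert _1\), the use of the Lemma 5(a) rule for \(\tau _k\), and the fixed smoothing center \(\dot{y} = 0\) are all our own illustrative assumptions.

```python
import numpy as np

def asgard_like_sketch(K, b, beta0=1.0, iters=300):
    """Illustrative iteration built from the proofs above: NOT the exact scheme
    (9), (11), (13) of the main text. Model: f(x) = 0.5*||x - b||^2 (mu_f = 1)
    and g = ||.||_1 (so mu_g* = 0 and prox_{g*/beta} is the l_inf projection)."""
    n, p = K.shape
    mu_f, mu_gstar = 1.0, 0.0
    normK2 = np.linalg.norm(K, 2) ** 2
    x = x_prev = np.zeros(p)
    y_dot = np.zeros(n)                     # smoothing center, kept fixed here
    y_tilde = np.zeros(n)
    tau_prev, beta_prev = 1.0, beta0
    L_prev = normK2 / (mu_gstar + beta_prev)
    for _ in range(iters):
        tau = 0.5 * tau_prev * (np.sqrt(tau_prev ** 2 + 4.0) - tau_prev)  # Lemma 5(a) rule
        beta = beta_prev / (1.0 + tau)                                    # beta_k update
        L = normK2 / (mu_gstar + beta)                                    # L_k as in (13)
        m_k = (L + mu_f) / (L_prev + mu_f)
        eta = tau_prev * (1.0 - tau_prev) / (tau_prev ** 2 + m_k * tau)   # eta_k = 1/omega_k
        x_hat = x + eta * (x - x_prev)                                    # hat_x^k
        y_next = np.clip(y_dot + (K @ x_hat) / beta, -1.0, 1.0)           # y^{k+1}
        x_next = (b + L * x_hat - K.T @ y_next) / (1.0 + L)               # prox_{f/L} step
        y_tilde = (1.0 - tau) * y_tilde + tau * y_next                    # tilde_y^{k+1}
        x_prev, x = x, x_next
        tau_prev, beta_prev, L_prev = tau, beta, L
    return x, y_tilde

rng = np.random.default_rng(3)
K, b = rng.standard_normal((8, 5)), rng.standard_normal(5)
x_out, _ = asgard_like_sketch(K, b)
F = lambda x: 0.5 * np.sum((x - b) ** 2) + np.sum(np.abs(K @ x))
print(F(x_out), F(np.zeros(5)), F(b))     # compare against two trivial candidate points
```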


Cite this article

Tran-Dinh, Q. A unified convergence rate analysis of the accelerated smoothed gap reduction algorithm. Optim Lett 16, 1235–1257 (2022). https://doi.org/10.1007/s11590-021-01775-4
