Abstract
In this paper, we develop a unified convergence analysis framework for the Accelerated Smoothed GAp ReDuction algorithm (ASGARD) introduced in Tran-Dinh et al. (SIAM J Optim 28(1):96–134, 2018). Unlike Tran-Dinh et al. (SIAM J Optim 28(1):96–134, 2018), the new analysis covers three settings in a single algorithm: general convexity, strong convexity, and simultaneous strong convexity and smoothness. Moreover, we establish convergence guarantees for three criteria: (i) the gap function, (ii) the primal objective residual, and (iii) the dual objective residual. Our convergence rates are optimal (up to a constant factor) in all cases. While the convergence rate on the primal objective residual for the general convex case was established in Tran-Dinh et al. (SIAM J Optim 28(1):96–134, 2018), we prove additional convergence rates on the gap function and the dual objective residual. The analysis for the last two settings is completely new. Our results provide a complete picture of the convergence guarantees of ASGARD. Finally, we present four different numerical experiments on a representative optimization model to verify our algorithm and compare it with Nesterov's well-known smoothing algorithm.
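For orientation, the following display is an added sketch of the problem template and the three criteria, inferred from the Lagrange function \(\mathcal {L}(x, y) = f(x) + \langle Kx, y\rangle - g^{*}(y)\) used in Appendix 2; it is not a verbatim restatement of template (1) from the main text, and the gap function may be restricted to bounded sets there:
$$\begin{aligned} F^{\star } := \min _{x\in \mathbb {R}^p}\Big \{ F(x) := f(x) + g(Kx) \Big \} = \min _{x\in \mathbb {R}^p}\max _{y\in \mathbb {R}^n}\Big \{ f(x) + \langle Kx, y\rangle - g^{*}(y) \Big \}, \end{aligned}$$
with the gap function \(\mathcal {G}(\bar{x}, \bar{y}) := \max _{y}\mathcal {L}(\bar{x}, y) - \min _{x}\mathcal {L}(x, \bar{y})\), the primal objective residual \(F(\bar{x}) - F^{\star }\), and the dual objective residual \(D^{\star } - D(\bar{y})\), where \(D(y) := \min _{x}\mathcal {L}(x, y)\) is the dual function.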
References
Bauschke, H.H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Belloni, A., Chernozhukov, V., Wang, L.: Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806 (2011)
Boţ, R.I., Böhm, A.: Variable smoothing for convex optimization problems using stochastic gradients. J. Sci. Comput. 85(2), 1–29 (2020)
Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program. 159(1–2), 253–287 (2016)
Chen, Y., Lan, G., Ouyang, Y.: Optimal primal–dual methods for a class of saddle-point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)
Condat, L.: A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 158, 460–479 (2013)
Davis, D.: Convergence rate analysis of primal–dual splitting schemes. SIAM J. Optim. 25(3), 1912–1943 (2015)
Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Set-Valued Var. Anal. 25(4), 829–858 (2017)
Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal–dual algorithms for TV-minimization. SIAM J. Imaging Sci. 3(4), 1015–1046 (2010)
Goldstein, T., Esser, E., Baraniuk, R.: Adaptive primal–dual hybrid gradient methods for saddle-point problems. Technical report (2013). arXiv:1305.0546
Grant, M.: Disciplined Convex Programming. Ph.D. thesis, Stanford University (2004)
He, B.S., Yuan, X.M.: On the \({O}(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
Nemirovskii, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley Interscience, London (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Volume 87 of Applied Optimization. Kluwer Academic Publishers, London (2004)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. 140(1), 125–161 (2013)
O’Connor, D., Vandenberghe, L.: Primal-dual decomposition by operator splitting and applications to image deblurring. SIAM J. Imaging Sci. 7(3), 1724–1754 (2014)
Ouyang, Y., Xu, Y.: Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems. Math. Program. 185, 1–35 (2021)
Sabach, S., Teboulle, M.: Faster Lagrangian-based methods in convex optimization (2020). arXiv preprint arXiv:2010.14314
Tran-Dinh, Q., Alacaoglu, A., Fercoq, O., Cevher, V.: An adaptive primal–dual framework for nonsmooth convex minimization. Math. Program. Comput. 12, 451–491 (2020)
Tran-Dinh, Q., Fercoq, O., Cevher, V.: A smooth primal–dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28(1), 96–134 (2018)
Tran-Dinh, Q., Savorgnan, C., Diehl, M.: Combining Lagrangian decomposition and excessive gap smoothing technique for solving large-scale separable convex optimization problems. Comput. Optim. Appl. 55(1), 75–111 (2013)
Tran-Dinh, Q., Zhu, Y.: Non-stationary first-order primal–dual algorithms with faster convergence rates. SIAM J. Optim. 30(4), 2866–2896 (2020)
Tseng, P.: On accelerated proximal gradient methods for convex–concave optimization. Technical report, submitted to SIAM J. Optim. (2008)
Valkonen, T.: Inertial, corrected, primal–dual proximal splitting. SIAM J. Optim. 30(2), 1391–1420 (2020)
Vu, B.C.: A variable metric extension of the forward–backward–forward algorithm for monotone operators. Numer. Funct. Anal. Optim. 34(9), 1050–1065 (2013)
Zhu, Y., Liu, D., Tran-Dinh, Q.: Primal–dual algorithms for a class of nonlinear compositional convex optimization problems, pp. 1–26 (2020). arXiv preprint arXiv:2006.09263
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B 67(2), 301–320 (2005)
Acknowledgements
This work is partly supported by the Office of Naval Research under Grant No. ONR-N00014-20-1-2088 (2020–2023) and by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 101.01-2020.06 (2020–2022).
Appendices
Appendix 1: Technical lemmas
We need the following technical lemmas for our convergence analysis in the main text.
Lemma 4
([23, Lemma 10]) Given \(\beta > 0\), \(\dot{y}\in \mathbb {R}^n\), and a proper, closed, and convex function \(g : \mathbb {R}^n \rightarrow \mathbb {R}\cup \{+\infty \}\) with its Fenchel conjugate \(g^{*}\), we define the smoothed approximation
$$\begin{aligned} g_{\beta }(u, \dot{y}) := \max _{y\in \mathbb {R}^n}\big \{ \langle u, y\rangle - g^{*}(y) - \tfrac{\beta }{2}\Vert y - \dot{y}\Vert ^2 \big \}. \end{aligned}$$(29)
Let \(y^{*}_{\beta }(u, \dot{y})\) be the unique solution of the maximization problem in (29). Then, the following statements hold:
- (a) \(g_{\beta }(\cdot ,\dot{y})\) is convex and \(\frac{1}{\beta + \mu _{g^{*}}}\)-smooth w.r.t. \(u\) on \(\mathrm {dom}\left( g\right) \), with \(\nabla _u{g_{\beta }}(u,\dot{y}) = \text {prox}_{g^{*}/\beta }(\dot{y} + \frac{1}{\beta }u)\). Moreover, for any \(u, \hat{u} \in \mathrm {dom}\left( g\right) \), we have
$$\begin{aligned} g_{\beta }(\hat{u},\dot{y}) + \langle \nabla _u{g}_{\beta }(\hat{u},\dot{y}), u - \hat{u}\rangle \le g_{\beta }(u,\dot{y}) - \frac{\beta + \mu _{g^{*}}}{2}\Vert \nabla _u{g_{\beta }}(\hat{u},\dot{y}) - \nabla _u{g_{\beta }}(u,\dot{y})\Vert ^2. \end{aligned}$$(30)
- (b) For any \(\beta > 0\), \(\dot{y}\in \mathbb {R}^n\), and \(u\in \mathrm {dom}\left( g\right) \), we have
$$\begin{aligned} g_{\beta }(u,\dot{y}) \le g(u) \le g_{\beta }(u,\dot{y}) + \frac{\beta }{2}[D_{g}(\dot{y})]^2, \quad \text {where}\ D_{g}(\dot{y}) := \sup _{y\in \partial {g}(u)}\left\| y - \dot{y}\right\| . \end{aligned}$$(31)
- (c) For \(u\in \mathrm {dom}\left( g\right) \) and \(\dot{y}\in \mathbb {R}^n\), \(g_{\beta }(u,\dot{y})\) is convex in \(\beta \), and for all \(\hat{\beta } \ge \beta > 0\), we have
$$\begin{aligned} g_{\beta }(u,\dot{y}) \le g_{\hat{\beta }}(u,\dot{y}) + \big (\tfrac{\hat{\beta } - \beta }{2}\big )\Vert \nabla _u{g_{\beta }}(u,\dot{y}) - \dot{y} \Vert ^2. \end{aligned}$$(32)
- (d) For any \(\beta > 0\) and \(u, \hat{u}\in \mathrm {dom}\left( g\right) \), we have
$$\begin{aligned} g_{\beta }(u,\dot{y}) + \langle \nabla _u{g_{\beta }}(u, \dot{y}), \hat{u} - u\rangle \le \ell _{\beta }(\hat{u}, \dot{y}) - \frac{\beta }{2}\Vert \nabla _u{g_{\beta }}(u,\dot{y}) - \dot{y}\Vert ^2, \end{aligned}$$(33)
where \(\ell _{\beta }(\hat{u}, \dot{y}) := \langle \hat{u}, \nabla _u{g_{\beta }}(u,\dot{y}) \rangle - g^{*}(\nabla _u{g_{\beta }}(u,\dot{y})) \le g(\hat{u}) - \frac{\mu _{g^{*}}}{2}\Vert \nabla _u{g_{\beta }}(u,\dot{y}) - \nabla {g}(\hat{u})\Vert ^2\) for any \(\nabla {g}(\hat{u}) \in \partial {g}(\hat{u})\).
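To make Lemma 4 concrete, the following minimal Python sketch (an added illustration, not from [23]) instantiates the smoothing for the special case \(g = \Vert \cdot \Vert _1\), whose conjugate \(g^{*}\) is the indicator of the \(\ell _\infty \)-ball, so that \(\text {prox}_{g^{*}/\beta }\) reduces to coordinatewise clipping; it then checks the sandwich bound (31) numerically:

```python
import numpy as np

# Illustrative special case: g(u) = ||u||_1, so g*(y) is the indicator of the
# box {||y||_inf <= 1} and prox_{g*/beta}(v) is the projection of v onto it.
beta, n = 0.1, 5
rng = np.random.default_rng(0)
u, y_dot = rng.standard_normal(n), 0.3 * rng.standard_normal(n)

y_star = np.clip(y_dot + u / beta, -1.0, 1.0)        # = nabla_u g_beta(u, y_dot)
g_beta = u @ y_star - 0.5 * beta * np.sum((y_star - y_dot) ** 2)
g = np.abs(u).sum()

# Sandwich bound (31): subgradients of ||.||_1 lie in [-1, 1]^n, so
# D_g(y_dot) <= || 1 + |y_dot| ||_2 coordinatewise.
D = np.linalg.norm(1.0 + np.abs(y_dot))
assert g_beta <= g <= g_beta + 0.5 * beta * D**2
print(f"g_beta = {g_beta:.4f} <= g = {g:.4f}")
```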
Lemma 5
The following statements hold.
- (a) Let \(\left\{ \tau _k\right\} \subset (0, 1]\) be computed by \(\tau _{k+1} := \frac{\tau _k}{2}\big [ (\tau _k^2 + 4)^{1/2} - \tau _k\big ]\) for some \(\tau _0 \in (0, 1]\). Then, we have
$$\begin{aligned} \tau _k^2 = (1-\tau _k)\tau _{k-1}^2, \qquad \frac{1}{k + 1/\tau _0} \le \tau _k < \frac{2}{k + 2/\tau _0}, \qquad \text {and}\qquad \frac{1}{1 + \tau _{k-2}} \le 1 - \tau _k \le \frac{1}{1+\tau _{k-1}}. \end{aligned}$$
Moreover, we also have
$$\begin{aligned} \varTheta _{l,k} := \prod _{i=l}^k(1-\tau _i) = \frac{\tau _k^2}{\tau _{l-1}^2} \quad \text {for}\ 0\le l\le k, \qquad \varTheta _{0,k} = \frac{(1-\tau _0)\tau _k^2}{\tau _0^2} \le \frac{4(1-\tau _0)}{(\tau _0k+2)^2}, \end{aligned}$$
and
$$\begin{aligned} \frac{\tau _{l+1}^2}{\tau _{k+2}^2} \le \varGamma _{l,k} := \prod _{i=l}^k(1+\tau _i) \le \frac{\tau _l^2}{\tau _{k+1}^2} \quad \text {for} \ 0 \le l \le k. \end{aligned}$$
If we update \(\beta _k := \frac{\beta _{k-1}}{1+\tau _k}\) for a given \(\beta _0 > 0\), then
$$\begin{aligned} \frac{\beta _0\tau _0^2}{\tau _1^2[\tau _0(k+1) + 1]^2} \le \frac{\beta _0\tau _{k+1}^2}{\tau _1^2} \le \beta _k = \frac{\beta _0}{\varGamma _{1,k}}\le \frac{\beta _0\tau _{k+2}^2}{\tau _2^2} \le \frac{4\beta _0\tau _0^2}{\tau _2^2[\tau _0(k+2) + 2]^2}. \end{aligned}$$
- (b) Let \(\left\{ \tau _k\right\} \subset (0, 1]\) be computed by solving the cubic equation \(\tau _k^3 + \tau _k^2 + \tau _{k-1}^2\tau _k - \tau _{k-1}^2 = 0\) for all \(k\ge 1\), with \(\tau _0 := 1\). Then, we have \(\frac{1}{k+1} \le \tau _k \le \frac{2}{k+2}\) and \(\varTheta _{1,k} := \prod _{i=1}^k(1-\tau _i) \le \frac{1}{k+1}\). Moreover, if we update \(\beta _k := \frac{\beta _{k-1}}{1+\tau _k}\), then \(\beta _k \le \frac{2\beta _0}{k+2}\). (Both schemes are verified numerically in the sketch following the proof below.)
Proof
The first two relations of (a) have been proved, e.g., in [24]. Let us prove the last inequality of (a). Note that \(\frac{1}{1+\tau _{k-2}} \le 1-\tau _k\) is equivalent to \(\tau _{k-2}(1-\tau _k) \ge \tau _k\). Using \(1- \tau _k = \frac{\tau _k^2}{\tau _{k-1}^2}\), this becomes \(\tau _k\tau _{k-2} \ge \tau _{k-1}^2\). Utilizing \(\tau _k = \frac{\tau _{k-1}}{2}\big [(\tau _{k-1}^2 + 4)^{1/2} - \tau _{k-1}\big ]\), this condition is equivalent to \(\tau _{k-2}^2 \ge \tau _{k-1}^2(1 + \tau _{k-2})\). However, since \(\tau _{k-1}^2 = (1-\tau _{k-1})\tau _{k-2}^2\), the last condition becomes \(1 \ge (1-\tau _{k-1})(1+\tau _{k-2})\), or equivalently, \(\tau _{k-1} \le \tau _{k-2}\), which automatically holds.
To prove \(1-\tau _k \le \frac{1}{1 + \tau _{k-1}}\), we write it as \(\tau _{k-1}(1-\tau _k) \le \tau _{k}\). Using again \(\tau _k^2 = (1-\tau _k)\tau _{k-1}^2\), the last inequality is equivalent to \(\tau _k \le \tau _{k-1}\), which automatically holds. The last statement of (a) is a consequence of \(1-\tau _k = \frac{\tau _k^2}{\tau _{k-1}^2}\) and the previous relations.
(b) We consider the function \(\varphi (\tau ) := \tau ^3 + \tau ^2 + \tau _{k-1}^2\tau - \tau _{k-1}^2\). Clearly, \(\varphi (0) = -\tau _{k-1}^2 < 0\) and \(\varphi (1) = 2 > 0\). Moreover, \(\varphi '(\tau ) = 3\tau ^2 + 2\tau + \tau _{k-1}^2 > 0\) for \(\tau \in [0, 1]\). Hence, the cubic equation \(\varphi (\tau ) = 0\) has a unique solution \(\tau _k \in (0, 1)\). Therefore, \(\{\tau _k\}_{k\ge 0}\) is well-defined.
Next, since \(\tau _k^3 + \tau _k^2 + \tau _k\tau _{k-1}^2 - \tau _{k-1}^2 = 0\) is equivalent to \(\tau _{k-1}^2(1-\tau _k) = \tau _k^2(1+\tau _k)\), we have \(\tau _{k-1}^2(1-\tau _k) = \tau _k^2(1+\tau _k) \le \frac{\tau _k^2}{1-\tau _k}\). This inequality leads to \(\tau _k \ge \frac{\tau _{k-1}}{1 + \tau _{k-1}}\). By induction and \(\tau _0 = 1\), we can easily show that \(\tau _k \ge \frac{1}{k+1}\). On the other hand, \(\tau _{k-1}^2(1-\tau _k) = \tau _k^2(1+\tau _k) \ge \tau _k^2\). From this inequality, with a similar argument as in the proof of statement (a), we can also show that \(\tau _k \le \frac{2}{k+2}\). Hence, we have \(\frac{1}{k+1} \le \tau _k \le \frac{2}{k+2}\) for all \(k\ge 0\).
Finally, since \(\tau _k \ge \frac{1}{k+1}\), we have \(\prod _{i=1}^k(1-\tau _i) \le \prod _{i=1}^k\left( 1 - \frac{1}{i+1}\right) = \frac{1}{k+1}\). Similarly, \(\prod _{i=1}^k(1+\tau _i) \ge \prod _{i=1}^k\left( 1 + \frac{1}{i+1}\right) = \frac{k+2}{2}\). Hence, since \(\beta _k = \frac{\beta _{k-1}}{1+\tau _k}\), we have \(\beta _k = \beta _0\prod _{i=1}^k\frac{1}{1+\tau _i} \le \frac{2\beta _0}{k+2}\). \(\square \)
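The following minimal Python sketch (an added illustration; `tau_seq_a` and `tau_seq_b` are hypothetical helper names) generates both step-size sequences of Lemma 5 and checks the claimed \(\mathcal {O}(1/k)\) envelopes numerically:

```python
import numpy as np

def tau_seq_a(tau0, K):
    """Scheme (a): tau_{k+1} = (tau_k / 2) * (sqrt(tau_k^2 + 4) - tau_k)."""
    t = [tau0]
    for _ in range(K):
        t.append(0.5 * t[-1] * (np.sqrt(t[-1] ** 2 + 4.0) - t[-1]))
    return np.array(t)

def tau_seq_b(K):
    """Scheme (b): tau_k is the root in (0, 1) of t^3 + t^2 + s*t - s = 0, s = tau_{k-1}^2."""
    t = [1.0]
    for _ in range(K):
        s, lo, hi = t[-1] ** 2, 0.0, 1.0
        for _ in range(60):          # bisection: phi(0) < 0 < phi(1), phi increasing on [0, 1]
            mid = 0.5 * (lo + hi)
            if mid**3 + mid**2 + s * mid - s < 0.0:
                lo = mid
            else:
                hi = mid
        t.append(0.5 * (lo + hi))
    return np.array(t)

tau0, K = 0.9, 500
k = np.arange(1, K + 1)

ta = tau_seq_a(tau0, K)
assert np.allclose(ta[1:] ** 2, (1.0 - ta[1:]) * ta[:-1] ** 2)   # tau_k^2 = (1 - tau_k) tau_{k-1}^2
assert np.all((1.0 / (k + 1.0 / tau0) <= ta[1:]) & (ta[1:] < 2.0 / (k + 2.0 / tau0)))

tb = tau_seq_b(K)
assert np.all((1.0 / (k + 1) <= tb[1:]) & (tb[1:] <= 2.0 / (k + 2)))

beta0 = 1.0
beta = beta0 / np.cumprod(1.0 + tb[1:])                          # beta_k = beta_{k-1} / (1 + tau_k)
assert np.all(beta <= 2.0 * beta0 / (k + 2))                     # beta_k <= 2 beta_0 / (k + 2)
```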
Lemma 6
([29, Lemma 4] and [23]) The following statements hold.
- (a) For any \(u, v, w\in \mathbb {R}^p\) and \(t_1, t_2 \in \mathbb {R}\) such that \(t_1 + t_2 \ne 0\), we have
$$\begin{aligned} t_1\Vert u - w\Vert ^2 + t_2\Vert v - w\Vert ^2 = (t_1 + t_2)\Vert w - \tfrac{1}{t_1+t_2}(t_1u + t_2v)\Vert ^2 + \tfrac{t_1t_2}{t_1+t_2}\Vert u-v\Vert ^2. \end{aligned}$$
- (b) For any \(\tau \in (0, 1)\), \(\hat{\beta }, \beta > 0\), and \(w, z\in \mathbb {R}^p\), we have
$$\begin{aligned} \beta (1-\tau ) \Vert w - z\Vert ^2 + \beta \tau \Vert w\Vert ^2 - (1-\tau )(\hat{\beta } - \beta )\Vert z\Vert ^2 = \beta \Vert w - (1-\tau )z\Vert ^2 + (1-\tau )\big [\tau \beta - (\hat{\beta } - \beta ) \big ]\Vert z\Vert ^2. \end{aligned}$$
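Both identities are elementary to verify by expanding squares; here is a quick numerical check (an added illustration with arbitrary data):

```python
import numpy as np

rng = np.random.default_rng(1)
u, v, w, z = rng.standard_normal((4, 6))
t1, t2 = 0.7, -0.2                      # any reals with t1 + t2 != 0
tau, beta, beta_hat = 0.3, 0.8, 1.1

sq = lambda x: np.sum(x ** 2)

# Identity (a)
lhs_a = t1 * sq(u - w) + t2 * sq(v - w)
rhs_a = (t1 + t2) * sq(w - (t1 * u + t2 * v) / (t1 + t2)) + t1 * t2 / (t1 + t2) * sq(u - v)
assert np.isclose(lhs_a, rhs_a)

# Identity (b): a special case of (a) with u := z, v := 0, t1 := beta(1-tau), t2 := beta*tau
lhs_b = beta * (1 - tau) * sq(w - z) + beta * tau * sq(w) - (1 - tau) * (beta_hat - beta) * sq(z)
rhs_b = beta * sq(w - (1 - tau) * z) + (1 - tau) * (tau * beta - (beta_hat - beta)) * sq(z)
assert np.isclose(lhs_b, rhs_b)
```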
The following lemma is a key step to address the strongly convex case of f in (1).
Lemma 7
Given \(L_k > 0\), \(\mu _f > 0\), and \(\tau _k \in (0, 1)\), let \(m_k := \frac{L_k + \mu _f}{L_{k-1} + \mu _f}\) and \(a_k := \frac{L_k}{L_{k-1} + \mu _f}\). Assume that the following two conditions hold:
Let \(\left\{ x^k\right\} \) be a given sequence in \(\mathbb {R}^p\). We define \(\hat{x}^k := x^k + \frac{1}{\omega _k}(x^k - x^{k-1})\), where \(\omega _k\) is chosen such that
Then, \(\omega _k\) is well-defined, and for any \(x \in \mathbb {R}^p\), we have
Proof
Firstly, from the definition \(\hat{x}^k := x^k + \frac{1}{\omega _k}(x^k - x^{k-1})\) of \(\hat{x}^k\), we have \(\omega _k(\hat{x}^k - x^k) = x^k - x^{k-1}\). Hence, we can show that
Alternatively, we also have
Utilizing the last two expressions, (36) can be equivalently rewritten as
Now, let us denote
Then, (36) is equivalent to
Secondly, we need to guarantee that \(c_1 \ge 0\). This condition holds if we choose \(\omega _k\) such that
Thirdly, we also need to guarantee \(c_2 \ge c_1\), which is equivalent to
This condition holds if
Similarly, we also need to guarantee \(c_3 \ge c_1\), which is equivalent to
This condition holds if
Combining (38), (39), and (40), we obtain
which is exactly (35). Here, under the condition (34), the left-hand side of the last expression is less than or equal to the right-hand side. Therefore, \(\omega _k\) is well-defined.
Finally, under the choice of \(\omega _k\) as in (35), we have \(c_2 \ge c_1 \ge 0\) and \(c_3\ge c_1 \ge 0\). Hence, (37) holds, which is also equivalent to (36). \(\square \)
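The momentum step in Lemma 7 can be made concrete. Below is a minimal Python sketch (an added illustration, not from the original paper; `extrapolate` is a hypothetical helper) that forms \(\hat{x}^k = x^k + \frac{1}{\omega _k}(x^k - x^{k-1})\) with \(\omega _k\) set to the upper bound of (35), the same choice used later in the proof of Lemma 3:

```python
import numpy as np

def extrapolate(x_k, x_km1, tau_k, tau_km1, L_k, L_km1, mu_f):
    """Compute hat{x}^k = x^k + (1/omega_k) * (x^k - x^{k-1}), with omega_k taken
    as the upper bound of (35): omega_k = (tau_{k-1}^2 + m_k*tau_k) / (tau_{k-1}*(1 - tau_{k-1})).
    Assumes tau_{k-1} in (0, 1) so the denominator is positive."""
    m_k = (L_k + mu_f) / (L_km1 + mu_f)          # m_k as defined in Lemma 7
    omega_k = (tau_km1 ** 2 + m_k * tau_k) / (tau_km1 * (1.0 - tau_km1))
    return x_k + (x_k - x_km1) / omega_k

# Example usage with arbitrary data:
x_k, x_km1 = np.array([1.0, 2.0]), np.array([0.5, 1.5])
print(extrapolate(x_k, x_km1, tau_k=0.4, tau_km1=0.5, L_k=4.0, L_km1=5.0, mu_f=0.1))
```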
Appendix 2: Technical proof of Lemmas 2 and 3 in Sect. 3
This section provides the full proof of Lemmas 2 and 3 in the main text.
1.1 The proof of Lemma 2: key estimate of the primal–dual step (9)
Proof
From the first line of (9) and Lemma 4(a), we have \(\nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}) = K^{\top }y^{k+1}\). Now, from the second line of (9), we also have
Combining this inclusion and the \(\mu _f\)-convexity of f, for any \(x\in \mathrm {dom}\left( f\right) \), we get
Since \(g_{\beta }(\cdot , \dot{y})\) is \(\frac{1}{\beta + \mu _{g^{*}}}\)-smooth by Lemma 4(a), for any \(x\in \mathrm {dom}\left( f\right) \), we have
Now, combining the last two estimates, we get
Using Lemma 4(a) again, we have
Substituting \(x := x^k\) into (41) and multiplying the result by \(1-\tau _k\), then adding it to (41) multiplied by \(\tau _k\) and using (42), we can derive
From Lemma 6(a), we can easily show that
We also have the following elementary relation
Substituting the two last expressions into (43), we obtain
On the one hand, by (32) of Lemma 4, we have
On the other hand, by (33) of Lemma 4, we get
where \(\mathcal {L}(x, y^{k+1}) := f(x) + \langle Kx, y^{k+1}\rangle - g^{*}(y^{k+1})\) is the Lagrange function in (1).
Now, substituting the last two inequalities into (44), and using Lemma 6(b) with \(w := \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}) - \dot{y}\) and \(z := \nabla _ug_{\beta _k}(Kx^k, \dot{y}) - \dot{y}\), we arrive at
By dropping the last two nonpositive terms in the last inequality, we obtain (10). \(\square \)
1.2 The proof of Lemma 3: recursive estimate of the Lyapunov function
Proof
First, from the last line \(\tilde{y}^{k+1} = (1-\tau _k)\tilde{y}^k + \tau _ky^{k+1}\) of (11), and the \(\mu _{g^{*}}\)-convexity of \(g^{*}\), we have
Hence, \(\tau _k\mathcal {L}(x, y^{k+1}) \le \mathcal {L}(x, \tilde{y}^{k+1}) - (1-\tau _k)\mathcal {L}(x, \tilde{y}^k) - \frac{\mu _{g^{*}}\tau _k(1-\tau _k)}{2}\Vert y^{k+1} - \tilde{y}^k\Vert ^2\). Substituting this estimate into (10) and dropping the term \(- \frac{\mu _{g^{*}}\tau _k(1-\tau _k)}{2}\Vert y^{k+1} - \tilde{y}^k\Vert ^2\), we can derive
Now, it is straightforward to verify that the condition (14) is equivalent to the condition (34) of Lemma 7. In addition, we choose \(\eta _k = \frac{1}{\omega _k}\) in our update (13), where \(\omega _k := \frac{\tau _{k-1}^2 + m_k\tau _k}{\tau _{k-1}(1-\tau _{k-1})}\), which is the upper bound in (35). Hence, (35) automatically holds. Using (36), we have
Moreover, \(\frac{1}{2(\mu _{g^{*}} + \beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 \le \frac{\Vert K\Vert ^2}{2(\mu _{g^{*}} + \beta _k)}\Vert x^{k+1} - \hat{x}^k\Vert ^2 = \frac{L_k}{2}\Vert x^{k+1} - \hat{x}^k\Vert ^2\) due to the definition of \(L_k\) in (13). Substituting these two estimates into (45), and utilizing the definition (12) of \(\mathcal {V}_k\), we obtain (15). \(\square \)
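As a quick numerical sanity check (an added illustration with arbitrary data), the bound \(\frac{1}{2(\mu _{g^{*}} + \beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 \le \frac{L_k}{2}\Vert x^{k+1} - \hat{x}^k\Vert ^2\) with \(L_k := \Vert K\Vert ^2/(\mu _{g^{*}} + \beta _k)\) can be verified as follows:

```python
import numpy as np

rng = np.random.default_rng(2)
K = rng.standard_normal((6, 4))
x_next, x_hat = rng.standard_normal((2, 4))
mu_gstar, beta_k = 0.0, 0.5

norm_K = np.linalg.norm(K, 2)            # spectral norm ||K||
L_k = norm_K ** 2 / (mu_gstar + beta_k)  # L_k := ||K||^2 / (mu_g* + beta_k), as in (13)

lhs = np.sum((K @ (x_next - x_hat)) ** 2) / (2.0 * (mu_gstar + beta_k))
rhs = 0.5 * L_k * np.sum((x_next - x_hat) ** 2)
assert lhs <= rhs + 1e-12
```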