Abstract
In this paper, we develop a unified convergence analysis framework for the Accelerated Smoothed GAp ReDuction algorithm (ASGARD) introduced in Tran-Dinh et al. (SIAM J Optim 28(1):96–134, 2018). Unlike Tran-Dinh et al. (SIAM J Optim 28(1):96–134, 2018), the new analysis covers three settings in a single algorithm: general convexity, strong convexity, and simultaneous strong convexity and smoothness. Moreover, we establish convergence guarantees for three criteria: (i) the gap function, (ii) the primal objective residual, and (iii) the dual objective residual. Our convergence rates are optimal (up to a constant factor) in all cases. While the convergence rate on the primal objective residual for the general convex case was established in Tran-Dinh et al. (SIAM J Optim 28(1):96–134, 2018), we prove additional convergence rates on the gap function and the dual objective residual. The analysis for the last two settings is completely new. Our results provide a complete picture of the convergence guarantees of ASGARD. Finally, we present four different numerical experiments on a representative optimization model to verify our algorithm and compare it with Nesterov's well-known smoothing algorithm.
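For orientation, the following display is an added sketch of the problem template and the three criteria, inferred from the Lagrange function \(\mathcal {L}(x, y) = f(x) + \langle Kx, y\rangle - g^{*}(y)\) used in Appendix 2; it is not a verbatim restatement of template (1) from the main text, and the gap function may be restricted to bounded sets there:
$$\begin{aligned} F^{\star } := \min _{x\in \mathbb {R}^p}\Big \{ F(x) := f(x) + g(Kx) \Big \} = \min _{x\in \mathbb {R}^p}\max _{y\in \mathbb {R}^n}\Big \{ f(x) + \langle Kx, y\rangle - g^{*}(y) \Big \}, \end{aligned}$$
with the gap function \(\mathcal {G}(\bar{x}, \bar{y}) := \max _{y}\mathcal {L}(\bar{x}, y) - \min _{x}\mathcal {L}(x, \bar{y})\), the primal objective residual \(F(\bar{x}) - F^{\star }\), and the dual objective residual \(D^{\star } - D(\bar{y})\), where \(D(y) := \min _{x}\mathcal {L}(x, y)\) is the dual function.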
References
Bauschke, H.H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Belloni, A., Chernozhukov, V., Wang, L.: Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika 98(4), 791–806 (2011)
Boţ, R.I., Böhm, A.: Variable smoothing for convex optimization problems using stochastic gradients. J. Sci. Comput. 85(2), 1–29 (2020)
Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program. 159(1–2), 253–287 (2016)
Chen, Y., Lan, G., Ouyang, Y.: Optimal primal–dual methods for a class of saddle-point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)
Condat, L.: A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 158, 460–479 (2013)
Davis, D.: Convergence rate analysis of primal–dual splitting schemes. SIAM J. Optim. 25(3), 1912–1943 (2015)
Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Set-Valued Var. Anal. 25(4), 829–858 (2017)
Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal–dual algorithms for TV-minimization. SIAM J. Imaging Sci. 3(4), 1015–1046 (2010)
Goldstein, T., Esser, E., Baraniuk, R.: Adaptive primal–dual hybrid gradient methods for saddle-point problems. Technical report (2013). arXiv:1305.0546
Grant, M.: Disciplined Convex Programming. Ph.D. thesis, Stanford University (2004)
He, B.S., Yuan, X.M.: On the \({O}(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)
Nemirovskii, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley Interscience, London (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Volume 87 of Applied Optimization. Kluwer Academic Publishers, London (2004)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. 140(1), 125–161 (2013)
O’Connor, D., Vandenberghe, L.: Primal-dual decomposition by operator splitting and applications to image deblurring. SIAM J. Imaging Sci. 7(3), 1724–1754 (2014)
Ouyang, Y., Xu, Y.: Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems. Math. Program. 185, 1–35 (2021)
Sabach, S., Teboulle, M.: Faster Lagrangian-based methods in convex optimization (2020). arXiv preprint arXiv:2010.14314
Tran-Dinh, Q., Alacaoglu, A., Fercoq, O., Cevher, V.: An adaptive primal–dual framework for nonsmooth convex minimization. Math. Program. Comput. 12, 451–491 (2020)
Tran-Dinh, Q., Fercoq, O., Cevher, V.: A smooth primal–dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28(1), 96–134 (2018)
Tran-Dinh, Q., Savorgnan, C., Diehl, M.: Combining Lagrangian decomposition and excessive gap smoothing technique for solving large-scale separable convex optimization problems. Comput. Optim. Appl. 55(1), 75–111 (2013)
Tran-Dinh, Q., Zhu, Y.: Non-stationary first-order primal–dual algorithms with faster convergence rates. SIAM J. Optim. 30(4), 2866–2896 (2020)
Tseng, P.: On accelerated proximal gradient methods for convex–concave optimization. Technical report, submitted to SIAM J. Optim. (2008)
Valkonen, T.: Inertial, corrected, primal–dual proximal splitting. SIAM J. Optim. 30(2), 1391–1420 (2020)
Vu, B.C.: A variable metric extension of the forward–backward–forward algorithm for monotone operators. Numer. Funct. Anal. Optim. 34(9), 1050–1065 (2013)
Zhu, Y., Liu, D., Tran-Dinh, Q.: Primal–dual algorithms for a class of nonlinear compositional convex optimization problems, pp. 1–26 (2020). arXiv preprint arXiv:2006.09263
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B 67(2), 301–320 (2005)
Acknowledgements
This work is partly supported by the Office of Naval Research under Grant No. ONR-N00014-20-1-2088 (2020–2023) and by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 101.01-2020.06 (2020–2022).
Appendices
Appendix 1: Technical lemmas
We need the following technical lemmas for our convergence analysis in the main text.
Lemma 4
([23, Lemma 10]) Given \(\beta > 0\), \(\dot{y}\in \mathbb {R}^n\), and a proper, closed, and convex function \(g : \mathbb {R}^n \rightarrow \mathbb {R}\cup \{+\infty \}\) with its Fenchel conjugate \(g^{*}\), we define the smoothed approximation
$$\begin{aligned} g_{\beta }(u, \dot{y}) := \max _{y\in \mathbb {R}^n}\big \{ \langle u, y\rangle - g^{*}(y) - \tfrac{\beta }{2}\Vert y - \dot{y}\Vert ^2 \big \}. \end{aligned}$$(29)
Let \(y^{*}_{\beta }(u, \dot{y})\) be the unique solution of the maximization problem in (29). Then, the following statements hold:
- (a) \(g_{\beta }(\cdot ,\dot{y})\) is convex and \(\frac{1}{\beta + \mu _{g^{*}}}\)-smooth w.r.t. \(u\) on \(\mathrm {dom}\left( g\right) \), with \(\nabla _u{g_{\beta }}(u,\dot{y}) = \text {prox}_{g^{*}/\beta }(\dot{y} + \frac{1}{\beta }u)\). Moreover, for any \(u, \hat{u} \in \mathrm {dom}\left( g\right) \), we have
$$\begin{aligned} g_{\beta }(\hat{u},\dot{y}) + \langle \nabla _u{g}_{\beta }(\hat{u},\dot{y}), u - \hat{u}\rangle \le g_{\beta }(u,\dot{y}) - \frac{\beta + \mu _{g^{*}}}{2}\Vert \nabla _u{g_{\beta }}(\hat{u},\dot{y}) - \nabla _u{g_{\beta }}(u,\dot{y})\Vert ^2. \end{aligned}$$(30)
- (b) For any \(\beta > 0\), \(\dot{y}\in \mathbb {R}^n\), and \(u\in \mathrm {dom}\left( g\right) \), we have
$$\begin{aligned} g_{\beta }(u,\dot{y}) \le g(u) \le g_{\beta }(u,\dot{y}) + \frac{\beta }{2}[D_{g}(\dot{y})]^2, \quad \text {where}\ D_{g}(\dot{y}) := \sup _{y\in \partial {g}(u)}\left\| y - \dot{y}\right\| . \end{aligned}$$(31)
- (c) For \(u\in \mathrm {dom}\left( g\right) \) and \(\dot{y}\in \mathbb {R}^n\), \(g_{\beta }(u,\dot{y})\) is convex in \(\beta \), and for all \(\hat{\beta } \ge \beta > 0\), we have
$$\begin{aligned} g_{\beta }(u,\dot{y}) \le g_{\hat{\beta }}(u,\dot{y}) + \big (\tfrac{\hat{\beta } - \beta }{2}\big )\Vert \nabla _u{g_{\beta }}(u,\dot{y}) - \dot{y} \Vert ^2. \end{aligned}$$(32)
- (d) For any \(\beta > 0\) and \(u, \hat{u}\in \mathrm {dom}\left( g\right) \), we have
$$\begin{aligned} g_{\beta }(u,\dot{y}) + \langle \nabla _u{g_{\beta }}(u, \dot{y}), \hat{u} - u\rangle \le \ell _{\beta }(\hat{u}, \dot{y}) - \frac{\beta }{2}\Vert \nabla _u{g_{\beta }}(u,\dot{y}) - \dot{y}\Vert ^2, \end{aligned}$$(33)
where \(\ell _{\beta }(\hat{u}, \dot{y}) := \langle \hat{u}, \nabla _u{g_{\beta }}(u,\dot{y}) \rangle - g^{*}(\nabla _u{g_{\beta }}(u,\dot{y})) \le g(\hat{u}) - \frac{\mu _{g^{*}}}{2}\Vert \nabla _u{g_{\beta }}(u,\dot{y}) - \nabla {g}(\hat{u})\Vert ^2\) for any \(\nabla {g}(\hat{u}) \in \partial {g}(\hat{u})\).
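To make Lemma 4 concrete, the following minimal Python sketch (an added illustration, not from [23]) instantiates the smoothing for the special case \(g = \Vert \cdot \Vert _1\), whose conjugate \(g^{*}\) is the indicator of the \(\ell _\infty \)-ball, so that \(\text {prox}_{g^{*}/\beta }\) reduces to coordinatewise clipping; it then checks the sandwich bound (31) numerically:

```python
import numpy as np

# Illustrative special case: g(u) = ||u||_1, so g*(y) is the indicator of the
# box {||y||_inf <= 1} and prox_{g*/beta}(v) is the projection of v onto it.
beta, n = 0.1, 5
rng = np.random.default_rng(0)
u, y_dot = rng.standard_normal(n), 0.3 * rng.standard_normal(n)

y_star = np.clip(y_dot + u / beta, -1.0, 1.0)        # = nabla_u g_beta(u, y_dot)
g_beta = u @ y_star - 0.5 * beta * np.sum((y_star - y_dot) ** 2)
g = np.abs(u).sum()

# Sandwich bound (31): subgradients of ||.||_1 lie in [-1, 1]^n, so
# D_g(y_dot) <= || 1 + |y_dot| ||_2 coordinatewise.
D = np.linalg.norm(1.0 + np.abs(y_dot))
assert g_beta <= g <= g_beta + 0.5 * beta * D**2
print(f"g_beta = {g_beta:.4f} <= g = {g:.4f}")
```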
Lemma 5
The following statements hold.
- (a) Let \(\left\{ \tau _k\right\} \subset (0, 1]\) be computed by \(\tau _{k+1} := \frac{\tau _k}{2}\big [ (\tau _k^2 + 4)^{1/2} - \tau _k\big ]\) for some \(\tau _0 \in (0, 1]\). Then, we have
$$\begin{aligned} \tau _k^2 = (1-\tau _k)\tau _{k-1}^2, \qquad \frac{1}{k + 1/\tau _0} \le \tau _k < \frac{2}{k + 2/\tau _0}, \qquad \text {and}\qquad \frac{1}{1 + \tau _{k-2}} \le 1 - \tau _k \le \frac{1}{1+\tau _{k-1}}. \end{aligned}$$
Moreover, we also have
$$\begin{aligned} \varTheta _{l,k} := \prod _{i=l}^k(1-\tau _i) = \frac{\tau _k^2}{\tau _{l-1}^2} \quad \text {for}\ 0\le l\le k, \qquad \varTheta _{0,k} = \frac{(1-\tau _0)\tau _k^2}{\tau _0^2} \le \frac{4(1-\tau _0)}{(\tau _0k+2)^2}, \end{aligned}$$
and
$$\begin{aligned} \frac{\tau _{l+1}^2}{\tau _{k+2}^2} \le \varGamma _{l,k} := \prod _{i=l}^k(1+\tau _i) \le \frac{\tau _l^2}{\tau _{k+1}^2} \quad \text {for} \ 0 \le l \le k. \end{aligned}$$
If we update \(\beta _k := \frac{\beta _{k-1}}{1+\tau _k}\) for a given \(\beta _0 > 0\), then
$$\begin{aligned} \frac{\beta _0\tau _0^2}{\tau _1^2[\tau _0(k+1) + 1]^2} \le \frac{\beta _0\tau _{k+1}^2}{\tau _1^2} \le \beta _k = \frac{\beta _0}{\varGamma _{1,k}}\le \frac{\beta _0\tau _{k+2}^2}{\tau _2^2} \le \frac{4\beta _0\tau _0^2}{\tau _2^2[\tau _0(k+2) + 2]^2}. \end{aligned}$$
- (b) Let \(\left\{ \tau _k\right\} \subset (0, 1]\) be computed by solving the cubic equation \(\tau _k^3 + \tau _k^2 + \tau _{k-1}^2\tau _k - \tau _{k-1}^2 = 0\) for all \(k\ge 1\), with \(\tau _0 := 1\). Then, we have \(\frac{1}{k+1} \le \tau _k \le \frac{2}{k+2}\) and \(\varTheta _{1,k} := \prod _{i=1}^k(1-\tau _i) \le \frac{1}{k+1}\). Moreover, if we update \(\beta _k := \frac{\beta _{k-1}}{1+\tau _k}\), then \(\beta _k \le \frac{2\beta _0}{k+2}\). (Both schemes are verified numerically in the sketch following the proof below.)
Proof
The first two relations of (a) have been proved, e.g., in [24]. Let us prove the last inequality of (a). Note that \(\frac{1}{1+\tau _{k-2}} \le 1-\tau _k\) is equivalent to \(\tau _{k-2}(1-\tau _k) \ge \tau _k\). Using \(1- \tau _k = \frac{\tau _k^2}{\tau _{k-1}^2}\), this becomes \(\tau _k\tau _{k-2} \ge \tau _{k-1}^2\). Utilizing \(\tau _k = \frac{\tau _{k-1}}{2}\big [(\tau _{k-1}^2 + 4)^{1/2} - \tau _{k-1}\big ]\), this condition is equivalent to \(\tau _{k-2}^2 \ge \tau _{k-1}^2(1 + \tau _{k-2})\). However, since \(\tau _{k-1}^2 = (1-\tau _{k-1})\tau _{k-2}^2\), the last condition becomes \(1 \ge (1-\tau _{k-1})(1+\tau _{k-2})\), or equivalently, \(\tau _{k-1} \le \tau _{k-2}\), which automatically holds.
To prove \(1-\tau _k \le \frac{1}{1 + \tau _{k-1}}\), we write it as \(\tau _{k-1}(1-\tau _k) \le \tau _{k}\). Using again \(\tau _k^2 = (1-\tau _k)\tau _{k-1}^2\), the last inequality is equivalent to \(\tau _k \le \tau _{k-1}\), which automatically holds. The last statement of (a) is a consequence of \(1-\tau _k = \frac{\tau _k^2}{\tau _{k-1}^2}\) and the previous relations.
(b) We consider the function \(\varphi (\tau ) := \tau ^3 + \tau ^2 + \tau _{k-1}^2\tau - \tau _{k-1}^2\). Clearly, \(\varphi (0) = -\tau _{k-1}^2 < 0\) and \(\varphi (1) = 2 > 0\). Moreover, \(\varphi '(\tau ) = 3\tau ^2 + 2\tau + \tau _{k-1}^2 > 0\) for \(\tau \in [0, 1]\). Hence, the cubic equation \(\varphi (\tau ) = 0\) has a unique solution \(\tau _k \in (0, 1)\). Therefore, \(\{\tau _k\}_{k\ge 0}\) is well-defined.
Next, since \(\tau _k^3 + \tau _k^2 + \tau _k\tau _{k-1}^2 - \tau _{k-1}^2 = 0\) is equivalent to \(\tau _{k-1}^2(1-\tau _k) = \tau _k^2(1+\tau _k)\), we have \(\tau _{k-1}^2(1-\tau _k) = \tau _k^2(1+\tau _k) \le \frac{\tau _k^2}{1-\tau _k}\). This inequality leads to \(\tau _k \ge \frac{\tau _{k-1}}{1 + \tau _{k-1}}\). By induction and \(\tau _0 = 1\), we can easily show that \(\tau _k \ge \frac{1}{k+1}\). On the other hand, \(\tau _{k-1}^2(1-\tau _k) = \tau _k^2(1+\tau _k) \ge \tau _k^2\). From this inequality, with a similar argument as in the proof of statement (a), we can also show that \(\tau _k \le \frac{2}{k+2}\). Hence, we have \(\frac{1}{k+1} \le \tau _k \le \frac{2}{k+2}\) for all \(k\ge 0\).
Finally, since \(\tau _k \ge \frac{1}{k+1}\), we have \(\prod _{i=1}^k(1-\tau _i) \le \prod _{i=1}^k\left( 1 - \frac{1}{i+1}\right) = \frac{1}{k+1}\). Similarly, \(\prod _{i=1}^k(1+\tau _i) \ge \prod _{i=1}^k\left( 1 + \frac{1}{i+1}\right) = \frac{k+2}{2}\). Hence, since \(\beta _k = \frac{\beta _{k-1}}{1+\tau _k}\), we have \(\beta _k = \beta _0\prod _{i=1}^k\frac{1}{1+\tau _i} \le \frac{2\beta _0}{k+2}\). \(\square \)
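The following minimal Python sketch (an added illustration; `tau_seq_a` and `tau_seq_b` are hypothetical helper names) generates both step-size sequences of Lemma 5 and checks the claimed \(\mathcal {O}(1/k)\) envelopes numerically:

```python
import numpy as np

def tau_seq_a(tau0, K):
    """Scheme (a): tau_{k+1} = (tau_k / 2) * (sqrt(tau_k^2 + 4) - tau_k)."""
    t = [tau0]
    for _ in range(K):
        t.append(0.5 * t[-1] * (np.sqrt(t[-1] ** 2 + 4.0) - t[-1]))
    return np.array(t)

def tau_seq_b(K):
    """Scheme (b): tau_k is the root in (0, 1) of t^3 + t^2 + s*t - s = 0, s = tau_{k-1}^2."""
    t = [1.0]
    for _ in range(K):
        s, lo, hi = t[-1] ** 2, 0.0, 1.0
        for _ in range(60):          # bisection: phi(0) < 0 < phi(1), phi increasing on [0, 1]
            mid = 0.5 * (lo + hi)
            if mid**3 + mid**2 + s * mid - s < 0.0:
                lo = mid
            else:
                hi = mid
        t.append(0.5 * (lo + hi))
    return np.array(t)

tau0, K = 0.9, 500
k = np.arange(1, K + 1)

ta = tau_seq_a(tau0, K)
assert np.allclose(ta[1:] ** 2, (1.0 - ta[1:]) * ta[:-1] ** 2)   # tau_k^2 = (1 - tau_k) tau_{k-1}^2
assert np.all((1.0 / (k + 1.0 / tau0) <= ta[1:]) & (ta[1:] < 2.0 / (k + 2.0 / tau0)))

tb = tau_seq_b(K)
assert np.all((1.0 / (k + 1) <= tb[1:]) & (tb[1:] <= 2.0 / (k + 2)))

beta0 = 1.0
beta = beta0 / np.cumprod(1.0 + tb[1:])                          # beta_k = beta_{k-1} / (1 + tau_k)
assert np.all(beta <= 2.0 * beta0 / (k + 2))                     # beta_k <= 2 beta_0 / (k + 2)
```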
Lemma 6
([29, Lemma 4] and [23]) The following statements hold.
- (a) For any \(u, v, w\in \mathbb {R}^p\) and \(t_1, t_2 \in \mathbb {R}\) such that \(t_1 + t_2 \ne 0\), we have
$$\begin{aligned} t_1\Vert u - w\Vert ^2 + t_2\Vert v - w\Vert ^2 = (t_1 + t_2)\Vert w - \tfrac{1}{t_1+t_2}(t_1u + t_2v)\Vert ^2 + \tfrac{t_1t_2}{t_1+t_2}\Vert u-v\Vert ^2. \end{aligned}$$
- (b) For any \(\tau \in (0, 1)\), \(\hat{\beta }, \beta > 0\), and \(w, z\in \mathbb {R}^p\), we have
$$\begin{aligned} \beta (1-\tau ) \Vert w - z\Vert ^2 + \beta \tau \Vert w\Vert ^2 - (1-\tau )(\hat{\beta } - \beta )\Vert z\Vert ^2 = \beta \Vert w - (1-\tau )z\Vert ^2 + (1-\tau )\big [\tau \beta - (\hat{\beta } - \beta ) \big ]\Vert z\Vert ^2. \end{aligned}$$
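Both identities are elementary to verify by expanding squares; here is a quick numerical check (an added illustration with arbitrary data):

```python
import numpy as np

rng = np.random.default_rng(1)
u, v, w, z = rng.standard_normal((4, 6))
t1, t2 = 0.7, -0.2                      # any reals with t1 + t2 != 0
tau, beta, beta_hat = 0.3, 0.8, 1.1

sq = lambda x: np.sum(x ** 2)

# Identity (a)
lhs_a = t1 * sq(u - w) + t2 * sq(v - w)
rhs_a = (t1 + t2) * sq(w - (t1 * u + t2 * v) / (t1 + t2)) + t1 * t2 / (t1 + t2) * sq(u - v)
assert np.isclose(lhs_a, rhs_a)

# Identity (b): a special case of (a) with u := z, v := 0, t1 := beta(1-tau), t2 := beta*tau
lhs_b = beta * (1 - tau) * sq(w - z) + beta * tau * sq(w) - (1 - tau) * (beta_hat - beta) * sq(z)
rhs_b = beta * sq(w - (1 - tau) * z) + (1 - tau) * (tau * beta - (beta_hat - beta)) * sq(z)
assert np.isclose(lhs_b, rhs_b)
```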
The following lemma is a key step to address the strongly convex case of f in (1).
Lemma 7
Given \(L_k > 0\), \(\mu _f > 0\), and \(\tau _k \in (0, 1)\), let \(m_k := \frac{L_k + \mu _f}{L_{k-1} + \mu _f}\) and \(a_k := \frac{L_k}{L_{k-1} + \mu _f}\). Assume that the following two conditions hold:
Let \(\left\{ x^k\right\} \) be a given sequence in \(\mathbb {R}^p\). We define \(\hat{x}^k := x^k + \frac{1}{\omega _k}(x^k - x^{k-1})\), where \(\omega _k\) is chosen such that
Then, \(\omega _k\) is well-defined, and for any \(x \in \mathbb {R}^p\), we have
Proof
Firstly, from the definition \(\hat{x}^k := x^k + \frac{1}{\omega _k}(x^k - x^{k-1})\) of \(\hat{x}^k\), we have \(\omega _k(\hat{x}^k - x^k) = x^k - x^{k-1}\). Hence, we can show that
Alternatively, we also have
Utilizing the last two expressions, (36) can be equivalently rewritten as
Now, let us denote
Then, (36) is equivalent to
Secondly, we need to guarantee that \(c_1 \ge 0\). This condition holds if we choose \(\omega _k\) such that
Thirdly, we also need to guarantee \(c_2 \ge c_1\), which is equivalent to
This condition holds if
Similarly, we also need to guarantee \(c_3 \ge c_1\), which is equivalent to
This condition holds if
Combining (38), (39), and (40), we obtain
which is exactly (35). Here, under the condition (34), the left-hand side of the last expression is less than or equal to the right-hand side. Therefore, \(\omega _k\) is well-defined.
Finally, under the choice of \(\omega _k\) as in (35), we have \(c_2 \ge c_1 \ge 0\) and \(c_3\ge c_1 \ge 0\). Hence, (37) holds, which is also equivalent to (36). \(\square \)
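The momentum step in Lemma 7 can be made concrete. Below is a minimal Python sketch (an added illustration, not from the original paper; `extrapolate` is a hypothetical helper) that forms \(\hat{x}^k = x^k + \frac{1}{\omega _k}(x^k - x^{k-1})\) with \(\omega _k\) set to the upper bound of (35), the same choice used later in the proof of Lemma 3:

```python
import numpy as np

def extrapolate(x_k, x_km1, tau_k, tau_km1, L_k, L_km1, mu_f):
    """Compute hat{x}^k = x^k + (1/omega_k) * (x^k - x^{k-1}), with omega_k taken
    as the upper bound of (35): omega_k = (tau_{k-1}^2 + m_k*tau_k) / (tau_{k-1}*(1 - tau_{k-1})).
    Assumes tau_{k-1} in (0, 1) so the denominator is positive."""
    m_k = (L_k + mu_f) / (L_km1 + mu_f)          # m_k as defined in Lemma 7
    omega_k = (tau_km1 ** 2 + m_k * tau_k) / (tau_km1 * (1.0 - tau_km1))
    return x_k + (x_k - x_km1) / omega_k

# Example usage with arbitrary data:
x_k, x_km1 = np.array([1.0, 2.0]), np.array([0.5, 1.5])
print(extrapolate(x_k, x_km1, tau_k=0.4, tau_km1=0.5, L_k=4.0, L_km1=5.0, mu_f=0.1))
```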
Appendix 2: Technical proof of Lemmas 2 and 3 in Sect. 3
This section provides the full proof of Lemmas 2 and 3 in the main text.
1.1 The proof of Lemma 2: key estimate of the primal–dual step (9)
Proof
From the first line of (9) and Lemma 4(a), we have \(\nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}) = K^{\top }y^{k+1}\). Now, from the second line of (9), we also have
Combining this inclusion and the \(\mu _f\)-convexity of f, for any \(x\in \mathrm {dom}\left( f\right) \), we get
Since \(g_{\beta }(\cdot , \dot{y})\) is \(\frac{1}{\beta + \mu _{g^{*}}}\)-smooth by Lemma 4(a), for any \(x\in \mathrm {dom}\left( f\right) \), we have
Now, combining the last two estimates, we get
Using Lemma 4(a) again, we have
Substituting \(x := x^k\) into (41) and multiplying the result by \(1-\tau _k\), then adding it to (41) multiplied by \(\tau _k\) and using (42), we can derive
From Lemma 6(a), we can easily show that
We also have the following elementary relation
Substituting the two last expressions into (43), we obtain
On the one hand, by (32) of Lemma 4, we have
On the other hand, by (33) of Lemma 4, we get
where \(\mathcal {L}(x, y^{k+1}) := f(x) + \langle Kx, y^{k+1}\rangle - g^{*}(y^{k+1})\) is the Lagrange function in (1).
Now, substituting the last two inequalities into (44), and using Lemma 6(b) with \(w := \nabla _ug_{\beta _k}(K\hat{x}^k, \dot{y}) - \dot{y}\) and \(z := \nabla _ug_{\beta _k}(Kx^k, \dot{y}) - \dot{y}\), we arrive at
By dropping the last two nonpositive terms in the last inequality, we obtain (10). \(\square \)
1.2 The proof of Lemma 3: recursive estimate of the Lyapunov function
Proof
First, from the last line \(\tilde{y}^{k+1} = (1-\tau _k)\tilde{y}^k + \tau _ky^{k+1}\) of (11), and the \(\mu _{g^{*}}\)-convexity of \(g^{*}\), we have
Hence, \(\tau _k\mathcal {L}(x, y^{k+1}) \le \mathcal {L}(x, \tilde{y}^{k+1}) - (1-\tau _k)\mathcal {L}(x, \tilde{y}^k) - \frac{\mu _{g^{*}}\tau _k(1-\tau _k)}{2}\Vert y^{k+1} - \tilde{y}^k\Vert ^2\). Substituting this estimate into (10) and dropping the term \(- \frac{\mu _{g^{*}}\tau _k(1-\tau _k)}{2}\Vert y^{k+1} - \tilde{y}^k\Vert ^2\), we can derive
Now, it is straightforward to verify that the condition (14) is equivalent to the condition (34) of Lemma 7. In addition, we choose \(\eta _k = \frac{1}{\omega _k}\) in our update (13), where \(\omega _k := \frac{\tau _{k-1}^2 + m_k\tau _k}{\tau _{k-1}(1-\tau _{k-1})}\), which is the upper bound in (35). Hence, (35) automatically holds. Using (36), we have
Moreover, \(\frac{1}{2(\mu _{g^{*}} + \beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 \le \frac{\Vert K\Vert ^2}{2(\mu _{g^{*}} + \beta _k)}\Vert x^{k+1} - \hat{x}^k\Vert ^2 = \frac{L_k}{2}\Vert x^{k+1} - \hat{x}^k\Vert ^2\) due to the definition of \(L_k\) in (13). Substituting these two estimates into (45), and utilizing the definition (12) of \(\mathcal {V}_k\), we obtain (15). \(\square \)
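As a quick numerical sanity check (an added illustration with arbitrary data), the bound \(\frac{1}{2(\mu _{g^{*}} + \beta _k)}\Vert K(x^{k+1} - \hat{x}^k)\Vert ^2 \le \frac{L_k}{2}\Vert x^{k+1} - \hat{x}^k\Vert ^2\) with \(L_k := \Vert K\Vert ^2/(\mu _{g^{*}} + \beta _k)\) can be verified as follows:

```python
import numpy as np

rng = np.random.default_rng(2)
K = rng.standard_normal((6, 4))
x_next, x_hat = rng.standard_normal((2, 4))
mu_gstar, beta_k = 0.0, 0.5

norm_K = np.linalg.norm(K, 2)            # spectral norm ||K||
L_k = norm_K ** 2 / (mu_gstar + beta_k)  # L_k := ||K||^2 / (mu_g* + beta_k), as in (13)

lhs = np.sum((K @ (x_next - x_hat)) ** 2) / (2.0 * (mu_gstar + beta_k))
rhs = 0.5 * L_k * np.sum((x_next - x_hat) ** 2)
assert lhs <= rhs + 1e-12
```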