
Improving the Convergence of Distributed Gradient Descent via Inexact Average Consensus


Abstract

It is observed that the inexact convergence of the well-known distributed gradient descent algorithm can be attributed to the inaccuracy of its consensus procedure. Motivated by this, we improve convergence by enforcing sufficiently accurate consensus through approximate consensus steps, whose accuracy is controlled by a predefined sequence of consensus error bounds. It is shown that exact convergence is achieved when the sequence decays to zero; furthermore, a linear convergence rate is obtained when the sequence decays to zero linearly. To implement the approximate consensus step with a given error bound, an inexact average consensus scheme that operates in a distributed manner is proposed. Owing to the flexibility in choosing both the consensus error bounds and the consensus scheme, the proposed framework offers the potential to balance communication and computation in distributed optimization.
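To make the framework concrete, the following is a minimal sketch of the scheme described above, under stated assumptions: quadratic local objectives, a doubly stochastic mixing matrix, and an idealized (centralized) stopping test inside the consensus routine. All function names and numerical values are illustrative and are not taken from the paper.

import numpy as np

def inexact_consensus(x_half, eps, W, max_rounds=1000):
    """Average x_half approximately: iterate x <- W x until every agent is
    within eps of the true average (an idealized stopping test used here
    only for illustration)."""
    x = x_half.copy()
    x_bar = x_half.mean(axis=0)
    for _ in range(max_rounds):
        if np.all(np.linalg.norm(x - x_bar, axis=1) <= eps):
            break
        x = W @ x
    return x

def dgd_inexact(grads, x0, W, alpha, eps0, rho0, iters):
    """Distributed gradient descent in which each iteration is a local
    gradient step followed by an eps^k-inexact average consensus step,
    with error bounds eps^k = eps0 * rho0**k decaying linearly."""
    x = x0.copy()
    n = len(grads)
    for k in range(iters):
        # local gradient step (the intermediate iterate x^{k+1/2})
        x_half = np.stack([x[i] - alpha * grads[i](x[i]) for i in range(n)])
        # approximate consensus step with error bound eps^k
        x = inexact_consensus(x_half, eps0 * rho0 ** k, W)
    return x

# Example: 4 agents minimizing (1/4) * sum_i 0.5 * ||x - a_i||^2,
# whose minimizer is the mean of the a_i (here 2.5).
a = np.array([[1.0], [2.0], [3.0], [4.0]])
grads = [lambda x, ai=ai: x - ai for ai in a]
W = np.full((4, 4), 0.25)  # doubly stochastic mixing matrix
x_final = dgd_inexact(grads, np.zeros((4, 1)), W,
                      alpha=0.5, eps0=1.0, rho0=0.9, iters=200)
print(x_final.ravel())  # every agent's iterate approaches 2.5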


References

  1. Bazerque, J.A., Giannakis, G.B.: Distributed spectrum sensing for cognitive radio networks by exploiting sparsity. IEEE Trans. Signal Process. 58(3), 1847–1862 (2010)


  2. Schizas, I.D., Giannakis, G.B., Roumeliotis, S.I., Ribeiro, A.: Consensus in ad hoc WSNs with noisy links—part II: distributed estimation and smoothing of random signals. IEEE Trans. Signal Process. 56(4), 1650–1666 (2008)


  3. Cao, Y., Yu, W., Ren, W., Chen, G.: An overview of recent progress in the study of distributed multi-agent coordination. IEEE Trans. Ind. Inform. 9(1), 427–438 (2013)


  4. Bullo, F., Cortes, J., Martinez, S.: Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms, vol. 27. Princeton University Press, Princeton (2009)


  5. Cevher, V., Becker, S., Schmidt, M.: Convex optimization for big data: scalable, randomized, and parallel algorithms for big data analytics. IEEE Signal Process. Mag. 31(5), 32–43 (2014)


  6. Bekkerman, R., Bilenko, M., Langford, J.: Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, Cambridge (2011)


  7. Nedic, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)


  8. Tsianos, K.I., Rabbat, M.G.: Distributed strongly convex optimization. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 593–600. IEEE (2012)

  9. Nedic, A., Olshevsky, A., Shi, W.: Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J. Optim. 27(4), 2597–2633 (2017)


  10. Shi, W., Ling, Q., Wu, G., Yin, W.: EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015)


  11. Jakovetić, D.: A unification and generalization of exact distributed first-order methods. IEEE Trans. Signal Inf. Process. Netw. 5(1), 31–46 (2019)


  12. Shi, W., Ling, Q., Yuan, K., Wu, G., Yin, W.: On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Signal Process. 62(7), 1750–1761 (2014)


  13. Mokhtari, A., Shi, W., Ling, Q., Ribeiro, A.: A decentralized second-order method with exact linear convergence rate for consensus optimization. IEEE Trans. Signal Inf. Process. Netw. 2(4), 507–522 (2016)


  14. Chen, A.I., Ozdaglar, A.: A fast distributed proximal-gradient method. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 601–608. IEEE (2012)

  15. Jakovetić, D., Xavier, J., Moura, J.M.: Fast distributed gradient methods. IEEE Trans. Autom. Control 59(5), 1131–1146 (2014)


  16. Berahas, A.S., Bollapragada, R., Keskar, N.S., Wei, E.: Balancing communication and computation in distributed optimization. IEEE Trans. Autom. Control 64(8), 3141–3155 (2018)


  17. Manitara, N.E., Hadjicostis, C.N.: Distributed stopping for average consensus in undirected graphs via event-triggered strategies. Automatica 70, 121–127 (2016)


  18. Manitara, N.E., Hadjicostis, C.N.: Distributed stopping for average consensus in digraphs. IEEE Trans. Control Netw. Syst. 5(3), 957–967 (2018)


  19. Yadav, V., Salapaka, M.V.: Distributed protocol for determining when averaging consensus is reached. In: 45th Annual Allerton Conference, pp. 715–720 (2007)

  20. Sundaram, S., Hadjicostis, C.N.: Finite-time distributed consensus in graphs with time-invariant topologies. In: 2007 American Control Conference, pp. 711–716. IEEE (2007)

  21. Mou, S., Morse, A.S.: Finite-time distributed averaging. In: 2014 American Control Conference, pp. 5260–5263. IEEE (2014)

  22. Nesterov, Y.: Introductory Lectures on Convex Programming, Volume I: Basic Course. Lecture notes (1998)


  23. Nedic, A., Ozdaglar, A., Parrilo, P.A.: Constrained consensus and optimization in multi-agent networks. IEEE Trans. Autom. Control 55(4), 922–938 (2010)



Author information


Corresponding author

Correspondence to Bin Du.


Appendix: Proof of Theorem 3.1

We begin the proof by introducing some additional notation. Recall that \({\mathbf {x}}^{k+1/2}\) represents the intermediate result obtained by a pure gradient step, as shown in (5). We denote the exact averages of \({\mathbf {x}}^k\) and \({\mathbf {x}}^{k+1/2}\) by \(\bar{{\mathbf {x}}}^k\) and \(\bar{{\mathbf {x}}}^{k+1/2}\), respectively, i.e.,

$$\begin{aligned} \bar{{\mathbf {x}}}^k&= \frac{1}{n}\sum _{i=1}^n {\mathbf {x}}_i^k\quad \text {and}\quad \bar{{\mathbf {x}}}^{k+1/2} = \frac{1}{n}\sum _{i=1}^n {\mathbf {x}}_i^{k+1/2}. \end{aligned}$$
(25)

Moreover, we use \(\bar{{\mathbf {g}}}^k\) to denote the global gradient at \(\bar{{\mathbf {x}}}^k\) and \({\mathbf {g}}^k\) to denote the average of the local gradients at \({\mathbf {x}}^k\), i.e.,

$$\begin{aligned} \bar{{\mathbf {g}}}^k = \frac{1}{n}\sum _{i=1}^n \nabla {f}_i(\bar{{\mathbf {x}}}^k)\quad \text {and}\quad {\mathbf {g}}^k&= \frac{1}{n}\sum _{i=1}^n \nabla {f}_i({\mathbf {x}}_i^k). \end{aligned}$$
(26)

Next, we show in the following Lemma A.1 that, once \(\varepsilon \)-inexact average consensus is guaranteed by \({\mathcal {C}}^{\varepsilon ^k}\), the distance \(\Vert {{\mathbf {x}}}_i^{k+1} - \bar{{\mathbf {x}}}^{k+1}\Vert \) can be bounded in terms of the error \(\varepsilon ^k\).

Lemma A.1

Suppose that \({\mathbf {x}}^{k+1}\) is the output of \({\mathcal {C}}^{\varepsilon ^k}({\mathbf {x}}^{k+1/2})\), so that \(\Vert {\mathbf {x}}_i^{k+1} - \bar{{\mathbf {x}}}^{k+1/2}\Vert \le \varepsilon ^{k}\) holds for all \(i\in {\mathcal {N}}\). Then,

$$\begin{aligned} \Vert {{\mathbf {x}}}_i^{k+1} - \bar{{\mathbf {x}}}^{k+1}\Vert \le 2\varepsilon ^{k}, \quad \forall i \in {\mathcal {N}}. \end{aligned}$$
(27)

Proof

By the definition of \(\bar{{\mathbf {x}}}^{k+1}\), it holds that

$$\begin{aligned} \begin{aligned} \Vert \bar{{\mathbf {x}}}^{k+1} - \bar{{\mathbf {x}}}^{k+1/2}\Vert&= \Big \Vert \frac{1}{n}\sum _{i=1}^n\left( {\mathbf {x}}_i^{k+1} - \bar{{\mathbf {x}}}^{k+1/2}\right) \Big \Vert \\&\le \frac{1}{n}\sum _{i=1}^n\Vert {\mathbf {x}}_i^{k+1} - \bar{{\mathbf {x}}}^{k+1/2}\Vert \\&\le \varepsilon ^{k}. \end{aligned} \end{aligned}$$
(28)

Therefore, for all \(i \in {\mathcal {N}}\), one has

$$\begin{aligned} \begin{aligned} \Vert {{\mathbf {x}}}_i^{k+1} - \bar{{\mathbf {x}}}^{k+1}\Vert \le \Vert {{\mathbf {x}}}_i^{k+1} - \bar{{\mathbf {x}}}^{k+1/2} \Vert +\Vert \bar{{\mathbf {x}}}^{k+1/2}- \bar{{\mathbf {x}}}^{k+1}\Vert \le 2\varepsilon ^{k}, \end{aligned} \end{aligned}$$
(29)

\(\square \)
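As a quick numerical illustration of Lemma A.1 (not part of the original proof; the dimensions and data below are arbitrary assumptions): if every \({\mathbf {x}}_i^{k+1}\) lies within \(\varepsilon ^k\) of \(\bar{{\mathbf {x}}}^{k+1/2}\), then it lies within \(2\varepsilon ^k\) of the new average \(\bar{{\mathbf {x}}}^{k+1}\).

import numpy as np

rng = np.random.default_rng(0)
eps, n, p = 0.1, 5, 3
x_bar_half = rng.normal(size=p)  # stand-in for the average of x^{k+1/2}
# perturb each agent's output by a vector of norm at most eps
d = rng.normal(size=(n, p))
d *= eps * rng.uniform(0.0, 1.0, size=(n, 1)) / np.linalg.norm(d, axis=1, keepdims=True)
x_next = x_bar_half + d                     # consensus outputs x_i^{k+1}
x_bar_next = x_next.mean(axis=0)            # their exact average
dev = np.linalg.norm(x_next - x_bar_next, axis=1)
assert dev.max() <= 2 * eps                 # the bound (27) holds
print(dev.max(), "<=", 2 * eps)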

With the help of Lemma A.1, we next bound the distance between \(\bar{{\mathbf {x}}}^k\) and the optimizer \({\mathbf {x}}^\star \). Before doing that, we will need a few supporting lemmas.

Lemma A.2

Suppose that Assumption 1 holds. Then, one has

$$\begin{aligned} \Vert {\mathbf {g}}^k - \bar{{\mathbf {g}}}^k\Vert \le c_2 \varepsilon ^{k-1}, \end{aligned}$$
(30)

where \(c_2 = 2L\).

Proof

By the definitions of \({\mathbf {g}}^k\) and \(\bar{{\mathbf {g}}}^k\), as shown in (26), it holds that

$$\begin{aligned} \begin{aligned} \Vert {\mathbf {g}}^k - \bar{{\mathbf {g}}}^k\Vert&\le \frac{1}{n}\sum _{i=1}^n\Vert \nabla f_i({\mathbf {x}}_i^k) - \nabla f_i(\bar{{\mathbf {x}}}^k) \Vert \\&\overset{(A.2.\mathrm {a})}{\le } \frac{1}{n}\sum _{i=1}^n L_i\Vert {\mathbf {x}}_i^k - \bar{{\mathbf {x}}}^k\Vert \\&\overset{(A.2.\mathrm {b})}{\le } c_2 \varepsilon ^{k-1}, \end{aligned} \end{aligned}$$
(31)

where (A.2.a) is due to Assumption 1 and (A.2.b) follows from Lemma A.1. \(\square \)

Lemma A.3

Suppose that Assumptions 1 and 2 hold. Then, for all \({\mathbf {x}}, {\mathbf {y}} \in {\mathbb {R}}^p\),

$$\begin{aligned} \begin{aligned} (\nabla F({\mathbf {x}}) - \nabla F({\mathbf {y}}))^T ({\mathbf {x}}-{\mathbf {y}}) \ge \frac{1}{\mu +L}\Vert \nabla F({\mathbf {x}}) - \nabla F({\mathbf {y}})\Vert ^2 + \frac{\mu L}{\mu +L}\Vert {\mathbf {x}}-{\mathbf {y}}\Vert ^2. \end{aligned} \end{aligned}$$
(32)

Proof

The proof can be found in Theorem 2.1.11 of [22] and is thus omitted. \(\square \)

Now, we bound the distance \(\Vert \bar{{\mathbf {x}}}^{k+1}-{\mathbf {x}}^\star \Vert \) by the following lemma.

Lemma A.4

Suppose that Assumptions 1 and 2 hold, and let the constant step size satisfy \(\alpha \le 2/(\mu + L)\). Then, one has

$$\begin{aligned} \Vert \bar{{\mathbf {x}}}^{k+1} - {\mathbf {x}}^\star \Vert \le c_3\Vert \bar{{\mathbf {x}}}^{k} - {\mathbf {x}}^\star \Vert + c_4 \varepsilon ^{k-1}, \end{aligned}$$
(33)

where \(c_3 = \sqrt{1-2\alpha \mu L/(\mu +L)} < 1\) and \(c_4 = \alpha c_2 +1\).

Proof

Consider,

$$\begin{aligned} \begin{aligned} \Vert \bar{{\mathbf {x}}}^k - {\mathbf {x}}^\star - \alpha \bar{{\mathbf {g}}}^k\Vert ^2&= \Vert \bar{{\mathbf {x}}}^k - {\mathbf {x}}^\star \Vert ^2 + \alpha ^2 \Vert \bar{{\mathbf {g}}}^k\Vert ^2 -2\alpha (\bar{{\mathbf {x}}}^k - {\mathbf {x}}^\star )^T\bar{{\mathbf {g}}}^k\\&\overset{(A.4.\mathrm {a})}{\le } \Vert \bar{{\mathbf {x}}}^k - {\mathbf {x}}^\star \Vert ^2 + \alpha ^2 \Vert \bar{{\mathbf {g}}}^k\Vert ^2 - 2\alpha \Big (\frac{1}{\mu + L}\Vert \bar{{\mathbf {g}}}^k\Vert ^2 + \frac{\mu L}{\mu +L}\Vert \bar{{\mathbf {x}}}^k - {\mathbf {x}}^\star \Vert ^2\Big )\\&= (1-\frac{2\alpha \mu L}{\mu +L})\Vert \bar{{\mathbf {x}}}^k - {\mathbf {x}}^\star \Vert ^2 + \alpha (\alpha - \frac{2}{\mu + L})\Vert \bar{{\mathbf {g}}}^k\Vert ^2\\& \overset{(A.4.\mathrm {b})}{\le } c_3^2\Vert \bar{{\mathbf {x}}}^k - {\mathbf {x}}^\star \Vert ^2, \end{aligned} \end{aligned}$$
(34)

where (A.4.a) follows from Lemma A.3 applied with \({\mathbf {y}} = {\mathbf {x}}^\star \), for which \(\nabla F({\mathbf {x}}^\star ) = {\mathbf {0}}\), and (A.4.b) is due to the assumption \(\alpha \le 2/(\mu + L)\). Therefore, it holds that

$$\begin{aligned} \begin{aligned} \Vert \bar{{\mathbf {x}}}^{k+1} - {\mathbf {x}}^\star \Vert&= \Vert \bar{{\mathbf {x}}}^{k+1} - \bar{{\mathbf {x}}}^{k+1/2} + \bar{{\mathbf {x}}}^{k+1/2} - {\mathbf {x}}^\star \Vert \\&\overset{(A.4.\mathrm {c})}{\le } \Vert \bar{{\mathbf {x}}}^{k+1} - \bar{{\mathbf {x}}}^{k+1/2}\Vert + \Vert \bar{{\mathbf {x}}}^{k} - {\mathbf {x}}^\star - \alpha {\mathbf {g}}^k\Vert \\&\overset{(A.4.\mathrm {d})}{\le } \varepsilon ^{k} + \Vert \bar{{\mathbf {x}}}^{k} - {\mathbf {x}}^\star - \alpha \bar{{\mathbf {g}}}^k\Vert + \alpha \Vert {{\mathbf {g}}}^k-\bar{{\mathbf {g}}}^k\Vert \\&\overset{(A.4.\mathrm {e})}{\le } \varepsilon ^{k} + c_3 \Vert \bar{{\mathbf {x}}}^k - {\mathbf {x}}^\star \Vert + \alpha c_2 \varepsilon ^{k-1}\\&\overset{(A.4.\mathrm {f})}{\le } c_4 \varepsilon ^{k-1} + c_3 \Vert \bar{{\mathbf {x}}}^k - {\mathbf {x}}^\star \Vert , \end{aligned} \end{aligned}$$
(35)

where (A.4.c) is due to the fact that \(\bar{{\mathbf {x}}}^{k+1/2} = \bar{{\mathbf {x}}}^{k} - \alpha {\mathbf {g}}^k\), (A.4.d) follows from (28) and the triangle inequality, (A.4.e) is due to (34) and Lemma A.2, and (A.4.f) is due to the assumption \(\varepsilon ^{k-1} \ge \varepsilon ^{k}\) and the definition \(c_4 = \alpha c_2 + 1\). \(\square \)
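As a quick numerical sanity check of the constants appearing in Lemma A.4 (the values of \(\mu \), \(L\), and \(\alpha \) below are illustrative assumptions, not taken from the paper):

import math

mu, L = 1.0, 10.0                 # illustrative strong-convexity / smoothness constants
alpha = 2.0 / (mu + L)            # the largest step size permitted by Lemma A.4

c2 = 2.0 * L                                            # constant from Lemma A.2
c3 = math.sqrt(1.0 - 2.0 * alpha * mu * L / (mu + L))   # contraction factor in (33)
c4 = alpha * c2 + 1.0

assert 0.0 < c3 < 1.0             # the averaged iterate contracts toward x*
print(f"c3 = {c3:.4f}, c4 = {c4:.4f}")  # c3 ~ 0.818, c4 ~ 4.64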

Here, we introduce the last supporting lemma, which concerns the limit of the convolution of a geometrically decaying sequence with a vanishing sequence.

Lemma A.5

Let \(0<\lambda <1\) be a constant and let \(\{\beta ^k\}_{k\in {\mathbb {N}}_+}\) be a positive scalar sequence with \(\lim _{k \rightarrow \infty } \beta ^k = 0\). Then,

$$\begin{aligned} \lim _{k \rightarrow \infty } \sum _{t=0}^{k}(\lambda )^{k-t}\beta ^t = 0. \end{aligned}$$
(36)

Proof

The proof can be found in Lemma 7 of [23] and is thus omitted. \(\square \)
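The following snippet illustrates Lemma A.5 numerically; it is an illustration rather than a proof, and the choices \(\lambda = 0.9\) and \(\beta ^t = 1/(t+1)\) are arbitrary.

lam = 0.9

def conv_sum(k, beta=lambda t: 1.0 / (t + 1)):
    """Compute sum_{t=0}^{k} lam**(k - t) * beta(t), the quantity in (36)."""
    return sum(lam ** (k - t) * beta(t) for t in range(k + 1))

for k in (10, 100, 1000, 10000):
    print(k, conv_sum(k))  # the values decrease toward zero as k grows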

With the help of the above lemmas, we are now ready to prove Theorem 3.1. Applying Lemma A.4 recursively, it holds that

$$\begin{aligned} \begin{aligned} \Vert \bar{{\mathbf {x}}}^{k+1} - {\mathbf {x}}^\star \Vert&\le c_4 \varepsilon ^{k-1} + c_3\Vert \bar{{\mathbf {x}}}^{k} - {\mathbf {x}}^\star \Vert \\&\le (c_3)^{k}\Vert \bar{{\mathbf {x}}}^{1} - {\mathbf {x}}^\star \Vert + c_4\sum _{t=0}^{k-1} (c_3)^{k-1-t}\varepsilon ^t. \end{aligned} \end{aligned}$$
(37)

According to Lemma A.5 and the fact that \(c_3 <1\) (see Lemma A.4), we have \(\lim _{k \rightarrow \infty } \Vert \bar{{\mathbf {x}}}^{k+1} - {\mathbf {x}}^\star \Vert = 0\). Furthermore, it holds that

$$\begin{aligned} \begin{aligned} \lim _{k \rightarrow \infty }\Vert {\mathbf {x}}_i^{k+1} - {\mathbf {x}}^\star \Vert&\le \lim _{k \rightarrow \infty }\Vert {\mathbf {x}}_i^{k+1}-\bar{{\mathbf {x}}}^{k+1}\Vert + \lim _{k \rightarrow \infty }\Vert \bar{{\mathbf {x}}}^{k+1}-{\mathbf {x}}^\star \Vert \\&\overset{(6.\mathrm {a})}{\le } \lim _{k \rightarrow \infty }2 \varepsilon ^{k} + \lim _{k \rightarrow \infty }(c_3)^{k}\Vert \bar{{\mathbf {x}}}^{1} - {\mathbf {x}}^\star \Vert + \lim _{k \rightarrow \infty }c_4\sum _{t=0}^{k-1} (c_3)^{k-1-t}\varepsilon ^t\\&\overset{(6.\mathrm {b})}{=} 0, \end{aligned} \end{aligned}$$
(38)

where (6.a) is due to (37) and Lemma A.1, and (6.b) follows from the fact that \(c_3 < 1\) and Lemma A.5. Therefore, statement 1) of Theorem 3.1 is proved.

At last, we consider statement 2), in which the sequence of consensus error bounds decays linearly, i.e., there exist constants \(c_0 > 0\) and \(0< \rho _0 <1\) such that \(\varepsilon ^k \le c_0 (\rho _0)^k\). Following the same path as in the proof of statement 1), we first show that the distance \(\Vert \bar{{\mathbf {x}}}^{k+1}-{\mathbf {x}}^\star \Vert \) converges to zero at a linear rate. To this end, choose a constant \(c' > 0\) large enough that both \(\rho ' = \max \{\rho _0, c_3 + c_4c_0/(c'\rho _0)\} <1\) and \(\Vert \bar{{\mathbf {x}}}^{1} - {\mathbf {x}}^\star \Vert \le c'\rho '\) hold (this is always possible since \(c_3 < 1\) and \(\rho _0 > 0\)); we claim that

$$\begin{aligned} \Vert \bar{{\mathbf {x}}}^{k+1} - {\mathbf {x}}^\star \Vert \le c' (\rho ')^{k+1}. \end{aligned}$$
(39)

Recalling inequality (33) from Lemma A.4, we prove (39) by induction. First, (39) holds for \(k=0\) by the choice of \(c'\). Suppose now that \(\Vert \bar{{\mathbf {x}}}^{k} - {\mathbf {x}}^\star \Vert \le c' (\rho ')^{k}\); for the \((k+1)\)th iteration we then have

$$\begin{aligned} \begin{aligned} \Vert \bar{{\mathbf {x}}}^{k+1} - {\mathbf {x}}^\star \Vert&\le c_4 \varepsilon ^{k-1} + c_3\Vert \bar{{\mathbf {x}}}^{k} - {\mathbf {x}}^\star \Vert \\&\le c_4c_0 \cdot (\rho _0)^{k-1} + c_3c'\cdot (\rho ')^{k}\\&= c' (\rho ')^k \Big (c_3 + \frac{c_4c_0}{c'\rho _0}(\frac{\rho _0}{\rho '})^k\Big )\\& \overset{(7.\mathrm {a})}{\le } c'(\rho ')^{k+1}, \end{aligned} \end{aligned}$$
(40)

where (7.a) is due to the facts that \(\rho ' \ge \rho _0\) and \(\rho ' \ge c_3 + c_4c_0/(c'\rho _0)\). Now, following the same procedure as shown in (38), there exist constants \(c_1 = 2c_0/\rho _0 + c'\) and \(\rho _1 = \max \{\rho _0, \rho '\}\) such that

$$\begin{aligned} \begin{aligned} \Vert {\mathbf {x}}_i^{k+1} - {\mathbf {x}}^\star \Vert&\le \Vert {\mathbf {x}}_i^{k+1}-\bar{{\mathbf {x}}}^{k+1}\Vert + \Vert \bar{{\mathbf {x}}}^{k+1}-{\mathbf {x}}^\star \Vert \\& \overset{(8.\mathrm {a})}{\le } 2c_0\cdot (\rho _0)^{k} + c'\cdot (\rho ')^{k+1}\\&\le c_1\cdot (\rho _1)^{k+1}, \end{aligned} \end{aligned}$$
(41)

where (8.a) follows from Lemma A.1 together with (39) and the assumed bound \(\varepsilon ^k \le c_0(\rho _0)^k\). Therefore, the proof is completed. \(\square \)
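As a final numerical sanity check of the linear-rate argument in statement 2) (all constants below are illustrative assumptions, not values from the paper), one can verify that choosing \(c'\) large enough indeed yields \(\rho ' < 1\):

import math

mu, L = 1.0, 10.0                              # illustrative problem constants
alpha = 2.0 / (mu + L)
c2 = 2.0 * L                                   # Lemma A.2
c3 = math.sqrt(1.0 - 2.0 * alpha * mu * L / (mu + L))
c4 = alpha * c2 + 1.0                          # Lemma A.4

c0, rho0 = 1.0, 0.85                           # assumed linear decay of the error bounds
# choose c' large enough that c3 + c4*c0/(c'*rho0) < 1
c_prime = 2.0 * c4 * c0 / ((1.0 - c3) * rho0)
rho_prime = max(rho0, c3 + c4 * c0 / (c_prime * rho0))
c1 = 2.0 * c0 / rho0 + c_prime                 # the constant appearing in (41)

assert rho_prime < 1.0
print(f"rho' = {rho_prime:.4f}, c1 = {c1:.2f}")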


Cite this article

Du, B., Zhou, J. & Sun, D. Improving the Convergence of Distributed Gradient Descent via Inexact Average Consensus. J Optim Theory Appl 185, 504–521 (2020). https://doi.org/10.1007/s10957-020-01648-3
