A Unified and Refined Convergence Analysis for Non-Convex Decentralized Learning
IEEE Transactions on Signal Processing (IF 4.6), Pub Date: 2022-06-21, DOI: 10.1109/tsp.2022.3184770
Sulaiman A. Alghunaim, Kun Yuan

We study the consensus decentralized optimization problem, in which the objective function is the average of $n$ agents' private non-convex cost functions and the agents can only communicate with their neighbors over a given network topology. We consider the stochastic learning setting, where each agent can only access a noisy estimate of its gradient. Many decentralized methods can solve such problems, including EXTRA, Exact-Diffusion/D$^2$, and gradient tracking. Unlike the well-known DSGD algorithm, these methods have been shown to be robust to heterogeneity across the local cost functions. However, the established convergence rates for these methods indicate that their sensitivity to the network topology is worse than that of DSGD. Such theoretical results imply that these methods can perform much worse than DSGD over sparse networks, which contradicts empirical experiments where DSGD is observed to be more sensitive to the network topology. In this work, we study a general stochastic unified decentralized algorithm (SUDA) that includes the above methods as special cases. We establish the convergence of SUDA under both the general non-convex setting and the Polyak-Łojasiewicz condition. Our results provide improved network-topology-dependent bounds for these methods (such as Exact-Diffusion/D$^2$ and gradient tracking) compared with the existing literature. Moreover, our results show that these methods are often less sensitive to the network topology than DSGD, which agrees with numerical experiments.
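For concreteness, the problem described in the abstract can be written in the standard form below, where $f_i$ is agent $i$'s private non-convex cost and $x$ is the shared decision variable. This is a conventional formulation consistent with the abstract, not a quotation from the paper:

```latex
\min_{x \in \mathbb{R}^{d}} \quad f(x) \;:=\; \frac{1}{n} \sum_{i=1}^{n} f_i(x)
```

To illustrate the baseline the abstract compares against, the following is a minimal sketch of DSGD (decentralized stochastic gradient descent), assuming a doubly stochastic gossip matrix `W` that encodes the network topology. The function name, parameters, and noise model are illustrative assumptions, not the paper's SUDA algorithm or its experimental setup:

```python
import numpy as np

def dsgd(grad_fns, W, x0, lr=0.05, num_iters=500, noise_std=0.1, seed=0):
    """Minimal DSGD sketch: each agent averages with its neighbors through
    the gossip matrix W, then takes a local stochastic-gradient step.
    grad_fns[i] returns agent i's exact local gradient; added Gaussian noise
    models the noisy gradient estimates described in the abstract."""
    rng = np.random.default_rng(seed)
    n = len(grad_fns)
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))  # row i = agent i's iterate
    for _ in range(num_iters):
        X = W @ X                                     # mixing (consensus) step
        for i, grad in enumerate(grad_fns):
            g = grad(X[i]) + noise_std * rng.standard_normal(X[i].shape)
            X[i] -= lr * g                            # local stochastic-gradient step
    return X.mean(axis=0)

# Example: 4 agents on a ring topology with uniform weights and
# heterogeneous quadratic costs f_i(x) = ||x - t_i||^2 / 2.
if __name__ == "__main__":
    W = np.array([[0.50, 0.25, 0.00, 0.25],
                  [0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.25, 0.50, 0.25],
                  [0.25, 0.00, 0.25, 0.50]])
    targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
               np.array([-1.0, 0.0]), np.array([0.0, -1.0])]
    grads = [lambda x, t=t: x - t for t in targets]
    print(dsgd(grads, W, x0=np.zeros(2)))  # ≈ mean of targets, i.e. [0, 0]
```

In this toy setting the local minimizers differ across agents (heterogeneous costs), which is exactly the regime where the abstract contrasts DSGD with bias-corrected methods such as EXTRA, Exact-Diffusion/D$^2$, and gradient tracking.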

Updated: 2024-08-26