An Improved Convergence Analysis for Decentralized Online Stochastic Non-Convex Optimization
IEEE Transactions on Signal Processing (IF 4.6) Pub Date: 2021-03-01, DOI: 10.1109/tsp.2021.3062553
Ran Xin, Usman A. Khan, Soummya Kar

In this paper, we study decentralized online stochastic non-convex optimization over a network of nodes. By integrating a technique called gradient tracking into decentralized stochastic gradient descent, we show that the resulting algorithm, GT-DSGD, enjoys certain desirable characteristics for minimizing a sum of smooth non-convex functions. In particular, for general smooth non-convex functions, we establish non-asymptotic characterizations of GT-DSGD and derive the conditions under which it achieves network-independent performance that matches centralized minibatch SGD. In contrast, the existing results suggest that GT-DSGD is always network-dependent and is therefore strictly worse than centralized minibatch SGD. When the global non-convex function additionally satisfies the Polyak-Łojasiewicz (PL) condition, we establish the linear convergence of GT-DSGD, up to a steady-state error, with appropriate constant step-sizes. Moreover, under stochastic approximation step-sizes, we establish, for the first time, the optimal global sublinear convergence rate on almost every sample path, in addition to the asymptotically optimal sublinear rate in expectation. Since strongly convex functions are a special case of functions satisfying the PL condition, our results are not only immediately applicable but also improve the currently known best convergence rates and their dependence on problem parameters.
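For readers unfamiliar with the ingredients: the PL condition requires f(x) − f* ≤ ‖∇f(x)‖²/(2μ) for some μ > 0, which holds for strongly convex functions but also for some non-convex ones. The following is a minimal sketch of the gradient-tracking mechanism the abstract refers to: each node mixes its iterate with its neighbors' and descends along a local tracker y_i of the global gradient, which is refreshed with the difference of consecutive local stochastic gradients. The names (gt_dsgd, grad_oracle, W, alpha) and the toy quadratic problem are illustrative, not from the paper.

```python
import numpy as np

def gt_dsgd(grad_oracle, W, x0, alpha, num_iters, rng):
    """Sketch of GT-DSGD: decentralized SGD with gradient tracking.

    grad_oracle(i, x, rng) returns a stochastic gradient of node i's
    local function f_i at x. W is an (n, n) doubly stochastic mixing
    matrix of the network; x0 is (n, d), one row per node.
    """
    n, d = x0.shape
    x = x0.copy()
    g = np.stack([grad_oracle(i, x[i], rng) for i in range(n)])
    y = g.copy()                      # tracker starts at the local gradients
    for _ in range(num_iters):
        x_new = W @ x - alpha * y     # mix with neighbors, step along tracker
        g_new = np.stack([grad_oracle(i, x_new[i], rng) for i in range(n)])
        y = W @ y + g_new - g         # steer tracker toward the global gradient
        x, g = x_new, g_new
    return x

# Toy usage: 4 nodes on a ring, f_i(x) = 0.5 * ||x - b_i||^2 with small
# gradient noise; the global minimizer is the mean of the b_i.
n, d = 4, 3
rng = np.random.default_rng(0)
b = rng.normal(size=(n, d))
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])  # doubly stochastic ring weights

def grad_oracle(i, x, rng):
    return (x - b[i]) + 0.01 * rng.normal(size=x.shape)

x_final = gt_dsgd(grad_oracle, W, np.zeros((n, d)), alpha=0.1,
                  num_iters=500, rng=rng)
print(np.allclose(x_final, b.mean(axis=0), atol=0.05))  # nodes near minimizer
```

Without the tracker update (i.e., plain decentralized SGD), each node descends only its own local gradient and the analysis picks up network-dependent bias terms; tracking the average gradient is what enables the network-independent rates discussed above.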

Updated: 2021-04-02