A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
arXiv - CS - Distributed, Parallel, and Cluster Computing Pub Date : 2020-03-23 , DOI: arxiv-2003.10422
Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich

Decentralized stochastic optimization methods have recently attracted considerable attention, mainly because of their cheap per-iteration cost, data locality, and communication efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far required different intuitions, have different applications, and were developed separately in various communities. Our algorithmic framework covers local SGD updates as well as synchronous and pairwise gossip updates on an adaptive network topology. We derive universal convergence rates for smooth (convex and non-convex) problems; the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).
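The framework described in the abstract combines local SGD steps (no communication) with gossip averaging over a mixing matrix. A minimal sketch of one instance of this scheme, assuming toy quadratic objectives f_i(x) = ½‖x − b_i‖², a fixed ring topology of 4 workers, and a hand-picked doubly stochastic mixing matrix W (all of these choices are illustrative, not taken from the paper):

```python
import numpy as np

# Hypothetical local objectives f_i(x) = 0.5 * ||x - b_i||^2, one per worker;
# distinct b_i mimic the heterogeneous (non-iid) data setting.
rng = np.random.default_rng(0)
n_workers, dim = 4, 3
b = rng.normal(size=(n_workers, dim))

def stochastic_grad(i, x):
    # Gradient of f_i plus small noise, standing in for a stochastic oracle.
    return (x - b[i]) + 0.01 * rng.normal(size=dim)

# Doubly stochastic gossip (mixing) matrix for a ring of 4 workers:
# each worker averages with itself (weight 0.5) and two neighbors (0.25 each).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

x = np.zeros((n_workers, dim))  # one local iterate per worker
lr, local_steps = 0.1, 2

for t in range(500):
    # Local SGD phase: each worker updates independently.
    for _ in range(local_steps):
        for i in range(n_workers):
            x[i] -= lr * stochastic_grad(i, x[i])
    # Synchronous gossip phase: row i of W @ x mixes worker i's
    # iterate with its neighbors' iterates.
    x = W @ x

# For these quadratics the global minimizer is mean(b); the consensus
# average of the iterates should approach it.
print(np.linalg.norm(x.mean(axis=0) - b.mean(axis=0)))
```

Changing the topology over time, as the paper's framework allows, would amount to using a different mixing matrix W_t at each round (pairwise gossip corresponds to a W_t that averages just two workers).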

Updated: 2020-11-12