Decentralized Deep Learning with Arbitrary Communication Compression
arXiv - CS - Data Structures and Algorithms. Pub Date: 2019-07-22, DOI: arxiv-1907.09356
Anastasia Koloskova, Tao Lin, Sebastian U. Stich, Martin Jaggi

Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks, as well as for efficient scaling to large compute clusters. As current approaches suffer from limited network bandwidth, we propose the use of communication compression in the decentralized training context. We show that Choco-SGD, recently introduced and analyzed for strongly convex objectives only, converges under arbitrarily high compression ratios on general non-convex functions at the rate $O\bigl(1/\sqrt{nT}\bigr)$, where $T$ denotes the number of iterations and $n$ the number of workers. The algorithm achieves linear speedup in the number of workers and supports higher compression than previous state-of-the-art methods. We demonstrate the practical performance of the algorithm in two key scenarios: the training of deep learning models (i) over distributed user devices connected by a social network and (ii) in a datacenter, where it outperforms all-reduce in wall-clock time.
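
To make the algorithmic idea concrete, below is a minimal NumPy sketch of a Choco-SGD-style loop: each worker takes a local stochastic gradient step, transmits only a compressed difference between its model and its publicly shared copy, and then performs a gossip step on those shared copies. The quadratic objectives, the top-k compressor, the 4-worker ring mixing matrix, and all hyperparameters are illustrative assumptions for this sketch, not the paper's implementation or experimental setup.

import numpy as np

def top_k(v, k):
    # A simple compression operator: keep only the k largest-magnitude entries.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def choco_sgd(grads, W, d, T=3000, lr=0.02, gamma=0.1, k=3, seed=0):
    # Simplified decentralized SGD with compressed gossip.
    #   grads : list of per-worker stochastic gradient functions grad_i(x) -> shape (d,)
    #   W     : (n, n) symmetric, doubly stochastic mixing matrix (gossip weights)
    rng = np.random.default_rng(seed)
    n = len(grads)
    x = rng.normal(size=(n, d))      # local models x_i
    x_hat = np.zeros((n, d))         # publicly shared (compressed) copies of the models
    for _ in range(T):
        # 1) local stochastic gradient step on every worker
        x = x - lr * np.stack([grads[i](x[i]) for i in range(n)])
        # 2) each worker compresses the difference between its model and its
        #    public copy and "broadcasts" only that compressed update
        q = np.stack([top_k(x[i] - x_hat[i], k) for i in range(n)])
        x_hat = x_hat + q
        # 3) gossip step on the public copies pulls the local models together:
        #    x_i += gamma * sum_j W[i, j] * (x_hat_j - x_hat_i)
        x = x + gamma * (W @ x_hat - x_hat)
    return x

# Toy example (hypothetical setup): 4 workers, each holding a noisy quadratic
# f_i(x) = 0.5 * ||x - b_i||^2, so the consensus optimum is the mean of the b_i.
n, d = 4, 10
rng = np.random.default_rng(1)
b = rng.normal(size=(n, d))
grads = [lambda x, bi=b[i]: (x - bi) + 0.01 * rng.normal(size=d) for i in range(n)]
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])   # ring of 4 workers
x_final = choco_sgd(grads, W, d)
# Mean absolute distance to the consensus optimum; it should end up well below the
# O(1) scale of the random initialization (a decaying step size tightens it further).
print(np.mean(np.abs(x_final - b.mean(axis=0))))

Because only the compressed difference x[i] - x_hat[i] is transmitted each round, per-iteration communication stays low while the shared copies still track the local models over time, which is what allows high compression ratios to be tolerated.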

Updated: 2020-11-12