LOSP: Overlap Synchronization Parallel With Local Compensation for Fast Distributed Training, IEEE Journal on Selected Areas in Communications


IEEE Journal on Selected Areas in Communications (IF 16.4), Pub Date: 2021-06-07, DOI: 10.1109/jsac.2021.3087272
Haozhao Wang, Zhihao Qu, Song Guo, Ningqi Wang, Ruixuan Li, Weihua Zhuang

Under the Parameter Server (PS) architecture, Distributed Stochastic Gradient Descent (D-SGD) incurs significant communication delay and overhead because of model synchronization. Moreover, because workers differ in computational capability, traditional synchronization modes under-utilize computational resources: fast workers must wait for slow ones to finish computing. Our previous work, OSP, addresses these problems by overlapping computation with communication and allowing an adaptive number of local updates in distributed training, but the overlap introduces staleness, which degrades performance. In this paper, we propose LOSP, which adds local compensation to our previous synchronization mechanism to mitigate the adverse effects of overlapping synchronization. We theoretically prove that LOSP (1) preserves the same convergence rate as sequential SGD for non-convex problems, and (2) scales well, achieving linear speedup with respect to both the number of workers and the average number of local updates. Evaluations show that LOSP significantly outperforms state-of-the-art methods in both convergence accuracy and communication cost.
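The abstract does not give LOSP's update rule, so the following is only a toy sketch of the two ideas it names: staleness caused by overlapping computation with communication, and compensating for it with locally made progress. Everything in it is an assumption made for illustration: the function name `run_overlap`, the scalar quadratic objective, and the specific compensation rule (adding the progress a worker made during the communication period to the stale average). It should not be read as the authors' algorithm.

```python
def grad(w, center):
    # Gradient of the toy local objective 0.5 * (w - center)^2.
    return w - center


def run_overlap(centers, rounds, local_steps, lr, compensate):
    """Simulate overlapped synchronization on a scalar toy problem.

    centers[i]     : minimizer of worker i's local objective (hypothetical data)
    local_steps[i] : number of local SGD steps worker i runs per round
    compensate     : if True, apply the assumed local compensation rule
    Returns the mean of the workers' local models after the last round.
    """
    n = len(centers)
    local = [0.0] * n
    for _ in range(rounds):
        # Snapshot pushed to the server at the start of the round.
        sent = list(local)
        # Workers keep computing while the push/pull is in flight (overlap).
        for i in range(n):
            for _ in range(local_steps[i]):
                local[i] -= lr * grad(local[i], centers[i])
        # The averaged model arrives at the end of the round, so it is
        # stale: it reflects the snapshots, not the steps taken meanwhile.
        avg = sum(sent) / n
        for i in range(n):
            if compensate:
                # Assumed compensation: stale average plus the local
                # progress made during the communication period.
                local[i] = avg + (local[i] - sent[i])
            else:
                # Naive overwrite: progress made during the overlap is lost.
                local[i] = avg
    return sum(local) / n


if __name__ == "__main__":
    centers = [1.0, 3.0]  # the global optimum is their mean
    print(run_overlap(centers, 200, [2, 2], 0.1, compensate=False))
    print(run_overlap(centers, 200, [2, 2], 0.1, compensate=True))
```

In this toy setup the naive variant never leaves its starting point, because every round overwrites the local steps with an average computed before they were taken, while the compensated variant approaches the mean of the centers. Passing different entries in `local_steps` mimics heterogeneous workers running different numbers of local updates per round.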


Updated: 2021-07-16
