Error-Compensated Sparsification for Communication-Efficient Decentralized Training in Edge Environment
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2021-05-26, DOI: 10.1109/tpds.2021.3084104
Haozhao Wang, Song Guo, Zhihao Qu, Ruixuan Li, Ziming Liu

Communication has been considered a major bottleneck in large-scale decentralized training systems, since participating nodes iteratively exchange large amounts of intermediate data with their neighbors. Although compression techniques such as sparsification can significantly reduce the communication overhead in each iteration, the errors caused by compression accumulate, resulting in a severely degraded convergence rate. Recently, an error compensation method for sparsification was proposed in centralized training to tolerate the accumulated compression errors. However, the analogous technique for decentralized training, along with the corresponding convergence theory, remains unknown. To fill this gap, we design a method named ECSD-SGD that significantly accelerates decentralized training via error-compensated sparsification. The novelty lies in identifying the component of the information exchanged in each iteration (i.e., the sparsified model update) and applying targeted error compensation to that component. Our thorough theoretical analysis shows that ECSD-SGD supports arbitrary sparsification ratios and achieves the same convergence rate as non-sparsified decentralized training methods. We also conduct extensive experiments on multiple deep learning models to validate our theoretical findings. The results show that ECSD-SGD outperforms all state-of-the-art sparsified methods in terms of both convergence speed and final generalization accuracy.
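To make the error-compensation idea concrete, the following minimal NumPy sketch shows the residual-feedback mechanism applied to a node's local model update before it is sent to neighbors. The names (topk_sparsify, ErrorCompensatedWorker, compress_update) are illustrative assumptions rather than the paper's implementation, and the neighbor-averaging (gossip) step of decentralized SGD is omitted.

import numpy as np

def topk_sparsify(vec, k):
    # Keep the k largest-magnitude entries of vec; zero out the rest.
    out = np.zeros_like(vec)
    if k > 0:
        idx = np.argpartition(np.abs(vec), -k)[-k:]
        out[idx] = vec[idx]
    return out

class ErrorCompensatedWorker:
    # One node: its model update is compensated with the residual left
    # over from the previous round's sparsification before compressing.
    def __init__(self, dim, k):
        self.k = k
        self.residual = np.zeros(dim)  # accumulated compression error

    def compress_update(self, update):
        compensated = update + self.residual      # re-inject past error
        sparse = topk_sparsify(compensated, self.k)
        self.residual = compensated - sparse      # store what was dropped
        return sparse                             # only this is communicated

Because the dropped coordinates are fed back in later rounds rather than discarded, the compression error stays bounded, which is what allows the method to retain the convergence rate of non-sparsified training.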

Updated: 2021-05-26