AC-SGD: Adaptively Compressed SGD for Communication-Efficient Distributed Learning
IEEE Journal on Selected Areas in Communications (IF 16.4). Pub Date: 2022-07-20. DOI: 10.1109/jsac.2022.3192050
Guangfeng Yan, Tan Li, Shao-Lun Huang, Tian Lan, Linqi Song

Gradient compression (e.g., gradient quantization and gradient sparsification) is a core technique for reducing communication costs in distributed learning systems. A recent trend in gradient compression is to vary the number of bits across iterations; however, existing approaches rely on empirical observations or engineering heuristics without systematic treatment and analysis. To the best of our knowledge, a general dynamic gradient compression scheme that leverages both quantization and sparsification techniques is still far from being well understood. This paper proposes a novel Adaptively-Compressed Stochastic Gradient Descent (AC-SGD) strategy that adjusts the number of quantization bits and the sparsification size with respect to the norm of the gradients, the communication budget, and the remaining number of iterations. In particular, we derive an upper bound, tight in some cases, on the convergence error for an arbitrary dynamic compression strategy. We then consider communication budget constraints and propose an optimization formulation, denoted the Adaptive Compression Problem (ACP), for minimizing the deep model's convergence error under such constraints. By solving the ACP, we obtain an enhanced compression algorithm that significantly improves model accuracy under given communication budget constraints. Finally, through extensive experiments on computer vision and natural language processing tasks on the MNIST, CIFAR-10, CIFAR-100, and AG-News datasets, we demonstrate that our compression scheme significantly outperforms state-of-the-art gradient compression methods in terms of reducing communication costs.
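To make the high-level idea concrete, the sketch below is a minimal Python illustration (not the paper's implementation) of adaptive quantize-and-sparsify compression: a QSGD-style stochastic quantizer, top-k sparsification, and a per-iteration bit width chosen from the remaining communication budget and iteration count. All function names and the budget-allocation heuristic here are assumptions made for illustration; AC-SGD's actual choice of bits and sparsification size comes from solving the ACP and also depends on the gradient norm.

```python
import numpy as np

def stochastic_quantize(v, bits):
    """QSGD-style stochastic uniform quantization to a given bit width.
    Unbiased (E[output] = v); used here as a generic stand-in quantizer."""
    norm = np.linalg.norm(v)
    if norm == 0.0 or bits <= 0:
        return np.zeros_like(v)
    levels = 2 ** bits - 1
    scaled = np.abs(v) / norm * levels                 # map |v_i| into [0, levels]
    floor = np.floor(scaled)
    rounded = floor + (np.random.rand(v.size) < scaled - floor)  # randomized rounding
    return np.sign(v) * rounded * norm / levels

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def adaptive_compress(grad, bits_left, iters_left, b_min=2, b_max=8, sparsity=0.1):
    """Compress one gradient, choosing this round's bit width from the
    remaining communication budget and iteration count.

    NOTE: the even-split budget rule and fixed sparsity ratio below are
    placeholder heuristics, not the paper's ACP solution, which also
    adapts to the gradient norm via the derived convergence-error bound."""
    b = int(np.clip(bits_left // max(iters_left, 1), b_min, b_max))
    k = max(1, int(sparsity * grad.size))
    compressed = stochastic_quantize(top_k(grad, k), b)
    return compressed, bits_left - b * k               # ~b bits per kept coordinate
```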
