SKCompress: compressing sparse and nonuniform gradient in distributed machine learning, The VLDB Journal



SKCompress: compressing sparse and nonuniform gradient in distributed machine learning
The VLDB Journal (IF 2.8), Pub Date: 2020-01-01, DOI: 10.1007/s00778-019-00596-3
Jiawei Jiang, Fangcheng Fu, Tong Yang, Yingxia Shao, Bin Cui

Distributed machine learning (ML) has been extensively studied to meet the explosive growth of training data. A wide range of ML models are trained with a family of first-order optimization algorithms based on stochastic gradient descent (SGD). The core operation of SGD is the calculation of gradients. When SGD runs in a distributed environment, the workers need to exchange local gradients over the network. To reduce the communication cost, a category of quantization-based compression algorithms transforms the gradients into a binary format, at the expense of a small loss in precision. Although the existing approaches work well for dense gradients, we find that they are ill-suited to the many cases where the gradients are sparse and nonuniformly distributed. In this paper, we ask: is there a compression framework that can efficiently handle sparse and nonuniform gradients? We propose a general compression framework, called SKCompress, that compresses both the gradient values and the gradient keys of sparse gradients. Our first contribution is a sketch-based method that compresses the gradient values. A sketch is a class of algorithms that approximates the distribution of a data stream with a probabilistic data structure. We first use a quantile sketch to generate splits, sort gradient values into buckets, and encode them with the bucket indexes. Our second contribution is a new sketch algorithm, namely MinMaxSketch, which compresses the bucket indexes. MinMaxSketch builds a set of hash tables and resolves hash collisions with a MinMax strategy. Since the bucket indexes are nonuniform, we further adopt Huffman coding to compress MinMaxSketch. Our third contribution, which compresses the keys of sparse gradients, is a delta-binary encoding method that calculates the increments between consecutive gradient keys and encodes them in binary format. An adaptive prefix is proposed to assign different sizes to different gradient keys, so that we can save more space. We also theoretically discuss the correctness and the error bound of our proposed methods. To the best of our knowledge, this is the first effort to use data sketches to compress gradients in ML. We implement a prototype system in a real cluster of our industrial partner Tencent Inc. and show that our method is up to 12× faster than the existing methods.
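
The value-bucketing and key-encoding steps named in the abstract can be illustrated with a rough Python sketch. It is not the authors' implementation and rests on several assumptions: exact quantiles from `statistics.quantiles` stand in for the quantile sketch, decoding a bucket to the midpoint of its boundaries is an assumed rule, a per-key length flag of 1 to 4 bytes is an assumed form of the adaptive prefix, and all function names are hypothetical. The MinMaxSketch and Huffman coding stages are omitted.

```python
import bisect
import statistics


def quantile_splits(values, num_buckets):
    """Return num_buckets - 1 split points.

    Assumption: exact quantiles replace the quantile sketch of the paper.
    """
    return statistics.quantiles(values, n=num_buckets, method="inclusive")


def encode_values(values, splits):
    """Replace each gradient value by the index of the bucket it falls in."""
    return [bisect.bisect_right(splits, v) for v in values]


def decode_values(indexes, splits, lo, hi):
    """Assumed decoding rule: use the midpoint of each bucket's boundaries."""
    edges = [lo] + list(splits) + [hi]
    return [(edges[i] + edges[i + 1]) / 2 for i in indexes]


def encode_keys(keys):
    """Delta-encode sorted, non-negative keys with a per-key length flag.

    Assumption: the flag (0..3, storable in 2 bits) says how many bytes
    (1..4) the following delta occupies.
    """
    flags, payload, prev = [], bytearray(), 0
    for key in keys:
        delta = key - prev
        if delta < 0:
            raise ValueError("keys must be sorted in ascending order")
        prev = key
        nbytes = max(1, (delta.bit_length() + 7) // 8)
        if nbytes > 4:
            raise ValueError("delta does not fit in 4 bytes")
        flags.append(nbytes - 1)
        payload += delta.to_bytes(nbytes, "little")
    return flags, bytes(payload)


def decode_keys(flags, payload):
    """Invert encode_keys by accumulating the variable-length deltas."""
    keys, pos, prev = [], 0, 0
    for flag in flags:
        nbytes = flag + 1
        prev += int.from_bytes(payload[pos:pos + nbytes], "little")
        pos += nbytes
        keys.append(prev)
    return keys


if __name__ == "__main__":
    grads = {3: -0.9, 260: 0.02, 261: 0.03, 70000: 1.2}  # toy sparse gradient
    keys, values = list(grads), list(grads.values())
    splits = quantile_splits(values, num_buckets=4)
    print(encode_values(values, splits))
    print(encode_keys(keys))
```

In this toy layout, small gaps between neighboring keys cost one byte each instead of a fixed four, which is the kind of saving the adaptive prefix is meant to capture. The value side is lossy: each gradient is recovered only up to the width of its bucket.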


Updated: 2020-01-01
