Minimizing Training Time of Distributed Machine Learning by Reducing Data Communication
IEEE Transactions on Network Science and Engineering ( IF 6.7 ) Pub Date : 2021-04-16 , DOI: 10.1109/tnse.2021.3073897
Yubin Duan , Ning Wang , Jie Wu

Because most machine learning objective functions are additive, training can be distributed across multiple machines. Distributed machine learning is an efficient way to cope with rapidly growing data volumes, at the cost of extra inter-machine communication. One common implementation is the parameter server system, which contains two types of nodes: worker nodes, which calculate updates, and server nodes, which maintain parameters. We observe that inefficient communication between workers and servers may slow down the system. Therefore, we propose a graph partition problem that partitions data among workers and parameters among servers such that the total training time is minimized. The problem is NP-complete. We investigate a two-step heuristic approach that first partitions data and then partitions parameters, and we consider the trade-off between partition time and the saving in training time. In addition, we adapt a multilevel graph partition approach to fit bipartite graph partitioning. We implement both approaches on an open-source parameter server platform, PS-lite. Experimental results on synthetic and real-world datasets show that both approaches can improve communication efficiency by up to 14 times compared with random partitioning.
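To make the two-step idea in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes each data sample is represented by the set of parameter indices it touches, so samples and parameters form a bipartite graph. The function names (`partition_data`, `partition_params`, `total_traffic`), the greedy rules, and the cost model (one unit of traffic per distinct worker-parameter pair) are all illustrative assumptions.

```python
import math
from collections import defaultdict


def partition_data(samples, num_workers):
    """Step 1 (assumed greedy rule): place each sample on the worker that
    already needs the most of its parameters, under a balance cap."""
    cap = math.ceil(len(samples) / num_workers)
    worker_params = [set() for _ in range(num_workers)]
    worker_load = [0] * num_workers
    assignment = []
    for feats in samples:
        # Only workers that still have room are candidates.
        candidates = [w for w in range(num_workers) if worker_load[w] < cap]
        # Prefer the largest overlap; break ties toward the lighter worker.
        best = max(
            candidates,
            key=lambda w: (len(worker_params[w] & feats), -worker_load[w]),
        )
        assignment.append(best)
        worker_params[best] |= feats
        worker_load[best] += 1
    return assignment, worker_params


def partition_params(worker_params, num_servers):
    """Step 2 (assumed greedy rule): place the most-requested parameters
    first, each on the currently least-loaded server."""
    demand = defaultdict(int)  # number of workers that need each parameter
    for params in worker_params:
        for p in params:
            demand[p] += 1
    server_load = [0] * num_servers
    placement = {}
    for p in sorted(demand, key=lambda q: (-demand[q], q)):
        s = min(range(num_servers), key=lambda i: server_load[i])
        placement[p] = s
        server_load[s] += demand[p]
    return placement, server_load


def total_traffic(worker_params):
    """Assumed cost model: one unit per distinct (worker, parameter) pair."""
    return sum(len(params) for params in worker_params)


# Four samples over four parameters, split across two workers and two servers.
samples = [{0, 1}, {2, 3}, {0, 1}, {2, 3}]
assignment, worker_params = partition_data(samples, num_workers=2)
placement, server_load = partition_params(worker_params, num_servers=2)
print(assignment, total_traffic(worker_params), server_load)
```

In this toy input, grouping samples that share parameters onto the same worker means each worker touches only two parameters, whereas splitting the list in halves would make both workers touch all four. A real system would also weigh the time spent partitioning against the training time saved, as the abstract notes.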

Updated: 2021-04-16