WP-SGD: Weighted parallel SGD for distributed unbalanced-workload training system
Journal of Parallel and Distributed Computing (IF 3.4), Pub Date: 2020-07-01, DOI: 10.1016/j.jpdc.2020.06.011
Daning Cheng, Shigang Li, Yunquan Zhang

Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD (Zinkevich, 2010), often require all nodes to have the same performance or to consume equal quantities of data. However, these requirements are difficult to satisfy when parallel SGD algorithms run in a heterogeneous computing environment, where low-performance nodes exert a negative influence on the final result. In this paper, we propose an algorithm called weighted parallel SGD (WP-SGD). WP-SGD combines weighted model parameters from the different nodes in the system to produce the final output. WP-SGD uses the reduction in standard deviation to compensate for the loss caused by inconsistent node performance in the cluster, which means that it does not require all nodes to consume equal quantities of data. We also propose methods for running two other parallel SGD algorithms in combination with WP-SGD in a heterogeneous environment. The experimental results show that WP-SGD significantly outperforms traditional parallel SGD algorithms on distributed training systems with an unbalanced workload.
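
The abstract describes WP-SGD as a weighted combination of per-node model parameters, but it does not give the exact weight formula. The following is a minimal Python sketch of that aggregation step under simplifying assumptions: each node runs plain local SGD on its own data shard, and the weights are taken to be proportional to the number of samples each node consumed, a hypothetical stand-in for the paper's weighting rule. The names local_sgd and wp_sgd_aggregate are illustrative, not from the paper.

```python
import numpy as np

def local_sgd(w0, X, y, lr=0.01):
    """Plain sequential SGD on one node's local data shard.
    Logistic-regression loss is used here purely for illustration."""
    w = w0.copy()
    for xi, yi in zip(X, y):
        pred = 1.0 / (1.0 + np.exp(-(xi @ w)))   # sigmoid prediction
        w -= lr * (pred - yi) * xi               # gradient step
    return w

def wp_sgd_aggregate(local_models, weights):
    """Combine per-node parameter vectors by a weighted average.
    Weights are normalized so that they sum to 1."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(wk * mk for wk, mk in zip(weights, local_models))

# Toy usage: three nodes with unequal workloads (unbalanced data shards).
rng = np.random.default_rng(0)
d = 5
w_true = rng.normal(size=d)

shard_sizes = [1000, 400, 100]   # heterogeneous nodes consume different amounts of data
models, samples_seen = [], []
for n in shard_sizes:
    X = rng.normal(size=(n, d))
    y = (X @ w_true > 0).astype(float)
    models.append(local_sgd(np.zeros(d), X, y))
    samples_seen.append(n)

# Hypothetical weighting: proportional to the data each node actually consumed.
w_final = wp_sgd_aggregate(models, samples_seen)
print(w_final)
```

Normalizing the weights keeps the aggregated parameters on the same scale as any single node's model. In WP-SGD itself, the weights are chosen so that the reduction in standard deviation from combining models compensates for nodes that processed less data; the sample-count weights above are only a placeholder for that rule.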



Updated: 2020-07-21