Distributed Training of Deep Learning Models: A Taxonomic Perspective
IEEE Transactions on Parallel and Distributed Systems (IF 5.3), Pub Date: 2020-12-01, DOI: 10.1109/tpds.2020.3003307
Matthias Langer, Zhen He, Wenny Rahayu, Yanbo Xue

Distributed deep learning systems (DDLS) train deep neural network models by utilizing the distributed resources of a cluster. Developers of DDLS are required to make many decisions to process their particular workloads in their chosen environment efficiently. The advent of GPU-based deep learning and the ever-increasing size of datasets and deep neural network models, in combination with the bandwidth constraints that exist in cluster environments, require developers of DDLS to be innovative in order to train high-quality models quickly. Comparing DDLS side by side is difficult due to their extensive feature lists and architectural differences. We aim to shed some light on the fundamental principles that are at work when training deep neural networks in a cluster of independent machines by analyzing the general properties associated with training deep learning models and how such workloads can be distributed in a cluster to achieve collaborative model training. We thereby provide an overview of the different techniques that are used by contemporary DDLS and discuss their influence and implications on the training process. To conceptualize and compare DDLS, we group different techniques into categories, thus establishing a taxonomy of distributed deep learning systems.

Updated: 2020-12-01