Training deep neural networks: a static load balancing approach
The Journal of Supercomputing (IF 2.5), Pub Date: 2020-03-02, DOI: 10.1007/s11227-020-03200-6
Sergio Moreno-Álvarez, Juan M. Haut, Mercedes E. Paoletti, Juan A. Rico-Gallego, Juan C. Díaz-Martín, Javier Plaza

Deep neural networks are currently trained under data-parallel setups on high-performance computing (HPC) platforms, where a replica of the full model is assigned to each computational resource and each replica processes non-overlapping subsets of the data known as batches. At the end of each batch, the replicas combine their computed gradients to update their local copies of the parameters. However, performance differences among the resources assigned to replicas on current heterogeneous platforms induce waiting times when gradients are combined synchronously, degrading overall performance. Although asynchronous gradient communication has been proposed as an alternative, it suffers from the so-called staleness problem: each replica trains on a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, thereby minimizing the staleness problem. Our experimental results (obtained in the context of a remotely sensed hyperspectral image processing application) show that, while the classification accuracy remains constant, the training time decreases substantially with respect to unbalanced training. This is illustrated on heterogeneous computing platforms made up of CPUs and GPUs with different performance.
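As a rough illustration of the balancing rule described in the abstract, the sketch below splits a global batch among replicas in proportion to their measured relative computing capacity. The function name, the example throughput figures, and the rounding strategy are illustrative assumptions, not code from the paper.

    import math

    def proportional_batch_sizes(global_batch, relative_speeds):
        """Split a global batch across replicas in proportion to their
        relative computing capacity (hypothetical helper, not from the paper).
        relative_speeds holds e.g. samples/second benchmarked on each worker."""
        total = sum(relative_speeds)
        # Ideal (fractional) share for each replica.
        shares = [global_batch * s / total for s in relative_speeds]
        sizes = [math.floor(x) for x in shares]
        # Hand the leftover samples to the replicas that were rounded down
        # the most, so the per-replica sizes sum exactly to global_batch.
        remainder = global_batch - sum(sizes)
        order = sorted(range(len(shares)),
                       key=lambda i: shares[i] - sizes[i], reverse=True)
        for i in order[:remainder]:
            sizes[i] += 1
        return sizes

    # Example: one fast GPU, one slower GPU and two CPUs sharing a batch of 256.
    print(proportional_batch_sizes(256, [8.0, 4.0, 1.0, 1.0]))  # -> [147, 73, 18, 18]

With such an assignment, each replica finishes its batch in roughly the same wall-clock time, so synchronous gradient combination incurs little idle waiting on a heterogeneous platform.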

Updated: 2020-03-02