Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects
IEEE Micro (IF 2.8), Pub Date: 2020-01-01, DOI: 10.1109/mm.2019.2949986
Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda

Heterogeneous high-performance computing systems with GPUs are equipped with high-performance interconnects such as InfiniBand, Omni-Path, PCIe, and NVLink. However, little exists in the literature that captures the performance impact of these interconnects on distributed deep learning (DL). In this article, we choose Horovod, a distributed training middleware, to analyze and profile various DNN training workloads using TensorFlow and PyTorch, in addition to standard MPI microbenchmarks. We use a wide variety of systems with CPUs such as Intel Xeon and IBM POWER9, GPUs such as Volta V100, and various interconnects to analyze the following metrics: 1) message size with Horovod's tensor fusion; 2) message size without tensor fusion; 3) number of MPI/NCCL calls; and 4) time taken by each MPI/NCCL call. We observed extreme performance variations for non-power-of-two message sizes on different platforms. To address this, we design a message-padding scheme for Horovod, demonstrate significantly smoother allreduce latency profiles, and report cases where we observed improvements in end-to-end training.
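To make the kind of measurement behind metric 4 concrete, below is a minimal OSU-style allreduce latency microbenchmark written with mpi4py and NumPy. This is an illustrative sketch, not the harness used in the paper: the time_allreduce helper, the iteration counts, and the specific byte sizes swept are our own assumptions. Comparing a power-of-two byte count with a slightly larger non-power-of-two one is enough to expose the latency variation the abstract describes.

import numpy as np
from mpi4py import MPI

def time_allreduce(comm, nbytes, iters=100, warmup=10):
    # Average latency (microseconds) of MPI_Allreduce on a
    # float32 buffer of roughly `nbytes` bytes.
    send = np.ones(nbytes // 4, dtype=np.float32)
    recv = np.empty_like(send)
    for _ in range(warmup):
        comm.Allreduce(send, recv, op=MPI.SUM)
    comm.Barrier()
    start = MPI.Wtime()
    for _ in range(iters):
        comm.Allreduce(send, recv, op=MPI.SUM)
    return (MPI.Wtime() - start) / iters * 1e6

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    # Compare power-of-two sizes with nearby non-power-of-two sizes.
    for nbytes in (1 << 20, (1 << 20) + 4, 1 << 22, (1 << 22) + 4):
        lat = time_allreduce(comm, nbytes)
        if comm.Get_rank() == 0:
            print(f"{nbytes} bytes: {lat:.1f} us")

Run with, for example, mpirun -np 4 python allreduce_bench.py; on interconnects whose collectives favor power-of-two payloads, the "+4" sizes can show disproportionately higher latency.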

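The message-padding idea itself can be sketched at the application level, again with mpi4py: zero-pad each buffer to the next power-of-two element count before the allreduce, then drop the padding afterwards. The helper names (next_pow2, padded_allreduce) are our own illustrations, not Horovod's API, and the scheme described in the paper is integrated into Horovod's internals rather than applied around each call as shown here.

import numpy as np
from mpi4py import MPI

def next_pow2(n):
    # Smallest power of two >= n.
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

def padded_allreduce(comm, buf):
    # Allreduce a 1-D buffer, zero-padded to a power-of-two element
    # count so the underlying collective sees a "friendly" size.
    n = buf.size
    padded = np.zeros(next_pow2(n), dtype=buf.dtype)
    padded[:n] = buf
    out = np.empty_like(padded)
    comm.Allreduce(padded, out, op=MPI.SUM)
    return out[:n]  # zero padding does not change the sums

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    grad = np.full(3000, comm.Get_rank(), dtype=np.float32)  # 3000 is not a power of two
    reduced = padded_allreduce(comm, grad)
    if comm.Get_rank() == 0:
        print(reduced[:4])  # each element equals the sum of all ranks

The extra bandwidth cost of the padding is bounded (at most 2x the payload, and far less near a power of two), which is the trade-off that makes smoother latency profiles a net win in the reported cases.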