A computational-graph partitioning method for training memory-constrained DNNs
Parallel Computing (IF 1.4), Pub Date: 2021-04-29, DOI: 10.1016/j.parco.2021.102792
Fareed Qararyah, Mohamed Wahib, Doğa Dikbayır, Mehmet Esat Belviranli, Didem Unat

Many state-of-the-art Deep Neural Networks (DNNs) have substantial memory requirements, and limited device memory becomes a bottleneck when training such models. We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy for DNNs represented as computational graphs. ParDNN decides a placement of the DNN's underlying computational-graph operations across multiple devices so that the devices' memory constraints are met and the training time is minimized. ParDNN is completely independent of the deep learning aspects of a DNN: it requires no modification either to the model or to the system-level implementation of its operation kernels. ParDNN partitions DNNs with billions of parameters and hundreds of thousands of operations in seconds to a few minutes. Our experiments with TensorFlow on 16 GPUs demonstrate efficient training of five very large models while achieving superlinear scaling for both the batch size and the training throughput. ParDNN either outperforms or qualitatively improves upon the related work.
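To illustrate the idea of applying an operation-to-device placement in TensorFlow (the framework used in the paper's experiments), here is a minimal sketch. The placement mapping and the toy model are hypothetical and not the authors' code; they only show how a computed partition could be enforced via `tf.device` scopes.

```python
# Minimal illustrative sketch (not ParDNN itself): enforce a precomputed
# operation-to-device placement with tf.device scopes in TensorFlow.
import tensorflow as tf

# Hypothetical output of a graph-partitioning step: op/layer name -> device.
placement = {
    "dense_1": "/GPU:0",
    "dense_2": "/GPU:1",
}

inputs = tf.keras.Input(shape=(1024,))

# Each partition's layers are built under the device assigned to them,
# so their forward/backward ops run on that device.
with tf.device(placement["dense_1"]):
    x = tf.keras.layers.Dense(4096, activation="relu", name="dense_1")(inputs)
with tf.device(placement["dense_2"]):
    outputs = tf.keras.layers.Dense(10, name="dense_2")(x)

model = tf.keras.Model(inputs, outputs)
```

In practice, a partitioner like the one described here would produce such a mapping automatically for graphs with hundreds of thousands of operations, subject to per-device memory limits.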




Updated: 2021-05-25