An Incremental Iterative Acceleration Architecture in Distributed Heterogeneous Environments With GPUs for Deep Learning
IEEE Transactions on Parallel and Distributed Systems (IF 5.3), Pub Date: 2021-05-07, DOI: 10.1109/tpds.2021.3078254
Xuedong Zhang, Zhuo Tang, Lifan Du, Li Yang

The parallel computing capabilities of GPUs have a significant impact on computationally intensive iterative tasks, and offloading part or all of a deep learning workload from the CPU to the GPU has become mainstream. However, the iterative process of such tasks contains a large number of redundant calculations. We therefore propose a GPU-based distributed incremental iterative computing architecture that makes full use of distributed parallel computing and the GPU memory hierarchy. The architecture supports deep learning and other computationally intensive iterative applications by optimizing data placement and reducing redundant iterative calculations. To support block-based data partitioning and coalesced memory access on GPUs, we propose GDataSet, an abstract data set. A GPU-side incremental iteration manager, GTracker, is responsible for GDataSet cache management on the GPU. To overcome the limited size of on-chip memory, we propose a variable sliding-window mechanism that improves the cache hit rate and data access speed by finding the best block arrangement between on-chip and off-chip memory. In addition, a communication channel based on the incremental iterative model supports data transmission and task communication within the cluster. Finally, we implement the proposed architecture on Spark 2.4.1 and CUDA 10.0. Comparative experiments with widely used computationally intensive iterative applications (K-means, LSTM, etc.) show that the incremental iterative acceleration architecture significantly improves the efficiency of iterative computing.
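The abstract does not detail how GDataSet and GTracker are implemented, but the core incremental-iteration idea it describes can be sketched in CUDA: a block-partitioned data set stays resident in GPU global memory across iterations, and per-block dirty flags let each iteration skip blocks whose inputs did not change. The sketch below is purely illustrative; the names (incremental_update, BLOCK_ELEMS, the dirty-flag array) are assumptions, not the authors' API.

    #include <cuda_runtime.h>

    constexpr int BLOCK_ELEMS = 256;  // elements per logical data block

    // One CUDA thread block per data block; blocks whose dirty flag is 0
    // return immediately, so unchanged data is never recomputed.
    __global__ void incremental_update(const float* __restrict__ in,
                                       float* __restrict__ out,
                                       const unsigned char* __restrict__ dirty,
                                       int num_blocks)
    {
        int blk = blockIdx.x;
        if (blk >= num_blocks || !dirty[blk])
            return;  // skip redundant recomputation of unchanged blocks

        // Consecutive threads touch consecutive addresses within a block,
        // so each warp issues coalesced memory transactions.
        int idx = blk * BLOCK_ELEMS + threadIdx.x;
        out[idx] = 0.5f * in[idx] + 1.0f;  // stand-in for the real update rule
    }

    int main()
    {
        const int num_blocks = 1024;
        const int n = num_blocks * BLOCK_ELEMS;

        float *d_in, *d_out;
        unsigned char *d_dirty;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_out, n * sizeof(float));
        cudaMalloc(&d_dirty, num_blocks);
        cudaMemset(d_in, 0, n * sizeof(float));  // demo input
        cudaMemset(d_dirty, 1, num_blocks);      // demo: mark every block dirty;
                                                 // later iterations would flag only
                                                 // blocks whose inputs changed

        incremental_update<<<num_blocks, BLOCK_ELEMS>>>(d_in, d_out, d_dirty, num_blocks);
        cudaDeviceSynchronize();

        cudaFree(d_in); cudaFree(d_out); cudaFree(d_dirty);
        return 0;
    }

Keeping the data resident on the device between kernel launches is what makes skipping cheap: only the dirty flags need updating between iterations, rather than re-transferring the whole data set.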
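Likewise, the variable sliding-window mechanism is only named in the abstract; the general pattern it alludes to, staging blocks from off-chip (global) memory through on-chip (shared) memory one window at a time, might look like the sketch below. TILE and windowed_scale are hypothetical, and the window size is fixed at compile time for brevity, whereas the paper's mechanism varies it to maximize the cache hit rate.

    #include <cuda_runtime.h>

    constexpr int TILE = 128;  // elements staged on-chip per window step

    // Each thread block slides a TILE-sized window over its slice of the
    // input, staging every window through shared (on-chip) memory before
    // computing from it.
    __global__ void windowed_scale(const float* __restrict__ in,
                                   float* __restrict__ out, int n, float s)
    {
        __shared__ float window[TILE];  // the on-chip window
        for (int base = blockIdx.x * TILE; base < n; base += gridDim.x * TILE) {
            int i = base + threadIdx.x;
            if (i < n) window[threadIdx.x] = in[i];       // coalesced stage-in
            __syncthreads();
            if (i < n) out[i] = s * window[threadIdx.x];  // read from fast memory
            __syncthreads();  // wait before the window slides and is overwritten
        }
    }

    int main()
    {
        const int n = 1 << 20;
        float *d_in, *d_out;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_out, n * sizeof(float));
        cudaMemset(d_in, 0, n * sizeof(float));  // demo input

        windowed_scale<<<256, TILE>>>(d_in, d_out, n, 2.0f);
        cudaDeviceSynchronize();

        cudaFree(d_in); cudaFree(d_out);
        return 0;
    }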

Updated: 2021-05-28