vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training
IEEE Transactions on Parallel and Distributed Systems ( IF 5.6 ) Pub Date : 2021-07-02 , DOI: 10.1109/tpds.2021.3094364
Shixiong Zhao , Fanxin Li , Xusheng Chen , Xiuxian Guan , Jianyu Jiang , Dong Huang , Yuhao Qing , Sen Wang , Peng Wang , Gong Zhang , Cheng Li , Ping Luo , Heming Cui

DNNs of ever-increasing computational complexity have achieved unprecedented successes in areas such as machine vision and natural language processing (NLP); for example, recent advanced Transformers have billions of parameters. However, because large-scale DNNs significantly exceed a GPU's physical memory limit, they cannot be trained by conventional methods such as data parallelism. Pipeline parallelism, which partitions a large DNN into small subnets and trains them on different GPUs, is a plausible solution. Unfortunately, the layer partitioning and memory management in existing pipeline-parallel systems are fixed during training, making them easily impeded by out-of-memory errors and GPU under-utilization. These drawbacks are amplified when performing neural architecture search (NAS), such as the Evolved Transformer, where different Transformer architectures need to be trained repeatedly. vPipe is the first system that transparently provides dynamic layer partitioning and memory management for pipeline parallelism. vPipe makes two unique contributions: (1) an online algorithm for searching for a near-optimal layer partitioning and memory management plan, and (2) a live layer migration protocol for re-balancing the layer distribution across a training pipeline. vPipe improved the training throughput over two notable baselines (PipeDream and GPipe) by 61.4-463.4 percent and 24.8-291.3 percent, respectively, on various large DNNs and training settings.
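To make the layer-partitioning idea concrete, the sketch below balances a pipeline by splitting a sequence of per-layer compute costs into contiguous stages so that the most-loaded stage is as light as possible. This is a classic balanced-partition heuristic (binary search on the per-stage cap) meant only to illustrate the kind of plan a partitioner must produce; it is not vPipe's actual online algorithm, which also accounts for memory management and runs while training is live. The cost values and function name are illustrative assumptions.

```python
def partition_layers(costs, num_stages):
    """Split per-layer costs into at most `num_stages` contiguous stages,
    minimizing the heaviest stage's total cost (binary search on the cap).
    A static sketch of pipeline layer partitioning, NOT vPipe's algorithm."""

    def stages_needed(cap):
        # Greedily pack layers left-to-right; count stages used under `cap`.
        stages, load = 1, 0
        for c in costs:
            if load + c > cap:
                stages += 1
                load = c
            else:
                load += c
        return stages

    # The feasible cap lies between the heaviest single layer and the total.
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(mid) <= num_stages:
            hi = mid
        else:
            lo = mid + 1
    cap = lo

    # Re-run the greedy packing to emit the actual stage boundaries.
    plan, stage, load = [], [], 0
    for c in costs:
        if stage and load + c > cap:
            plan.append(stage)
            stage, load = [c], c
        else:
            stage.append(c)
            load += c
    plan.append(stage)
    return plan


# Example: six layers, three GPUs; the heaviest stage carries cost 8.
plan = partition_layers([4, 3, 2, 6, 1, 2], 3)
print(plan)  # [[4, 3], [2, 6], [1, 2]]
```

A dynamic system such as vPipe would re-run a search like this during training (using profiled costs) and then migrate layers live to realize the new plan, rather than fixing the partition up front.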

Updated: 2021-07-02