A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-11-18, DOI: arxiv-2011.09017. Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao
Deep neural networks (DNNs) are becoming increasingly deeper, wider, and
non-linear due to the growing demands on prediction accuracy and analysis
quality. When training a DNN model, the intermediate activation data must be
saved in the memory during forward propagation and then restored for backward
propagation. However, state-of-the-art accelerators such as GPUs are only
equipped with very limited memory capacities due to hardware design
constraints, which significantly limits the maximum batch size and hence
performance speedup when training large-scale DNNs. In this paper, we propose a novel memory-driven high performance DNN training
framework that leverages error-bounded lossy compression to significantly
reduce the memory requirement for training in order to allow training larger
networks. Different from the state-of-the-art solutions that adopt image-based
lossy compressors such as JPEG to compress the activation data, our framework
purposely designs error-bounded lossy compression with a strict
error-controlling mechanism. Specifically, we provide theoretical analysis on
the compression error propagation from the altered activation data to the
gradients, and then empirically investigate the impact of altered gradients
over the entire training process. Based on these analyses, we then propose an
improved lossy compressor and an adaptive scheme to dynamically configure the
lossy compression error-bound and adjust the training batch size to further
utilize the saved memory space for additional speedup. We evaluate our design
against state-of-the-art solutions with four popular DNNs and the ImageNet
dataset. Results demonstrate that our proposed framework can significantly
reduce the training memory consumption by up to 13.5x and 1.8x over the
baseline training and state-of-the-art framework with compression,
respectively, with little or no accuracy loss.
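The core idea of error-bounded lossy compression is that every decompressed value is guaranteed to lie within a user-specified absolute error bound of the original. As a minimal illustration (this is not the authors' compressor, which adds prediction and entropy coding on top), uniform scalar quantization with bin width `2 * eb` already provides such a guarantee:

```python
import numpy as np

def compress(act: np.ndarray, eb: float) -> np.ndarray:
    # Uniform scalar quantization with absolute error bound eb:
    # each value is rounded to the nearest multiple of 2*eb, so the
    # reconstruction error is at most half a bin width, i.e. <= eb.
    # (A real compressor would entropy-code these integer codes to
    # realize the memory savings.)
    return np.round(act / (2.0 * eb)).astype(np.int32)

def decompress(codes: np.ndarray, eb: float) -> np.ndarray:
    return codes.astype(np.float32) * (2.0 * eb)

# Compress a hypothetical activation tensor and verify the error bound.
rng = np.random.default_rng(0)
act = rng.standard_normal((4, 8, 8)).astype(np.float32)
eb = 1e-2
rec = decompress(compress(act, eb), eb)
assert float(np.max(np.abs(rec - act))) <= eb + 1e-7
```

In a training framework like the one described, the compressed codes would replace the full-precision activations in GPU memory after the forward pass and be decompressed on demand during backpropagation; the adaptive scheme would tighten or loosen `eb` per layer based on how the resulting gradient perturbation affects convergence.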
Updated: 2020-11-19