A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-11-18, DOI: arxiv-2011.09017. Sian Jin, Guanpeng Li, Shuaiwen Leon Song, Dingwen Tao
Deep neural networks (DNNs) are becoming increasingly deeper, wider, and
non-linear due to the growing demands on prediction accuracy and analysis
quality. When training a DNN model, the intermediate activation data must be
saved in the memory during forward propagation and then restored for backward
propagation. However, state-of-the-art accelerators such as GPUs are only
equipped with very limited memory capacities due to hardware design
constraints, which significantly limits the maximum batch size and hence
performance speedup when training large-scale DNNs. In this paper, we propose a novel memory-driven high performance DNN training
framework that leverages error-bounded lossy compression to significantly
reduce the memory requirement for training in order to allow training larger
networks. Different from the state-of-the-art solutions that adopt image-based
lossy compressors such as JPEG to compress the activation data, our framework
purposely designs error-bounded lossy compression with a strict
error-controlling mechanism. Specifically, we provide theoretical analysis on
the compression error propagation from the altered activation data to the
gradients, and then empirically investigate the impact of altered gradients
over the entire training process. Based on these analyses, we then propose an
improved lossy compressor and an adaptive scheme to dynamically configure the
lossy compression error-bound and adjust the training batch size to further
utilize the saved memory space for additional speedup. We evaluate our design
against state-of-the-art solutions with four popular DNNs and the ImageNet
dataset. Results demonstrate that our proposed framework can significantly
reduce the training memory consumption by up to 13.5x and 1.8x over the
baseline training and state-of-the-art framework with compression,
respectively, with little or no accuracy loss.
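The core idea of error-bounded lossy compression is that every decompressed value is guaranteed to lie within a user-specified absolute error bound of the original. As a minimal illustration (this is not the authors' compressor, which adds prediction and entropy coding on top), uniform scalar quantization with bin width `2 * eb` already provides such a guarantee:

```python
import numpy as np

def compress(act: np.ndarray, eb: float) -> np.ndarray:
    # Uniform scalar quantization with absolute error bound eb:
    # each value is rounded to the nearest multiple of 2*eb, so the
    # reconstruction error is at most half a bin width, i.e. <= eb.
    # (A real compressor would entropy-code these integer codes to
    # realize the memory savings.)
    return np.round(act / (2.0 * eb)).astype(np.int32)

def decompress(codes: np.ndarray, eb: float) -> np.ndarray:
    return codes.astype(np.float32) * (2.0 * eb)

# Compress a hypothetical activation tensor and verify the error bound.
rng = np.random.default_rng(0)
act = rng.standard_normal((4, 8, 8)).astype(np.float32)
eb = 1e-2
rec = decompress(compress(act, eb), eb)
assert float(np.max(np.abs(rec - act))) <= eb + 1e-7
```

In a training framework like the one described, the compressed codes would replace the full-precision activations in GPU memory after the forward pass and be decompressed on demand during backpropagation; the adaptive scheme would tighten or loosen `eb` per layer based on how the resulting gradient perturbation affects convergence.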
Updated: 2020-11-19