DNN Training Acceleration via Exploring GPGPU Friendly Sparsity
arXiv - CS - Hardware Architecture | Pub Date: 2022-03-11 | DOI: arxiv-2203.05705
Zhuoran Song, Yihong Xu, Han Li, Naifeng Jing, Xiaoyao Liang, Li Jiang

The training phase of a deep neural network (DNN) consumes enormous processing time and energy. Compression techniques that exploit the sparsity of DNNs can effectively accelerate the inference phase, but they are rarely applied to training, because training relies on dense matrix multiplication on general-purpose graphics processing units (GPGPUs), which favor a regular, structured data layout. In this paper, we first propose Approximate Random Dropout, which replaces the conventional random dropout of neurons and synapses with regular, online-generated row-based or tile-based dropout patterns, eliminating unnecessary computation and data accesses for the multilayer perceptron (MLP) and long short-term memory (LSTM). We then develop an SGD-based search algorithm that produces the distribution of row-based or tile-based dropout patterns to compensate for the potential accuracy loss. Moreover, targeting convolutional neural network (CNN) training acceleration, we first explore the importance and sensitivity of input feature maps, and then propose a sensitivity-aware dropout method that dynamically drops input feature maps according to their sensitivity, achieving greater forward and backward training acceleration while preserving accuracy. To ease DNN programming, we build a DNN training computation framework that unifies the proposed techniques in the software stack. As a result, the GPGPU only needs to support one basic operator -- matrix multiplication -- and achieves significant performance improvement regardless of the DNN model.
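As an illustration of the idea, the sketch below shows how a regular, row-based dropout pattern lets a GEMM skip whole rows instead of masking individual neurons, so the GPGPU only ever runs a smaller dense matrix multiplication. This is a minimal NumPy sketch for intuition only, not the paper's implementation; the function name, the periodic pattern, and the `drop_period` parameter are assumptions made for this example.

```python
# A minimal sketch (not the authors' implementation) of row-based structured
# dropout for one MLP layer: instead of masking individual neurons, whole rows
# of the weight matrix are dropped by a regular, online-generated pattern, so
# the forward GEMM simply skips those rows.
import numpy as np

def row_based_dropout_forward(x, W, drop_period=4, rng=None):
    """Forward pass y = x @ W_kept, keeping only rows of W not selected by a
    periodic (regular) drop pattern.

    x: (batch, in_features), W: (in_features, out_features)
    drop_period: every `drop_period`-th row is dropped (illustrative pattern).
    """
    rng = rng or np.random.default_rng()
    offset = rng.integers(drop_period)            # randomize which rows are skipped
    kept = np.arange(W.shape[0]) % drop_period != offset
    scale = 1.0 / kept.mean()                     # rescale like inverted dropout
    # The GPGPU only sees a smaller dense GEMM: x[:, kept] @ W[kept, :]
    y = scale * (x[:, kept] @ W[kept, :])
    return y, kept                                 # `kept` is reused in backprop

# Example: a (32, 1024) x (1024, 512) layer trains on a (768, 512)-shaped GEMM
x = np.random.randn(32, 1024).astype(np.float32)
W = (np.random.randn(1024, 512) * 0.01).astype(np.float32)
y, kept = row_based_dropout_forward(x, W, drop_period=4)
print(y.shape, kept.sum(), "rows kept of", W.shape[0])
```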

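The sensitivity-aware dropout for CNNs can be pictured in the same way: whole input feature-map channels are dropped, with low-sensitivity channels dropped more often, so the convolution operates on a smaller dense tensor. The sketch below is again purely illustrative; the abstract does not specify how sensitivity is measured, so the score passed in here and all names are assumptions.

```python
# Purely illustrative sketch of sensitivity-aware channel dropout for CNN
# feature maps, assuming (not stated in the abstract) that each channel has a
# precomputed nonnegative sensitivity score: higher-scoring channels are more
# likely to survive, and only the surviving channels are forwarded.
import numpy as np

def sensitivity_aware_channel_dropout(fmap, sensitivity, keep_ratio=0.5, rng=None):
    """fmap: (batch, channels, H, W); sensitivity: (channels,) nonnegative scores.
    Keeps roughly `keep_ratio` of the channels, favoring high-sensitivity ones."""
    rng = rng or np.random.default_rng()
    probs = sensitivity / sensitivity.sum()        # higher score -> more likely kept
    n_keep = max(1, int(keep_ratio * fmap.shape[1]))
    kept = rng.choice(fmap.shape[1], size=n_keep, replace=False, p=probs)
    kept.sort()
    # Only the kept channels reach the convolution, shrinking both the forward
    # and backward computation for this layer.
    return fmap[:, kept], kept

fmap = np.random.randn(8, 64, 32, 32).astype(np.float32)
sens = np.abs(np.random.randn(64)) + 1e-6          # stand-in sensitivity scores
reduced, kept = sensitivity_aware_channel_dropout(fmap, sens, keep_ratio=0.5)
print(reduced.shape)                                # (8, 32, 32, 32)
```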
Updated: 2022-03-11