Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch
arXiv - CS - Hardware Architecture. Pub Date: 2021-02-08, DOI: arxiv-2102.04010
Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments. It can be generally categorized into unstructured fine-grained sparsity, which zeroes out individual weights distributed across the network, and structured coarse-grained sparsity, which prunes whole blocks or sub-networks of a neural network. Fine-grained sparsity can achieve a high compression ratio but is not hardware friendly and hence yields limited speed gains; coarse-grained sparsity, on the other hand, cannot concurrently achieve both apparent acceleration on modern GPUs and decent performance. In this paper, we are the first to study training N:M fine-grained structured sparse networks from scratch, which maintain the advantages of both unstructured fine-grained sparsity and structured coarse-grained sparsity simultaneously on specifically designed GPUs. Specifically, a 2:4 sparse network can achieve a 2x speed-up without performance drop on Nvidia A100 GPUs. Furthermore, we propose a novel and effective ingredient, the sparse-refined straight-through estimator (SR-STE), to alleviate the negative influence of the approximated gradients computed by the vanilla STE during optimization. We also define a metric, Sparse Architecture Divergence (SAD), to measure how much the sparse network's topology changes during training. Finally, we justify SR-STE's advantages with SAD and demonstrate the effectiveness of SR-STE through comprehensive experiments on various tasks. Source codes and models are available at https://github.com/NM-sparsity/NM-sparsity.
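For concreteness, below is a minimal PyTorch sketch of the two ideas in the abstract, not the authors' released implementation: an N:M mask keeps the N largest-magnitude weights in every group of M consecutive weights (so 2:4 keeps 2 of every 4), and the backward pass uses a straight-through estimator plus an SR-STE-style term that additionally decays the currently pruned weights. The names (nm_prune_mask, NMSparseSTE, SparseLinearNM) and the value of sr_lambda are illustrative assumptions, not taken from the paper or the NM-sparsity repository.

```python
# Minimal sketch of N:M magnitude pruning with an SR-STE-style backward pass.
# Illustrative only; names and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def nm_prune_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Return a 0/1 mask that keeps the N largest-magnitude weights in every
    group of M consecutive weights along the input dimension."""
    out_features, in_features = weight.shape
    assert in_features % m == 0, "input dimension must be divisible by M"
    groups = weight.detach().abs().reshape(out_features, in_features // m, m)
    # indices of the (M - N) smallest-magnitude weights in each group
    _, drop_idx = torch.topk(groups, m - n, dim=-1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop_idx, 0.0)
    return mask.reshape(out_features, in_features)


class NMSparseSTE(torch.autograd.Function):
    """Forward: apply the N:M mask. Backward: straight-through gradient plus a
    decay term on the pruned weights (SR-STE-style; for plain SGD this folds the
    extra update term from the paper into the weight gradient)."""

    @staticmethod
    def forward(ctx, weight, n, m, sr_lambda):
        mask = nm_prune_mask(weight, n, m)
        ctx.save_for_backward(weight, mask)
        ctx.sr_lambda = sr_lambda
        return weight * mask

    @staticmethod
    def backward(ctx, grad_output):
        weight, mask = ctx.saved_tensors
        # Vanilla STE would return grad_output unchanged; the extra term pushes
        # the currently pruned weights toward zero.
        grad_weight = grad_output + ctx.sr_lambda * (1.0 - mask) * weight
        return grad_weight, None, None, None


class SparseLinearNM(nn.Linear):
    """nn.Linear whose weight is projected to an N:M sparse pattern on the fly."""

    def __init__(self, in_features, out_features, n=2, m=4, sr_lambda=2e-4, **kw):
        super().__init__(in_features, out_features, **kw)
        self.n, self.m, self.sr_lambda = n, m, sr_lambda

    def forward(self, x):
        w = NMSparseSTE.apply(self.weight, self.n, self.m, self.sr_lambda)
        return F.linear(x, w, self.bias)
```

Replacing nn.Linear layers (or, analogously, convolutions with the mask taken over the input-channel dimension) with such a layer and training normally is the basic recipe the abstract describes; the released code at the URL above is the authoritative reference for the actual method and hyperparameters.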

Updated: 2021-02-09