Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2019-07-03. DOI: arxiv-1907.02124. Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, Yanzhi Wang
Large deep neural network (DNN) models pose a key challenge to energy
efficiency because off-chip DRAM accesses consume significantly more energy
than arithmetic or SRAM operations. This has motivated intensive research on
model compression, with two main approaches. Weight pruning exploits redundancy
in the number of weights and can be performed either in a non-structured
manner, which offers higher flexibility and pruning rates but incurs index
accesses due to the irregular weight layout, or in a structured manner, which
preserves the full matrix structure at a lower pruning rate. Weight
quantization exploits redundancy in the number of bits per weight. Compared to
pruning, quantization is much more hardware-friendly, and has become a
"must-do" step for FPGA and ASIC implementations. This paper provides, for the
first time, a definitive answer to the question posed in the title. First, we
build ADMM-NN-S by extending and enhancing ADMM-NN, a recently proposed joint
weight pruning and quantization framework. Second, we develop a methodology for
a fair and fundamental comparison of non-structured and structured pruning in
terms of both storage and computation efficiency. Our results show that
ADMM-NN-S consistently outperforms the prior art: (i) it achieves 348x, 36x,
and 8x overall weight pruning on LeNet-5, AlexNet, and ResNet-50, respectively,
with (almost) zero accuracy loss; (ii) we demonstrate the first fully binarized
(all layers) DNNs that can be lossless in accuracy in many cases. These results
provide a strong baseline and lend credibility to our study. Based on the
proposed comparison framework, at the same accuracy and quantization level, the
results show that non-structured pruning is not competitive in terms of either
storage or computation efficiency. Thus, we conclude that non-structured
pruning should be considered harmful, and we urge the community to stop
developing DNN inference acceleration for non-structured sparsity.
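The storage-efficiency argument can be made concrete with a back-of-the-envelope comparison. The sketch below is illustrative only (it is not the paper's exact accounting): it compares a CSR-style non-structured representation, which must store per-nonzero indices, against structured row pruning at the same density, which keeps a dense submatrix with no indices. The matrix size, density, and index width are assumed values chosen for illustration.

```python
# Hypothetical example: a 1024x1024 FP32 weight matrix pruned to 10% density.
# Non-structured pruning (CSR) stores each surviving weight plus a column
# index per nonzero and one row pointer per row; structured (row) pruning
# at the same density keeps a dense block of surviving rows, index-free.

ROWS, COLS = 1024, 1024
DENSITY = 0.10
NNZ = int(ROWS * COLS * DENSITY)

BYTES_WEIGHT = 4  # FP32 value
BYTES_INDEX = 4   # int32 column index / row pointer

# CSR storage: values + column indices + (ROWS + 1) row pointers
csr_bytes = NNZ * BYTES_WEIGHT + NNZ * BYTES_INDEX + (ROWS + 1) * BYTES_INDEX

# Structured row pruning: keep 10% of rows as a dense block, no indices
kept_rows = int(ROWS * DENSITY)
structured_bytes = kept_rows * COLS * BYTES_WEIGHT

dense_bytes = ROWS * COLS * BYTES_WEIGHT
print(f"dense:      {dense_bytes} B")
print(f"CSR:        {csr_bytes} B  ({dense_bytes / csr_bytes:.1f}x smaller)")
print(f"structured: {structured_bytes} B  ({dense_bytes / structured_bytes:.1f}x smaller)")
```

At equal density, the index metadata roughly doubles the non-structured footprint relative to the structured one, which is the storage-side intuition behind the paper's conclusion.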
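On the quantization side, result (ii) concerns fully binarized networks. As a minimal sketch of why binary weights are so hardware-friendly, the snippet below applies XNOR-Net-style scaled binarization (W ≈ α·sign(W) with α = mean(|W|)) to a toy weight matrix; note this scaling rule is a stand-in for illustration, not the ADMM-based quantization the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)).astype(np.float32)  # toy weight matrix

# Scaled binarization: W ~ alpha * sign(W), alpha = mean(|W|).
# (XNOR-Net-style rule, used here only to illustrate binary weights;
# the paper's ADMM-NN-S framework derives quantized weights differently.)
alpha = np.abs(W).mean()
W_bin = alpha * np.sign(W)

# Each binarized weight needs 1 bit, plus one shared FP32 scale per layer,
# versus 32 bits per weight for the FP32 original.
bits_fp32 = W.size * 32
bits_bin = W.size * 1 + 32
print(f"compression: {bits_fp32 / bits_bin:.1f}x")
```

Beyond the ~32x storage shrink at scale, binary weights turn multiply-accumulates into sign flips and additions, which is why quantization is a "must-do" for FPGA and ASIC implementations.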
Updated: 2020-01-09