Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2019-07-03. DOI: arxiv-1907.02124. Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, Yanzhi Wang
Large deep neural network (DNN) models pose a key challenge to energy
efficiency because off-chip DRAM accesses consume significantly more energy
than arithmetic or SRAM operations. This has motivated intensive research on
model compression, with two main approaches. Weight pruning exploits redundancy
in the number of weights and can be performed either in a non-structured
manner, which offers higher flexibility and pruning rates but incurs index
accesses due to the irregular weight layout, or in a structured manner, which
preserves the full matrix structure at a lower pruning rate. Weight
quantization exploits redundancy in the number of bits per weight. Compared to
pruning, quantization is much more hardware-friendly, and has become a
"must-do" step for FPGA and ASIC implementations. This paper provides, for the
first time, a definitive answer to the question posed in the title. First, we
build ADMM-NN-S by extending and enhancing ADMM-NN, a recently proposed joint
weight pruning and quantization framework. Second, we develop a methodology for
a fair and fundamental comparison of non-structured and structured pruning in
terms of both storage and computation efficiency. Our results show that
ADMM-NN-S consistently outperforms the prior art: (i) it achieves 348x, 36x,
and 8x overall weight pruning on LeNet-5, AlexNet, and ResNet-50, respectively,
with (almost) zero accuracy loss; (ii) we demonstrate the first fully binarized
(all layers) DNNs that can be lossless in accuracy in many cases. These results
provide a strong baseline and lend credibility to our study. Based on the
proposed comparison framework, at the same accuracy and quantization level, the
results show that non-structured pruning is not competitive in terms of either
storage or computation efficiency. Thus, we conclude that non-structured
pruning should be considered harmful, and we urge the community to stop
developing DNN inference acceleration for non-structured sparsity.
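The storage-efficiency argument can be made concrete with a back-of-the-envelope comparison. The sketch below is illustrative only (it is not the paper's exact accounting): it compares a CSR-style non-structured representation, which must store per-nonzero indices, against structured row pruning at the same density, which keeps a dense submatrix with no indices. The matrix size, density, and index width are assumed values chosen for illustration.

```python
# Hypothetical example: a 1024x1024 FP32 weight matrix pruned to 10% density.
# Non-structured pruning (CSR) stores each surviving weight plus a column
# index per nonzero and one row pointer per row; structured (row) pruning
# at the same density keeps a dense block of surviving rows, index-free.

ROWS, COLS = 1024, 1024
DENSITY = 0.10
NNZ = int(ROWS * COLS * DENSITY)

BYTES_WEIGHT = 4  # FP32 value
BYTES_INDEX = 4   # int32 column index / row pointer

# CSR storage: values + column indices + (ROWS + 1) row pointers
csr_bytes = NNZ * BYTES_WEIGHT + NNZ * BYTES_INDEX + (ROWS + 1) * BYTES_INDEX

# Structured row pruning: keep 10% of rows as a dense block, no indices
kept_rows = int(ROWS * DENSITY)
structured_bytes = kept_rows * COLS * BYTES_WEIGHT

dense_bytes = ROWS * COLS * BYTES_WEIGHT
print(f"dense:      {dense_bytes} B")
print(f"CSR:        {csr_bytes} B  ({dense_bytes / csr_bytes:.1f}x smaller)")
print(f"structured: {structured_bytes} B  ({dense_bytes / structured_bytes:.1f}x smaller)")
```

At equal density, the index metadata roughly doubles the non-structured footprint relative to the structured one, which is the storage-side intuition behind the paper's conclusion.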
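On the quantization side, result (ii) concerns fully binarized networks. As a minimal sketch of why binary weights are so hardware-friendly, the snippet below applies XNOR-Net-style scaled binarization (W ≈ α·sign(W) with α = mean(|W|)) to a toy weight matrix; note this scaling rule is a stand-in for illustration, not the ADMM-based quantization the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)).astype(np.float32)  # toy weight matrix

# Scaled binarization: W ~ alpha * sign(W), alpha = mean(|W|).
# (XNOR-Net-style rule, used here only to illustrate binary weights;
# the paper's ADMM-NN-S framework derives quantized weights differently.)
alpha = np.abs(W).mean()
W_bin = alpha * np.sign(W)

# Each binarized weight needs 1 bit, plus one shared FP32 scale per layer,
# versus 32 bits per weight for the FP32 original.
bits_fp32 = W.size * 32
bits_bin = W.size * 1 + 32
print(f"compression: {bits_fp32 / bits_bin:.1f}x")
```

Beyond the ~32x storage shrink at scale, binary weights turn multiply-accumulates into sign flips and additions, which is why quantization is a "must-do" for FPGA and ASIC implementations.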
Updated: 2020-01-09