Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration
ACM Transactions on Design Automation of Electronic Systems (IF 2.2). Pub Date: 2022-06-06. DOI: 10.1145/3495532
Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang

Weight pruning is an effective model compression technique for tackling the challenge of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction to certain types of DNN layers. In this article, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme for each layer, considering the different acceleration and accuracy performance of the various pruning schemes. Two pruning scheme mapping methods, one search based and the other rule based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48× and 1.73× DNN inference acceleration on the CIFAR-10 and ImageNet datasets without accuracy loss.
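To make the two ingredients of the abstract concrete, below is a minimal sketch (not the paper's actual implementation) of block-based fine-grained structured pruning plus a naive per-layer search over candidate block sizes. Magnitude scoring for blocks and the retained-magnitude heuristic standing in for the paper's accuracy/latency co-optimization are assumptions; the names `prune_blocks` and `map_layer_scheme` are illustrative, not the authors' API.

```python
import torch
import torch.nn.functional as F

def prune_blocks(weight: torch.Tensor, block: tuple, sparsity: float) -> torch.Tensor:
    """Zero out the lowest-magnitude (bh x bw) blocks of a 2-D weight matrix.

    A sketch of fine-grained structured pruning: block size (1, 1) degenerates
    to unstructured pruning, larger blocks trade accuracy for regularity.
    """
    bh, bw = block
    rows, cols = weight.shape
    # Pad so the matrix tiles evenly into blocks.
    pad_r, pad_c = (-rows) % bh, (-cols) % bw
    w = F.pad(weight, (0, pad_c, 0, pad_r))
    nr, nc = w.shape[0] // bh, w.shape[1] // bw
    # Score each block by its L1 norm.
    blocks = w.reshape(nr, bh, nc, bw).permute(0, 2, 1, 3)
    scores = blocks.abs().sum(dim=(2, 3)).flatten()
    k = int(sparsity * scores.numel())
    if k > 0:
        threshold = scores.kthvalue(k).values
        mask = (scores > threshold).float().reshape(nr, nc)
        # Broadcast the block-level mask back to element granularity.
        mask = mask.repeat_interleave(bh, 0).repeat_interleave(bw, 1)
        w = w * mask
    return w[:rows, :cols]

def map_layer_scheme(weight: torch.Tensor, candidates: list, sparsity: float) -> tuple:
    """Pick a block size for one layer by brute-force search (search-based mapping).

    Proxy objective (an assumption): retained L1 mass after pruning, with a
    small bias toward larger blocks, which are cheaper for compiler-generated
    mobile kernels. The paper's method optimizes measured accuracy/latency.
    """
    best, best_score = None, -float("inf")
    for block in candidates:
        pruned = prune_blocks(weight, block, sparsity)
        score = pruned.abs().sum().item() + 1e-3 * (block[0] * block[1])
        if score > best_score:
            best, best_score = block, score
    return best

if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(64, 128)  # stand-in for one layer's weights
    block = map_layer_scheme(w, [(1, 4), (4, 1), (4, 4), (8, 8)], sparsity=0.75)
    pruned = prune_blocks(w, block, sparsity=0.75)
    print(f"chosen block {block}, zero fraction: {(pruned == 0).float().mean():.2%}")
```

A rule-based mapper, as opposed to this exhaustive search, would pick the block size directly from layer properties (e.g., layer type and weight-matrix shape) without evaluating every candidate; the search variant above simply enumerates the candidate set per layer.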




Updated: 2022-06-06