Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey
Proceedings of the IEEE (IF 20.6) · Pub Date: 2020-04-01 · DOI: 10.1109/jproc.2020.2976475
By Lei Deng, Guoqi Li, Song Han, Luping Shi, Yuan Xie

Domain-specific hardware is becoming a promising topic against the backdrop of slowing improvement in general-purpose processors due to the foreseeable end of Moore’s Law. Machine learning, especially deep neural networks (DNNs), has become the most dazzling domain, with successful applications across a wide spectrum of artificial intelligence (AI) tasks. The incomparable accuracy of DNNs comes at the cost of enormous memory consumption and high computational complexity, which greatly impedes their deployment in embedded systems. Therefore, the concept of DNN compression was naturally proposed and is widely used for memory saving and compute acceleration. In the past few years, a tremendous number of compression techniques have sprung up in pursuit of a satisfactory tradeoff between processing efficiency and application accuracy. Recently, this wave has spread to the design of neural network accelerators to gain extremely high performance. However, the body of related work is enormous and the reported approaches are quite divergent. This research chaos motivates us to provide a comprehensive survey of recent advances toward the goal of efficient compression and execution of DNNs without significantly compromising accuracy, covering both the high-level algorithms and their applications in hardware design. In this article, we review the mainstream compression approaches: compact models, tensor decomposition, data quantization, and network sparsification. We explain their compression principles, evaluation metrics, sensitivity analysis, and joint use. Then, we answer the question of how to leverage these methods in the design of neural network accelerators and present the state-of-the-art hardware architectures. In the end, we discuss several open issues, such as fair comparison, testing workloads, automatic compression, influence on security, and framework/hardware-level support, and point out promising topics in this field as well as the likely challenges. This article aims to enable readers to quickly build up a big picture of neural network compression and acceleration, clearly evaluate the various methods, and confidently get started in the right direction.
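Two of the four compression families named above lend themselves to a compact illustration. The following self-contained NumPy sketch is not code from the paper; it is a minimal, assumed example of post-training symmetric uniform quantization (data quantization) and magnitude-based pruning (network sparsification). The function names and the 8-bit / 90%-sparsity settings are illustrative choices, not parameters from the survey.

import numpy as np

def quantize_uniform(w, num_bits=8):
    # Symmetric uniform quantization: map the largest magnitude to the
    # largest representable integer, round everything else to the grid.
    qmax = 2 ** (num_bits - 1) - 1          # e.g., 127 for int8
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                          # dequantize as q * scale

def prune_by_magnitude(w, sparsity=0.9):
    # Magnitude pruning: zero out the smallest-magnitude weights until
    # the requested fraction of entries is zero.
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

# Toy demo on a random weight matrix standing in for one DNN layer.
w = np.random.randn(256, 256).astype(np.float32)

q, scale = quantize_uniform(w)
w_hat = q.astype(np.float32) * scale
print("quantization MSE:", np.mean((w - w_hat) ** 2))

w_sparse = prune_by_magnitude(w, sparsity=0.9)
print("fraction zero:", np.mean(w_sparse == 0.0))

Dequantizing with q * scale and measuring the reconstruction error mirrors the efficiency-versus-accuracy tradeoff described in the abstract: fewer bits and more zeros shrink memory traffic and compute, at the price of approximation error that downstream techniques such as retraining typically have to recover.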

Updated: 2020-04-01