A Real-Time Architecture for Pruning the Effectual Computations in Deep Neural Networks
IEEE Transactions on Circuits and Systems I: Regular Papers (IF 5.1). Pub Date: 2021-03-02, DOI: 10.1109/tcsi.2021.3060945
Mohammadreza Asadikouhanjani, Hao Zhang, Lakshminarayanan Gopalakrishnan, Hyuk-Jae Lee, Seok-Bum Ko

Integrating Deep Neural Networks (DNNs) into Internet of Things (IoT) devices could give rise to complex sensing and recognition tasks that support a new era of human interaction with surrounding environments. However, DNNs are power-hungry, performing billions of computations per inference. Spatial DNN accelerators can, in principle, support computation-pruning techniques more readily than other common architectures such as systolic arrays. Energy-efficient DNN accelerators exploit bit-wise or word-wise sparsity in the input feature maps (ifmaps) and filter weights to skip ineffectual computations. However, there is still room for pruning the effectual computations without reducing the accuracy of DNNs. In this paper, we propose a novel real-time architecture and dataflow that decompose multiplications down to the bit level and prune identical computations in spatial designs while running benchmark networks. The proposed architecture prunes identical computations by identifying identical bit values present in both the ifmaps and the filter weights, without changing the accuracy of the benchmark networks. Compared to the reference design, the proposed design achieves an average per-layer speedup of $1.4\times$ and an energy-efficiency improvement of $1.21\times$ per inference while maintaining the accuracy of the benchmark networks.
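The abstract describes decomposing each multiplication into bit-level terms and pruning computations that recur when the same bit values appear in both the ifmaps and the filter weights. The Python sketch below is only an illustration of that idea under simplifying assumptions (unsigned operands, a single dot product, and a term cache keyed on the activation value and weight-bit position); the function name bit_serial_dot and the reuse bookkeeping are hypothetical and do not reproduce the authors' actual architecture or dataflow.

from collections import defaultdict

def bit_serial_dot(ifmap_vals, weight_vals, bits=8):
    """Dot product of unsigned integers via bit-level decomposition.

    Each product a*w is expanded into shift-and-add terms (a << i) for
    every set bit i of w. Identical (a, i) terms across the dot product
    are computed once; repeats are counted as pruned work.
    """
    assert len(ifmap_vals) == len(weight_vals)
    term_cache = {}                # (activation, bit position) -> a << i
    term_counts = defaultdict(int)
    total_terms = 0
    acc = 0
    for a, w in zip(ifmap_vals, weight_vals):
        for i in range(bits):
            if (w >> i) & 1:       # only effectual (non-zero) weight bits
                total_terms += 1
                key = (a, i)
                if key not in term_cache:
                    term_cache[key] = a << i   # one shift per unique term
                term_counts[key] += 1
                acc += term_cache[key]
    pruned = total_terms - len(term_cache)     # repeated terms reused
    return acc, total_terms, pruned

if __name__ == "__main__":
    ifmaps  = [3, 7, 3, 12, 7, 3]
    weights = [5, 6, 5, 1, 6, 5]
    result, terms, pruned = bit_serial_dot(ifmaps, weights)
    print(result, sum(a * w for a, w in zip(ifmaps, weights)))  # sanity check
    print(f"{terms} bit-level terms, {pruned} reused instead of recomputed")

In this toy example the repeated ifmap/weight pairs produce identical bit-level terms, so only the unique terms need to be evaluated; the paper applies the same principle in hardware across a spatial accelerator's dataflow.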

Updated: 2021-04-20