High-Performance Mixed-Low-Precision CNN Inference Accelerator on FPGA
IEEE Micro (IF 2.8), Pub Date: 2021-05-19, DOI: 10.1109/mm.2021.3081735
Junbin Wang, Shaoxia Fang, Xi Wang, Jiangsha Ma, Taobo Wang, Yi Shan

Low-precision techniques can effectively reduce the computational complexity and bandwidth requirements of convolutional neural network (CNN) inference, but may lead to significant accuracy degradation. Mixed-low-precision techniques provide a superior approach for CNN inference, since they retain the advantages of low precision while maintaining accuracy. In this article, we propose a high-performance, highly flexible $W^8A^8$ (INT8 weight and INT8 activation) and $W^TA^2$ (TERNARY weight and INT2 activation) mixed-precision CNN inference hardware architecture, DPUmxp, designed and implemented on a Xilinx Virtex UltraScale+ VU13P FPGA with peak performance up to 58.9 TOPS.
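The abstract names two numeric modes, $W^8A^8$ and $W^TA^2$. Purely as illustration of what those modes mean arithmetically, the NumPy sketch below quantizes weights and activations in both modes and computes an integer dot product that is dequantized by the product of the scales. This is not the authors' FPGA datapath; the function names, the TWN-style 0.7·mean(|w|) ternary threshold, and the unsigned post-ReLU INT2 convention are all assumptions made for the example.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor INT8: x ~= scale * q, with q in [-127, 127].
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_ternary(w):
    # Ternary weights, q in {-1, 0, +1}; the 0.7 * mean(|w|) threshold is a
    # common TWN-style heuristic, assumed here, not taken from the paper.
    t = 0.7 * float(np.abs(w).mean())
    q = np.where(w > t, 1, np.where(w < -t, -1, 0)).astype(np.int8)
    nz = q != 0
    scale = float(np.abs(w[nz]).mean()) if nz.any() else 1.0
    return q, scale

def quantize_uint2(a):
    # Unsigned 2-bit activations (assumes post-ReLU inputs): q in {0, 1, 2, 3}.
    a = np.maximum(a, 0.0)
    scale = max(float(a.max()), 1e-8) / 3.0
    q = np.clip(np.round(a / scale), 0, 3).astype(np.int8)
    return q, scale

def int_dot(qa, sa, qw, sw):
    # Integer multiply-accumulate in INT32, then one dequantization per output.
    acc = int(np.dot(qa.astype(np.int32), qw.astype(np.int32)))
    return acc * sa * sw

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
a = np.maximum(rng.normal(size=256), 0.0).astype(np.float32)  # ReLU-like input

qw8, sw8 = quantize_int8(w)
qa8, sa8 = quantize_int8(a)
qwt, swt = quantize_ternary(w)
qa2, sa2 = quantize_uint2(a)

print("fp32 reference:", float(np.dot(a, w)))
print("W8A8 mode     :", int_dot(qa8, sa8, qw8, sw8))
print("WTA2 mode     :", int_dot(qa2, sa2, qwt, swt))
```

The sketch makes the trade-off concrete: the $W^TA^2$ path is less accurate than $W^8A^8$ but its multiplies collapse to sign selection and tiny shifts, which is what makes a mixed scheme attractive on FPGA fabric, as the abstract argues.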
