Field programmable gate array-based all-layer accelerator with quantization neural networks for sustainable cyber-physical systems
Software: Practice and Experience (IF 3.5), Pub Date: 2020-12-03, DOI: 10.1002/spe.2938
Jinyu Zhan, Xingzhi Zhou, Wei Jiang

Low-Bit Neural Networks (LBNNs) are a promising technique for enriching the intelligent applications running on sustainable Cyber-Physical Systems (CPS). Although LBNNs offer low memory usage, fast inference, and low power consumption, low-bit designs require additional computation units and may cause a large accuracy drop. In this paper, we design a Field Programmable Gate Array (FPGA)-based LBNN accelerator to support sustainable CPS. First, we propose a method to quantize neural networks into 2-bit weights, 8-bit activations, and 8-bit biases with little accuracy loss. A mapping function is presented to gradually approximate the discrete weight space, and the activations and biases are quantized through an improved straight-through estimator. Second, we design a bitwise FPGA-based accelerator to speed up the LBNN. Unlike traditional accelerating techniques, which focus mainly on the convolution layer, we consider the dataflows of the fully connected, pooling, and convolution layers, so that all layers of the network are accelerated. A 2×8 bitwise multiplier implemented with AND/XOR operations is devised to replace the 32×32-bit multiplication unit, yielding faster inference and lower power consumption. We conduct extensive experiments on the MNIST, CIFAR-10, CIFAR-100, and ImageNet benchmarks to evaluate the efficiency of our approach. The LBNN obtained by our quantization method saves 93.75% of memory with a 2.26% accuracy loss on average compared with the original networks. The FPGA-based accelerator achieves a peak performance of 427.71 GOPS at a 100 MHz working frequency, significantly outperforming previous approaches.
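The quantization scheme described in the abstract can be illustrated with a minimal sketch. Note that this is not the paper's actual training-time mapping function (which approximates the discrete weight space gradually and uses an improved straight-through estimator); it is a simplified post-hoc nearest-level quantizer, with hypothetical level choices, that shows the idea of 2-bit weights and 8-bit activations, and reproduces the 93.75% memory-saving arithmetic (2-bit codes replacing 32-bit floats).

```python
import numpy as np

def quantize_weights_2bit(w):
    # Hypothetical 2-bit (4-level) weight quantizer: snap each float
    # weight to the nearest level in {-1, -1/3, +1/3, +1} scaled by
    # the per-tensor maximum magnitude.
    scale = float(np.abs(w).max()) or 1.0
    levels = np.array([-1.0, -1.0 / 3, 1.0 / 3, 1.0]) * scale
    idx = np.abs(w[..., None] - levels).argmin(axis=-1)
    return levels[idx]

def quantize_8bit(x):
    # Sketch of 8-bit uniform quantization for activations/biases:
    # 256 evenly spaced levels over the observed dynamic range.
    lo, hi = float(x.min()), float(x.max())
    step = (hi - lo) / 255 or 1.0
    return np.round((x - lo) / step) * step + lo

# Memory saving from 32-bit float weights to 2-bit codes:
saving = (32 - 2) / 32  # = 0.9375, i.e., the 93.75% reported above
```

For example, `quantize_weights_2bit(np.array([0.9, -0.2, 0.05]))` scales the levels by 0.9 and returns `[0.9, -0.3, 0.3]`; the 8-bit quantizer bounds the per-element error by half a step, i.e., `(hi - lo) / 510`.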
