Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml
Machine Learning: Science and Technology (IF 6.013), Pub Date: 2020-12-04, DOI: 10.1088/2632-2153/aba042
Jennifer Ngadiuba, Vladimir Loncar, Maurizio Pierini, Sioni Summers, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Dylan Rankin, Sergo Jindariani, Mia Liu, Kevin Pedro, Nhan Tran, Edward Kreinar, Sheila Sagear, Zhenbin Wu, Duc Hoang
We present the implementation of binary and ternary neural networks in the hls4ml library, which is designed to automatically convert deep neural network models into digital circuits implemented as field-programmable gate array (FPGA) firmware. Starting from benchmark models trained with floating-point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance latency and accuracy by retaining full precision on a selected subset of network components. As examples, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementations achieve performance similar to the higher-precision implementation while using drastically fewer FPGA resources.
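To illustrate the precision reduction the abstract describes, the sketch below quantizes floating-point weights to binary ({-1, +1}) and ternary ({-1, 0, +1}) values. This is a minimal illustration of the general idea, not the specific quantization scheme used by hls4ml; the threshold value here is an arbitrary assumption for the example.

```python
import numpy as np

def binarize(w):
    # Map each weight to {-1, +1} by its sign (zero maps to +1 here).
    return np.where(w >= 0, 1.0, -1.0)

def ternarize(w, threshold=0.5):
    # Map each weight to {-1, 0, +1}; weights with magnitude below the
    # threshold become zero, which hardware can skip entirely.
    q = np.zeros_like(w)
    q[w > threshold] = 1.0
    q[w < -threshold] = -1.0
    return q

weights = np.array([0.8, -0.2, 0.1, -0.9])
print(binarize(weights))   # [ 1. -1.  1. -1.]
print(ternarize(weights))  # [ 1.  0.  0. -1.]
```

In hardware, multiplications by such weights reduce to sign flips (and, for ternary, skipped terms), which is the source of the resource savings reported in the paper.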




Updated: 2020-12-04