LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference
IEEE Transactions on Computers (IF 3.6) Pub Date: 2020-12-01, DOI: 10.1109/tc.2020.2978817
Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides

Research has shown that deep neural networks contain significant redundancy, and thus that high classification accuracy can be achieved even when weights and activations are quantized down to binary values. Network binarization on FPGAs greatly increases area efficiency by replacing resource-hungry multipliers with lightweight XNOR gates. However, an FPGA's fundamental building block, the $K$-LUT, is capable of implementing far more than an XNOR: it can perform any $K$-input Boolean operation. Inspired by this observation, we propose LUTNet, an end-to-end hardware-software framework for the construction of area-efficient FPGA-based neural network accelerators using the native LUTs as inference operators. We describe the realization of both unrolled and tiled LUTNet architectures, with the latter facilitating smaller, less power-hungry deployment over the former while sacrificing area and energy efficiency along with throughput. For both varieties, we demonstrate that the exploitation of LUT flexibility allows for far heavier pruning than possible in prior works, resulting in significant area savings while achieving comparable accuracy. Against the state-of-the-art binarized neural network implementation, we achieve up to twice the area efficiency for several standard network models when inferencing popular datasets. We also demonstrate that even greater energy efficiency improvements are obtainable.
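
As an illustrative aside (not code from the paper), the Python sketch below contrasts the XNOR-popcount dot product used in conventional binarized networks with a $K$-input LUT modeled as a $2^K$-entry truth table, which is the expressiveness gap the abstract refers to. All function names and the example truth tables here are hypothetical, chosen only to make the comparison concrete.

```python
import numpy as np

def xnor_popcount_dot(w, x):
    """Binary dot product as in conventional BNNs: weights and
    activations in {-1, +1}, encoded here as bits {0, 1}.
    XNOR flags matching signs; popcount tallies the matches."""
    matches = np.logical_not(np.logical_xor(w, x))  # elementwise XNOR
    return 2 * np.count_nonzero(matches) - len(w)   # back to a +/-1 sum

def k_lut(truth_table, x):
    """A K-input LUT: an arbitrary Boolean function of K binary
    inputs, stored as a 2^K-entry truth table. A single XNOR is
    just one of the 2^(2^K) functions a K-LUT can realize."""
    index = 0
    for bit in x:                  # pack the K inputs into an address
        index = (index << 1) | int(bit)
    return truth_table[index]

# A 2-LUT configured as XNOR (outputs for inputs 00, 01, 10, 11)...
xnor_tt = [1, 0, 0, 1]
# ...versus the same 2-LUT holding a learned, arbitrary function
# that no single XNOR gate can express.
learned_tt = [1, 1, 0, 1]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, k_lut(xnor_tt, [a, b]), k_lut(learned_tt, [a, b]))
```

Running the loop prints both configurations' outputs over all four input pairs, showing that the fixed XNOR is only one point in the space of functions a single LUT can be trained to compute.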

Updated: 2020-12-01