LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications
arXiv - CS - Hardware Architecture. Pub Date: 2020-04-06, DOI: arxiv-2004.03021
Yaman Umuroglu, Yash Akhauri, Nicholas J. Fraser, Michaela Blott

Deployment of deep neural networks for applications that require very high throughput or extremely low latency is a severe computational challenge, further exacerbated by inefficiencies in mapping the computation to hardware. We present a novel method for designing neural network topologies that directly map to a highly efficient FPGA implementation. By exploiting the equivalence of artificial neurons with quantized inputs/outputs and truth tables, we can train quantized neural networks that can be directly converted to a netlist of truth tables, and subsequently deployed as a highly pipelinable, massively parallel FPGA circuit. However, the neural network topology requires careful consideration since the hardware cost of truth tables grows exponentially with neuron fan-in. To obtain smaller networks where the whole netlist can be placed-and-routed onto a single FPGA, we derive a fan-in driven hardware cost model to guide topology design, and combine high sparsity with low-bit activation quantization to limit the neuron fan-in. We evaluate our approach on two tasks with very high intrinsic throughput requirements in high-energy physics and network intrusion detection. We show that the combination of sparsity and low-bit activation quantization results in high-speed circuits with small logic depth and low LUT cost, demonstrating competitive accuracy with less than 15 ns of inference latency and throughput in the hundreds of millions of inferences per second.
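
The neuron-to-truth-table equivalence described above can be made concrete with a short sketch. Assuming unsigned B-bit activation codes and a ReLU-style neuron (the weight values, quantizers, and layout here are illustrative assumptions, not the authors' implementation), exhaustively enumerating all 2^(F*B_IN) input patterns of a fan-in-F neuron yields the truth table that replaces its arithmetic in the netlist, and directly exposes why the hardware cost grows exponentially with fan-in:

```python
# Minimal sketch of the neuron-to-truth-table idea: a neuron whose F inputs
# are each quantized to B_IN bits has only 2**(F * B_IN) possible input
# patterns, so its behaviour can be enumerated into a lookup table.
# Weights, quantizers, and the cost expression are illustrative assumptions.
from itertools import product

F = 3        # neuron fan-in (number of inputs)
B_IN = 2     # bits per input activation
B_OUT = 2    # bits per output activation

weights = [0.5, -1.0, 0.75]   # hypothetical trained weights for one neuron
bias = 0.1

def dequant(code, bits):
    """Map an unsigned integer code to a value in [0, 1)."""
    return code / (1 << bits)

def quant(value, bits):
    """Clamp to [0, 1) and quantize back to an unsigned integer code."""
    value = min(max(value, 0.0), 1.0 - 1e-9)
    return int(value * (1 << bits))

# Enumerate every possible quantized input pattern: this table is what
# replaces the neuron's arithmetic in the FPGA netlist.
truth_table = {}
for codes in product(range(1 << B_IN), repeat=F):
    acc = bias + sum(w * dequant(c, B_IN) for w, c in zip(weights, codes))
    act = max(acc, 0.0)                      # ReLU-style activation
    truth_table[codes] = quant(act, B_OUT)

rows = (1 << B_IN) ** F                      # 2**(F * B_IN) rows
print(f"truth table rows: {rows}")           # 64 for F=3, B_IN=2
print(f"table size: {rows * B_OUT} bits")    # grows exponentially with fan-in
```

Note that doubling the fan-in squares the number of table rows, which is why the method combines high sparsity (small F) with low-bit activation quantization (small B_IN) to keep each neuron's LUT cost tractable.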

Updated: 2020-04-08