Analysis of a Pipelined Architecture for Sparse DNNs on Embedded Systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8), Pub Date: 2020-07-08, DOI: 10.1109/tvlsi.2020.3005451
Adrian Alcolea Moreno, Javier Olivito, Javier Resano, Hortensia Mecha

Deep neural networks (DNNs) are increasingly present in a wide range of applications, and their computationally intensive and memory-demanding nature poses challenges, especially for embedded systems. Pruning techniques make DNN models sparse by setting most weights to zero, offering optimization opportunities if specific support is included. We propose a novel pipelined architecture for DNNs that avoids all useless operations during inference. We have implemented it in a field-programmable gate array (FPGA) and characterized its performance, energy efficiency, and area. Exploiting sparsity yields remarkable speedups but also produces area overheads. We have evaluated this tradeoff to identify the scenarios in which it is better to spend that area on exploiting sparsity and those in which it is better to include more computational resources in a conventional DNN architecture. We have also explored different arithmetic bitwidths. Our sparse architecture is clearly superior with 32-bit arithmetic or on highly sparse networks. However, with 8-bit arithmetic or on networks with low sparsity, it is more profitable to deploy a dense architecture with additional arithmetic resources than to include support for sparsity. We consider FPGAs the natural target for sparse DNN accelerators, since they can be loaded at run-time with the best-fitting accelerator.
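The zero-skipping idea the abstract describes can be illustrated in software. The following is a minimal sketch, not the paper's pipelined FPGA design: it applies magnitude pruning to a small dense weight matrix, stores the surviving weights in compressed sparse row (CSR) form, and computes a layer's matrix-vector product touching only the stored nonzeros, so exactly one multiply-accumulate is issued per nonzero weight. The names (csr_t, prune_to_csr, spmv) and the 0.1 pruning threshold are illustrative assumptions, not taken from the paper.

/* Sketch of pruning + zero-skipping inference (not the paper's architecture). */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

typedef struct {      /* CSR storage: only nonzero weights are kept */
    int rows, nnz;
    int *row_ptr;     /* rows+1 entries: start of each row in col/val */
    int *col;         /* column index of each nonzero */
    float *val;       /* nonzero weight values */
} csr_t;

/* Magnitude pruning: weights with |w| < thresh become zero and are
 * dropped entirely from the CSR structure. */
static csr_t prune_to_csr(const float *w, int rows, int cols, float thresh) {
    csr_t m = { rows, 0, malloc((rows + 1) * sizeof(int)), NULL, NULL };
    for (int i = 0; i < rows * cols; i++)
        if (fabsf(w[i]) >= thresh) m.nnz++;
    m.col = malloc(m.nnz * sizeof(int));
    m.val = malloc(m.nnz * sizeof(float));
    int k = 0;
    for (int i = 0; i < rows; i++) {
        m.row_ptr[i] = k;
        for (int j = 0; j < cols; j++)
            if (fabsf(w[i * cols + j]) >= thresh) {
                m.col[k] = j;
                m.val[k++] = w[i * cols + j];
            }
    }
    m.row_ptr[rows] = k;
    return m;
}

/* Sparse layer y = W*x: iterates only over stored nonzeros, so zero
 * weights cost no arithmetic at all. */
static void spmv(const csr_t *m, const float *x, float *y) {
    for (int i = 0; i < m->rows; i++) {
        float acc = 0.0f;
        for (int k = m->row_ptr[i]; k < m->row_ptr[i + 1]; k++)
            acc += m->val[k] * x[m->col[k]];   /* one MAC per nonzero */
        y[i] = acc;
    }
}

int main(void) {
    float w[2 * 3] = { 0.9f, 0.01f, 0.0f, 0.0f, -0.7f, 0.02f };
    float x[3] = { 1.0f, 2.0f, 3.0f }, y[2];
    csr_t m = prune_to_csr(w, 2, 3, 0.1f);   /* keeps only 0.9 and -0.7 */
    spmv(&m, x, y);
    printf("y = [%f, %f], MACs issued = %d of 6\n", y[0], y[1], m.nnz);
    free(m.row_ptr); free(m.col); free(m.val);
    return 0;
}

The ratio m.nnz / (rows * cols) is the density that the abstract's tradeoff hinges on: the lower it is, the more work the sparse path saves relative to a dense architecture, at the cost of the extra index storage and control logic that the paper accounts for as area overhead.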

Updated: 2020-07-08