A fast and scalable architecture to run convolutional neural networks in low density FPGAs, Microprocessors and Microsystems

A fast and scalable architecture to run convolutional neural networks in low density FPGAs
Microprocessors and Microsystems (IF 2.6), Pub Date: 2020-05-21, DOI: 10.1016/j.micpro.2020.103136
Mário P. Véstias, Rui P. Duarte, José T. de Sousa, Horácio C. Neto

Deep learning and, in particular, convolutional neural networks (CNNs) achieve very good results in several computer vision applications, such as security and surveillance, where image and video analysis is required. These networks are demanding in terms of computation and memory and are therefore usually implemented on high-performance computing platforms or devices. Running CNNs on embedded platforms or devices with low computational and memory resources requires careful optimization of system architectures and algorithms to obtain very efficient designs. In this context, Field Programmable Gate Arrays (FPGAs) can achieve this efficiency, since the programmable hardware fabric can be tailored to each specific network. In this paper, a very efficient configurable architecture for CNN inference targeting FPGAs of any density is described. The architecture uses fixed-point arithmetic and image batching to reduce computational, memory, and memory bandwidth requirements without compromising network accuracy. The developed architecture supports the execution of large CNNs on any FPGA device, including those with small on-chip memory and few logic resources. With the proposed architecture, it is possible to run inference on an image with AlexNet in 4.3 ms on a ZYNQ7020 and in 1.2 ms on a ZYNQ7045.
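
The abstract names two techniques, fixed-point arithmetic and image batching, without giving details. The following Python sketch illustrates the general idea behind each; it is not the authors' implementation. The 8-bit word with 6 fractional bits, the function names, and the assumption that batching works by sharing one weight fetch across all images in a batch are illustrative choices that do not come from the paper.

```python
# Illustrative sketch only. Bit widths, names, and the batching model are
# assumptions made for this example, not values taken from the paper.

def quantize(x: float, frac_bits: int = 6, total_bits: int = 8) -> int:
    """Round x to a signed fixed-point integer, saturating at the word limits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))


def dequantize(q: int, frac_bits: int = 6) -> float:
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


def fixed_point_dot(xs, ws, frac_bits: int = 6, total_bits: int = 8) -> float:
    """Dot product using integer multiply-accumulate only.

    Each product carries 2 * frac_bits fractional bits, so the wide
    accumulator is rescaled once at the end.
    """
    acc = 0
    for x, w in zip(xs, ws):
        acc += quantize(x, frac_bits, total_bits) * quantize(w, frac_bits, total_bits)
    return acc / (1 << (2 * frac_bits))


def weight_bytes_per_image(num_weights: int, bytes_per_weight: int, batch_size: int) -> float:
    """External-memory weight traffic per image when one fetch serves a whole batch."""
    return num_weights * bytes_per_weight / batch_size
```

Under these assumptions, narrower words shrink both the multipliers and the stored weights, and a hypothetical layer with 1000 one-byte weights would move 250 bytes of weights per image at a batch size of 4 instead of 1000 bytes at a batch size of 1.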

Updated: 2020-05-21
