A dedicated hardware accelerator for real-time acceleration of YOLOv2
Journal of Real-Time Image Processing ( IF 3 ) Pub Date : 2020-05-09 , DOI: 10.1007/s11554-020-00977-w
Ke Xu , Xiaoyun Wang , Xinyang Liu , Changfeng Cao , Huolin Li , Haiyong Peng , Dong Wang

In recent years, dedicated hardware accelerators for convolutional neural networks (CNNs) have been extensively studied. Although many studies have presented efficient FPGA designs for image classification models such as AlexNet and VGG, there are still few implementations for CNN-based object detection applications. This paper presents an OpenCL-based high-throughput FPGA accelerator for the YOLOv2 object detection algorithm on an Arria-10 GX1150 FPGA. The proposed hardware architecture adopts a scalable pipeline design to support multi-resolution input images and a full 8-bit fixed-point datapath to improve hardware resource utilization. A layer fusion technique that merges convolution, batch normalization, and Leaky-ReLU is also developed to avoid transferring intermediate data between the FPGA and external memory. Experimental results show that the final design achieves a peak throughput of 566 GOP/s at a working frequency of 190 MHz. The accelerator can execute YOLOv2 inference (\(288\times 288\) resolution) and tiny YOLOv2 inference (\(416\times 416\) resolution) at 35 and 71 FPS, respectively.
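
The layer fusion mentioned in the abstract can be illustrated with the standard batch-normalization folding identity: the normalization parameters are absorbed into the convolution weights and bias ahead of time, so one pass computes convolution, batch normalization, and Leaky-ReLU without storing an intermediate result. The abstract does not show the authors' OpenCL kernels, so the following is only a minimal floating-point sketch in Python. The function names, the flattened-kernel layout, the `eps` value, and the Leaky-ReLU slope of 0.1 are assumptions made for illustration, and the 8-bit fixed-point datapath is not modeled.

```python
import math


def leaky_relu(x, slope=0.1):
    # Slope 0.1 is an assumed value; the abstract does not state it.
    return x if x > 0 else slope * x


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Absorb per-channel batch-norm parameters into conv weights and bias.

    weights: one flattened kernel (list of floats) per output channel.
    Returns (folded_weights, folded_bias).
    """
    folded_w, folded_b = [], []
    for w, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([wi * scale for wi in w])
        folded_b.append((b - m) * scale + bt)
    return folded_w, folded_b


def conv_bn_leaky_unfused(patch, weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: three separate steps for one input patch."""
    out = []
    for w, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        y = sum(wi * xi for wi, xi in zip(w, patch)) + b   # convolution
        y = g * (y - m) / math.sqrt(v + eps) + bt          # batch normalization
        out.append(leaky_relu(y))                          # activation
    return out


def conv_bn_leaky_fused(patch, folded_w, folded_b):
    """Fused path: one multiply-accumulate pass followed by the activation."""
    return [
        leaky_relu(sum(wi * xi for wi, xi in zip(w, patch)) + b)
        for w, b in zip(folded_w, folded_b)
    ]


if __name__ == "__main__":
    # Made-up numbers for a single 3-element patch and two output channels.
    patch = [1.0, -2.0, 0.5]
    weights = [[0.2, 0.4, -0.6], [-0.5, 0.1, 0.3]]
    bias = [0.1, -0.2]
    gamma, beta = [1.5, 0.8], [0.05, -0.1]
    mean, var = [0.3, -0.4], [0.25, 1.0]

    fw, fb = fold_batch_norm(weights, bias, gamma, beta, mean, var)
    print(conv_bn_leaky_unfused(patch, weights, bias, gamma, beta, mean, var))
    print(conv_bn_leaky_fused(patch, fw, fb))
```

Both paths should agree up to floating-point rounding. Because the folding happens once, offline, the fused path has nothing to write back between the convolution and the activation, which is the kind of external-memory traffic the abstract says the design avoids.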




Updated: 2020-05-09