A Parametrizable High-Level Synthesis Library for Accelerating Neural Networks on FPGAs
Journal of Signal Processing Systems (IF 1.6), Pub Date: 2021-03-15, DOI: 10.1007/s11265-021-01651-5
Lester Kalms, Pedram Amini Rad, Muhammad Ali, Arsany Iskander, Diana Göhringer

In recent years, Convolutional Neural Networks (CNNs) have been incorporated into a large number of applications, including multimedia retrieval and image classification. However, CNN-based algorithms are computationally and resource intensive and are therefore difficult to use in embedded systems. FPGA-based accelerators are becoming increasingly popular in research and industry due to their flexibility and energy efficiency. However, the available resources and the size of the on-chip memory can limit the performance of an FPGA accelerator for CNNs. This work proposes a High-Level Synthesis (HLS) library for CNN algorithms. It contains seven different streaming-capable CNN functions (plus two conversion functions) for creating large neural networks with deep pipelines. The functions have many parameter settings (e.g., for resolution, feature maps, data types, kernel size, parallelization, and accuracy), which also enable compile-time optimizations. Our functions are integrated into the HiFlipVX library, an open-source HLS FPGA library for image processing and object detection. This makes it possible to implement different types of computer vision applications with one library. Due to the various configuration and parallelization possibilities of the library functions, it is possible to implement a high-performance, scalable, and resource-efficient system, as our evaluation of the MobileNets algorithm shows.
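
To illustrate what a streaming-capable function with compile-time parameters might look like, here is a minimal sketch in plain C++. It is not taken from HiFlipVX: the names `relu_stream` and `run_example`, the template parameters `PIXELS`, `FMAPS`, and `PARALLEL`, and the choice of a ReLU activation are all assumptions made for illustration. A `std::vector` stands in for a hardware stream so that the example runs on an ordinary compiler.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical streaming ReLU (illustrative only, not the HiFlipVX API).
// T        : element data type
// PIXELS   : number of pixels per image
// FMAPS    : number of feature maps per pixel
// PARALLEL : number of elements processed per stream word
template <typename T, std::size_t PIXELS, std::size_t FMAPS, std::size_t PARALLEL>
void relu_stream(const std::vector<std::array<T, PARALLEL>>& in,
                 std::vector<std::array<T, PARALLEL>>& out) {
    // Invalid configurations are rejected at compile time.
    static_assert(PARALLEL > 0, "PARALLEL must be positive");
    static_assert(FMAPS % PARALLEL == 0, "FMAPS must be a multiple of PARALLEL");

    // The trip count is a compile-time constant, so an HLS tool could
    // size the pipeline statically.
    constexpr std::size_t WORDS = PIXELS * (FMAPS / PARALLEL);
    assert(in.size() >= WORDS);

    out.clear();
    out.reserve(WORDS);
    for (std::size_t w = 0; w < WORDS; ++w) {         // candidate for pipelining
        std::array<T, PARALLEL> word{};
        for (std::size_t p = 0; p < PARALLEL; ++p) {  // candidate for full unrolling
            word[p] = std::max<T>(in[w][p], T(0));
        }
        out.push_back(word);
    }
}

// Usage: 2 pixels, 4 feature maps, 2 elements per word (4 stream words).
std::vector<int> run_example() {
    using Word = std::array<std::int16_t, 2>;
    const std::vector<Word> in = {Word{{-1, 2}}, Word{{3, -4}},
                                  Word{{0, 5}}, Word{{-6, 7}}};
    std::vector<Word> out;
    relu_stream<std::int16_t, 2, 4, 2>(in, out);

    std::vector<int> flat;
    for (const Word& w : out) {
        for (std::int16_t v : w) {
            flat.push_back(v);
        }
    }
    return flat;
}
```

Because every size is a template parameter, the loop bounds and the word width are known at compile time, which is the kind of property that lets a synthesis tool unroll and pipeline statically. The actual library functions and their signatures may differ.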


Updated: 2021-03-15