A streaming architecture for Convolutional Neural Networks based on layer operations chaining
Journal of Real-Time Image Processing (IF 3), Pub Date: 2020-01-04, DOI: 10.1007/s11554-019-00938-y
Moisés Arredondo-Velázquez, Javier Diaz-Carmona, Cesar Torres-Huitzil, Alfredo Padilla-Medina, Juan Prado-Olivarez

Convolutional Neural Networks (CNNs) have become one of the best machine learning algorithms for content classification of digital images. The computational complexity of a CNN is much higher than that of traditional algorithms, which is why Graphics Processing Units (GPUs) and online servers are commonly used to accelerate its operations. However, there is a growing demand for real-time object recognition on embedded systems, which are limited in both resources and energy consumption. Recently reported works focus on minimizing the required resources through two design strategies. The first implements a single accelerator that can be adapted to the operations of the whole CNN. The second uses one accelerator for each convolution layer, which achieves higher performance when multiple images are processed. This paper proposes a new design strategy based on multiple accelerators that use a layer operation chaining scheme to compute the operations of multiple CNN layers in parallel. The proposed architecture adopts three types of parallel data processing, and the parallelism level of the convolution layers is determined by cost-function-based algorithms. The design strategy is demonstrated by implementing three naive CNNs on a De2i-150 board, where a peak acceleration of 18.04x was achieved compared with state-of-the-art design methods without layer operation chaining. Design results for a modified AlexNet CNN were also obtained. According to these results, the proposed design strategy achieves a shorter processing time than reported works that use the other two design strategies. In addition, resource utilization is competitive for the naive CNNs.
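
The idea of layer operation chaining can be illustrated in software, with the caveat that the abstract gives no implementation details. The following is a minimal sketch, not the authors' hardware architecture: it assumes a single-channel image, a 3x3 valid convolution, ReLU, and 2x2 max pooling, and it models each layer as a Python generator so that a later layer starts working as soon as the earlier one has produced enough rows. All function names (`conv3x3_rows`, `relu_rows`, `maxpool2x2_rows`, `chained_layers`) are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List

Row = List[float]


def conv3x3_rows(rows: Iterable[Row], kernel: List[List[float]]) -> Iterator[Row]:
    """Emit one valid 3x3 convolution output row as soon as three input rows are buffered."""
    window: List[Row] = []  # line buffer holding the last three input rows
    for row in rows:
        window.append(row)
        if len(window) > 3:
            window.pop(0)
        if len(window) == 3:
            width = len(row) - 2
            yield [
                sum(kernel[i][j] * window[i][x + j] for i in range(3) for j in range(3))
                for x in range(width)
            ]


def relu_rows(rows: Iterable[Row]) -> Iterator[Row]:
    """Apply ReLU row by row, without waiting for the full feature map."""
    for row in rows:
        yield [max(0.0, v) for v in row]


def maxpool2x2_rows(rows: Iterable[Row]) -> Iterator[Row]:
    """Emit one pooled row for every two incoming rows (2x2 window, stride 2)."""
    pending = None  # first row of the current pair, if any
    for row in rows:
        if pending is None:
            pending = row
        else:
            yield [
                max(pending[x], pending[x + 1], row[x], row[x + 1])
                for x in range(0, len(row) - 1, 2)
            ]
            pending = None


def chained_layers(image: Iterable[Row], kernel: List[List[float]]) -> List[Row]:
    """Chain convolution, ReLU, and pooling so each stage consumes rows as they appear."""
    return list(maxpool2x2_rows(relu_rows(conv3x3_rows(iter(image), kernel))))
```

In this sketch the first pooled row is available after only four input rows have been read, rather than after the whole convolution output has been computed. In hardware, chained stages would run concurrently, whereas Python generators only interleave them, so the example shows the data dependency pattern and not the speedup.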

Updated: 2020-01-04