Row-Streaming Dataflow Using a Chaining Buffer and Systolic Array+ Structure
IEEE Computer Architecture Letters ( IF 1.4 ) Pub Date : 2021-01-26 , DOI: 10.1109/lca.2021.3054371
Hweesoo Kim , Sunjung Lee , Jaewan Choi , Jung Ho Ahn

Convolutional Neural Networks (CNNs) are widely used to solve complex problems in various fields, such as image recognition, image classification, and video analysis. Convolutional (CONV) layers are the most computationally intensive part of CNN inference, and various architectures have been proposed to process them efficiently. Among these, a systolic array consists of a 2D array of processing elements that handle GEneral Matrix Multiplication (GEMM) with high efficiency. However, to process a CONV layer as a GEMM, image-to-column (im2col) processing, also called lowering, is required per layer, necessitating a larger on-chip memory and a considerable amount of repetitive on-chip memory access. In this letter, we propose a systolic array+ (SysAr+) structure augmented with a chaining buffer and a row-streaming dataflow that maximizes data reuse without im2col pre-processing of CONV layers or repetitive access to a large on-chip memory. By applying the proposed method to 3×3 CONV layers, we reduce energy consumption by up to 19.7 percent in ResNet and 37.4 percent in DenseNet with an area overhead of 1.54 percent in SysAr+, and we improve performance by up to 32.4 percent in ResNet and 12.1 percent in DenseNet.
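To see why im2col lowering inflates on-chip memory traffic, the sketch below shows the standard transformation in plain Python (function names are hypothetical, not from the letter): each K×K input patch becomes one column of a lowered matrix, so the convolution reduces to a single row-vector × matrix product.

```python
def im2col(image, K):
    """Lower a single-channel image (list of rows) into a (K*K) x P matrix
    whose columns are the K x K patches of a stride-1, unpadded convolution."""
    H, W = len(image), len(image[0])
    out_h, out_w = H - K + 1, W - K + 1
    cols = [[0] * (out_h * out_w) for _ in range(K * K)]
    for i in range(out_h):
        for j in range(out_w):
            p = i * out_w + j  # flattened output position
            for ki in range(K):
                for kj in range(K):
                    cols[ki * K + kj][p] = image[i + ki][j + kj]
    return cols

def conv_as_gemm(image, kernel):
    """Run the convolution as one row-vector x matrix product (a GEMM)."""
    K = len(kernel)
    flat = [kernel[ki][kj] for ki in range(K) for kj in range(K)]  # 1 x K*K
    cols = im2col(image, K)
    # Dot the flattened kernel against every lowered column.
    return [sum(f * c for f, c in zip(flat, col)) for col in zip(*cols)]

# 3x3 image, 2x2 all-ones kernel: each output is the sum of a 2x2 patch.
print(conv_as_gemm([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 1], [1, 1]]))
# -> [12, 16, 24, 28]
```

Note that the center pixel (value 5) is copied into all four columns of the lowered matrix: with a K×K kernel, each input element is duplicated up to K² times. This duplication, and the repeated on-chip reads it causes, is the overhead the proposed chaining buffer and row-streaming dataflow are designed to avoid.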

Updated: 2021-02-12