ECBC: Efficient Convolution via Blocked Columnizing
IEEE Transactions on Neural Networks and Learning Systems (IF 10.2) Pub Date: 2021-08-10, DOI: 10.1109/tnnls.2021.3095276
Tianli Zhao¹, Qinghao Hu², Xiangyu He¹, Weixiang Xu², Jiaxing Wang², Cong Leng², Jian Cheng¹

Direct convolution methods are drawing increasing attention because they eliminate the additional storage that indirect convolution algorithms require (i.e., the transformed matrix generated by the im2col convolution algorithm). Nevertheless, direct methods require special input–output tensor formats, which costs extra time and memory to obtain the desired data layout. In this article, we show that indirect convolution, if implemented properly, can achieve high computational performance with the help of highly optimized matrix multiplication subroutines while avoiding substantial memory overhead. The proposed algorithm is called efficient convolution via blocked columnizing (ECBC). Inspired by the im2col convolution algorithm and the block algorithm for general matrix-matrix multiplication, we propose to perform the convolution computation blockwise. As a result, the tensor-to-matrix transformation (e.g., the im2col operation) can also be done blockwise, so that it requires only a memory buffer as small as one data block. Extensive experiments on various platforms and networks validate the effectiveness of ECBC, as well as the superiority of the proposed method over a set of widely used industrial-level convolution algorithms.
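To make the memory argument concrete, here is a minimal pure-Python sketch of blockwise im2col convolution. It assumes a single input image, stride 1, no padding, and blocking only along the output-pixel dimension; it is not the authors' ECBC implementation, which relies on optimized GEMM subroutines and its own blocking scheme. The names `conv_direct`, `im2col_block`, `flatten_weights`, `matmul`, and `conv_blocked_im2col` are hypothetical. Setting `block` to the number of output pixels reproduces classic im2col, whose column buffer holds `C*kh*kw*OH*OW` elements; a smaller `block` shrinks that buffer to `C*kh*kw*block` elements.

```python
from typing import List, Tuple


def conv_direct(x, w):
    """Reference direct convolution (cross-correlation), stride 1, no padding.
    x: [C][H][W], w: [M][C][kh][kw] -> output [M][OH][OW]."""
    c, h, wd = len(x), len(x[0]), len(x[0][0])
    m, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    oh, ow = h - kh + 1, wd - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(m)]
    for o in range(m):
        for i in range(oh):
            for j in range(ow):
                s = 0.0
                for ci in range(c):
                    for a in range(kh):
                        for b in range(kw):
                            s += w[o][ci][a][b] * x[ci][i + a][j + b]
                out[o][i][j] = s
    return out


def im2col_block(x, kh, kw, ow, p0, p1) -> List[list]:
    """Columnize only output pixels p0..p1-1 (row-major over the output grid).
    Returns a (C*kh*kw) x (p1-p0) matrix as a list of rows."""
    rows = []
    for ci in range(len(x)):
        for a in range(kh):
            for b in range(kw):
                row = []
                for p in range(p0, p1):
                    i, j = divmod(p, ow)  # output pixel coordinates
                    row.append(x[ci][i + a][j + b])
                rows.append(row)
    return rows


def flatten_weights(w) -> List[list]:
    """Reshape w[M][C][kh][kw] to an M x (C*kh*kw) matrix (same (ci, a, b) order)."""
    return [[v for ch in wo for krow in ch for v in krow] for wo in w]


def matmul(a, b) -> List[list]:
    """Naive matrix product; a real implementation would call an optimized GEMM."""
    n, k = len(b[0]), len(b)
    return [[sum(ar[t] * b[t][j] for t in range(k)) for j in range(n)] for ar in a]


def conv_blocked_im2col(x, w, block: int) -> Tuple[list, int]:
    """Convolution via im2col + GEMM, one block of output pixels at a time.
    Returns (output, peak number of elements in the column buffer)."""
    h, wd = len(x[0]), len(x[0][0])
    m, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    oh, ow = h - kh + 1, wd - kw + 1
    wmat = flatten_weights(w)
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(m)]
    peak = 0
    for p0 in range(0, oh * ow, block):
        p1 = min(p0 + block, oh * ow)
        cols = im2col_block(x, kh, kw, ow, p0, p1)  # small per-block buffer
        peak = max(peak, len(cols) * len(cols[0]))
        part = matmul(wmat, cols)  # M x (p1 - p0) slice of the output
        for o in range(m):
            for q, p in enumerate(range(p0, p1)):
                i, j = divmod(p, ow)
                out[o][i][j] = part[o][q]
    return out, peak


if __name__ == "__main__":
    # 2 input channels of 5x5, 3 output channels, 3x3 kernel -> 3x3 output (9 pixels)
    x = [[[ci * 25 + r * 5 + c for c in range(5)] for r in range(5)] for ci in range(2)]
    w = [[[[(o + ci + a - b) % 3 - 1 for b in range(3)] for a in range(3)]
          for ci in range(2)] for o in range(3)]
    ref = conv_direct(x, w)
    full, full_peak = conv_blocked_im2col(x, w, block=9)  # classic im2col
    blk, blk_peak = conv_blocked_im2col(x, w, block=4)    # blockwise
    assert full == ref and blk == ref
    print(full_peak, blk_peak)  # 162 72
```

In this usage example the results match the direct reference exactly, while the peak column buffer drops from 162 elements (full im2col) to 72 elements (blocks of 4 output pixels). The trade-off is more, smaller matrix multiplications, which is why the abstract emphasizes pairing the blockwise transformation with highly optimized GEMM routines.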

Updated: 2021-08-10