Design Space Exploration of Matrix–Matrix Convolution Operation
Journal of Circuits, Systems and Computers (IF 1.5) Pub Date: 2021-06-18, DOI: 10.1142/s0218126621502881
Piyalee Behera, Arighna Deb

Convolution is an important operation in neural networks that has, in recent years, received significant attention from researchers thanks to its ability to handle complex tasks such as image processing and computer vision efficiently. In general, the convolution operation in neural networks takes two matrices as inputs: an image matrix representing an image and a kernel matrix required for the desired image processing operation. It then performs a series of multiplication and addition operations among the elements of the image and kernel matrices. Realizing a circuit structure for matrix–matrix convolution is straightforward, as each multiplication is realized by a multiplier and each addition by an adder. However, the corresponding circuits suffer from large area, high power consumption and long delay because of the large number of multiplications and additions involved in matrix–matrix convolution. While existing approaches focus on accelerating this computationally intensive task, they often do not guarantee minimality of area, power and delay. We show that there exist design aspects through which circuit structures for the convolution operation can be realized with less area, power and delay. To this end, we consider the kernel definitions during the design of the circuit structures, since the kernel matrices are often (pre-)determined by the target application. Motivated by this, we first explore the design space of the convolution operation by introducing an alternative design scheme for realizing the respective operation between two matrices, keeping image processing/neural network applications in mind. Experimental evaluations confirm the potential benefits of the proposed design scheme and demonstrate that reductions of approximately 88% in area and power, and of approximately 31% in critical path delay, can be achieved using the proposed design scheme.
In addition, FPGA implementations of the proposed scheme show that reductions of approximately 93% and 54% in the number of LUTs and the number of pins, respectively, can be achieved. Compared to prior works, the proposed scheme allows higher parallelism with minimum LUT utilization.
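To make the multiply–accumulate structure of the matrix–matrix convolution concrete, the following is a minimal software sketch of the operation the abstract describes (valid-mode sliding window, one multiply per multiplier and one add per adder in a hardware realization). This is an illustrative model, not the paper's circuit design; as in most neural network contexts, kernel flipping is omitted, so this is technically cross-correlation, and the example image and kernel values are hypothetical.

```python
def conv2d(image, kernel):
    """Valid (no-padding) 2D convolution between an image matrix and a
    kernel matrix, expressed as explicit multiply-accumulate loops."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):          # slide kernel over rows
        row = []
        for c in range(iw - kw + 1):      # slide kernel over columns
            acc = 0
            for i in range(kh):           # one multiplication per element pair,
                for j in range(kw):       # accumulated by additions
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# Hypothetical example: 3x3 image with a 2x2 all-ones (box-sum) kernel.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ker = [[1, 1],
       [1, 1]]
print(conv2d(img, ker))  # [[12, 16], [24, 28]]
```

Each output element costs kh × kw multiplications and kh × kw − 1 additions, which is why a direct hardware mapping grows quickly in area and power; a fixed, application-determined kernel lets the multipliers be specialized away, which is the design aspect the paper exploits.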
