AxSA: On the Design of High-Performance and Power-Efficient Approximate Systolic Arrays for Matrix Multiplication
Journal of Signal Processing Systems (IF 1.6), Pub Date: 2020-08-11, DOI: 10.1007/s11265-020-01582-7
Haroon Waris, Chenghua Wang, Weiqiang Liu, Fabrizio Lombardi

Compute-bound problems such as matrix-matrix multiplication can be accelerated using special-purpose hardware schemes such as Systolic Arrays (SAs). However, the processing elements in SAs have a long critical path delay, which limits the performance benefits of SAs. This paper presents a scheme to achieve high-performance matrix multiplication using SAs. Two approximate matrix multiplier designs (Ax1 and Ax2) of variable accuracy/power are proposed. The proposed 8-bit designs achieve a 32% improvement in critical path delay, and for the scaled-up 32-bit variants the improvements in delay and energy reach up to 64% and 51%, respectively. Moreover, Ax1 and Ax2 have a lower power-delay product than previous approximate matrix multiplier designs. This leads to an improved resolution of the prior accuracy-energy Pareto front; therefore, we define a new Pareto front for approximate matrix multipliers. As a case study, the discrete cosine transform is evaluated. Ax2 achieves the best quality-power trade-off, exhibiting a 5% degradation in structural similarity index (SSIM) with a power saving of 28%.
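
To make the setting concrete, the following is a minimal Python sketch of how an output-stationary systolic array computes a matrix product, with the multiplier inside each processing element left pluggable. Everything named here is an assumption for illustration: `systolic_matmul`, `exact_mul`, and `truncated_mul` are hypothetical names, and the truncation-based multiplier is a generic stand-in for an approximate multiplier. It is not the Ax1 or Ax2 design, whose internals the abstract does not describe.

```python
from typing import Callable, List

Matrix = List[List[int]]


def exact_mul(a: int, b: int) -> int:
    """Exact multiplier used as the accuracy baseline."""
    return a * b


def truncated_mul(a: int, b: int, drop_bits: int = 2) -> int:
    """Hypothetical approximate multiplier (NOT Ax1/Ax2).

    Assumes non-negative operands and zeroes the `drop_bits` least
    significant bits of each operand before multiplying.
    """
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)


def systolic_matmul(A: Matrix, B: Matrix,
                    mul: Callable[[int, int], int] = exact_mul) -> Matrix:
    """Cycle-by-cycle simulation of an n x m output-stationary array.

    Rows of A enter from the left and columns of B from the top, each
    skewed by one cycle per row/column. PE(i, j) multiplies the pair it
    currently holds, accumulates locally, and forwards A to the right
    and B downward on the next cycle.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # stationary partial sums
    a_reg = [[0] * m for _ in range(n)]  # A values travelling right
    b_reg = [[0] * m for _ in range(n)]  # B values travelling down

    for t in range(n + m + k - 2):
        # Visit PEs from bottom-right to top-left so that each PE reads
        # its neighbours' registers from the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    s = t - i  # skewed injection at the left edge
                    a_in = A[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    s = t - j  # skewed injection at the top edge
                    b_in = B[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] += mul(a_in, b_in)
    return acc
```

Calling `systolic_matmul(A, B)` gives the exact product, while `systolic_matmul(A, B, mul=truncated_mul)` swaps in the stand-in approximate multiplier, so the two results can be compared to get a feel for the accuracy side of the trade-off. The sketch models only functional behaviour; critical path delay, power, and energy are hardware properties that a software simulation like this cannot measure.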




Updated: 2020-08-11