Improving Power of DSP and CNN Hardware Accelerators Using Approximate Floating-point Multipliers
ACM Transactions on Embedded Computing Systems (IF 2.8), Pub Date: 2021-07-09, DOI: 10.1145/3448980
Vasileios Leon, Theodora Paparouni, Evangelos Petrongonas, Dimitrios Soudris, Kiamal Pekmestzi

Approximate computing has emerged as a promising design alternative for delivering power-efficient systems and circuits by exploiting the inherent error resiliency of numerous applications. The current article aims to tackle the increased hardware cost of floating-point multiplication units, which prohibits their usage in embedded computing. We introduce AFMU (Approximate Floating-point MUltiplier), an area/power-efficient family of multipliers, which apply two approximation techniques in the resource-hungry mantissa multiplication and can be seamlessly extended to support dynamic configuration of the approximation levels via gating signals. AFMU offers large accuracy configuration margins, provides negligible logic overhead for dynamic configuration, and detects unexpected results that may arise due to the approximations. Our evaluation shows that AFMU delivers energy gains in the range 3.6%–53.5% for half-precision and 37.2%–82.4% for single-precision, in exchange for mean relative errors of around 0.05%–3.33% and 0.01%–2.20%, respectively. In comparison with state-of-the-art multipliers, AFMU exhibits up to 4–6× smaller error on average while delivering more energy-efficient computing. The evaluation in image processing shows that AFMU provides sufficient quality of service, i.e., more than 50 dB PSNR and SSIM values near 1, with up to 57.4% power reduction. When used in floating-point CNNs, the accuracy loss is small (or zero), i.e., up to 5.4% for MNIST and CIFAR-10, in exchange for up to 63.8% power gain.
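To make the accuracy/energy trade-off described above concrete, the sketch below simulates one common style of mantissa approximation: truncating (zeroing) the k least-significant mantissa bits of a single-precision product. This is an illustrative stand-in, not the paper's actual AFMU circuits, whose two approximation techniques operate inside the mantissa multiplier array; the function names and the choice of truncation are assumptions for demonstration only.

```python
import random
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret a Python float as IEEE-754 single-precision bits."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret 32 bits as an IEEE-754 single-precision float."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def approx_mul(a: float, b: float, k: int) -> float:
    """Approximate fp32 multiply: take the rounded single-precision
    product, then zero its k least-significant mantissa bits
    (0 <= k <= 23). Larger k means coarser approximation."""
    prod_bits = float_to_bits(a * b)
    mask = (~((1 << k) - 1)) & 0xFFFFFFFF
    return bits_to_float(prod_bits & mask)


if __name__ == "__main__":
    # Estimate the mean relative error for a few approximation levels,
    # mirroring how error/energy knobs trade off in configurable designs.
    random.seed(0)
    pairs = [(random.uniform(1.0, 2.0), random.uniform(1.0, 2.0))
             for _ in range(10_000)]
    for k in (4, 8, 12, 16):
        errs = [abs(approx_mul(a, b, k) - a * b) / (a * b) for a, b in pairs]
        mre = 100.0 * sum(errs) / len(errs)
        print(f"k={k:2d} truncated mantissa bits -> mean relative error {mre:.5f}%")
```

Because only positive-magnitude mantissa bits are dropped, the approximate product never exceeds the exact one for positive operands, and the error bound scales as roughly 2^(k-23), which is why small k levels cost almost no accuracy.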

Updated: 2021-07-09