Reducing Energy in GPGPUs through Approximate Trivial Bypassing, ACM Transactions on Embedded Computing Systems

Reducing Energy in GPGPUs through Approximate Trivial Bypassing
ACM Transactions on Embedded Computing Systems (IF 2), Pub Date: 2021-01-04, DOI: 10.1145/3429440
Ehsan Atoofian, Zayan Shaikh, Ali Jannesari

General-purpose computing using graphics processing units (GPGPUs) is an attractive option for accelerating applications with massively data-parallel tasks. While the performance of modern GPGPUs is increasing rapidly, their power consumption is becoming a major concern. In particular, the execution units and the register file are among the three most power-hungry components in GPGPUs. In this work, we exploit trivial instructions to reduce power consumption in GPGPUs. Trivial instructions are instructions that do not need computation, e.g., multiplication by one. We found that a GPGPU executes many trivial instructions over the course of a program's execution, and executing them wastes power. We propose trivial bypassing, which skips the execution of trivial instructions and avoids allocating resources to them. By power gating execution units and skipping trivial computations, trivial bypassing reduces both static and dynamic power. It also reduces the dynamic energy of the register file by avoiding register file accesses for the source and/or destination operands of trivial instructions. While trivial bypassing reduces the energy of GPGPUs, it has a detrimental impact on performance, because a power-gated execution unit requires several cycles to resume normal operation. Conventional warp schedulers are oblivious to the status of execution units. We propose a new warp scheduler that prioritizes warps based on the availability of execution units, along with a set of new power management techniques that further reduce the performance penalty of power gating. To increase the energy savings of trivial bypassing, we also propose approximating the operands of instructions. We offer a set of new techniques to approximate both integer and floating-point instructions and thereby enlarge the pool of trivial instructions.
Our evaluations using a diverse set of benchmarks reveal that the proposed techniques reduce the energy of execution units by 11.2% and the dynamic energy of the register file by 12.2% with minimal performance and quality degradation.
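
To make the idea of a trivial instruction concrete, the following is a minimal software sketch of how a bypass check might look. It is not the paper's hardware design: the `Instr` record, the opcode names, the `trivial_result` helper, and the tolerance-based approximation rule are all hypothetical and chosen only for illustration. The abstract names multiplication by one as an example; addition of zero is included here as an assumed second case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical two-operand instruction (names are illustrative only)."""
    op: str   # "mul" or "add"
    a: float
    b: float


def trivial_result(ins: Instr, tol: float = 0.0) -> Optional[float]:
    """Return the bypassed result if the instruction is trivial, else None.

    tol == 0.0 detects exactly trivial instructions (x * 1, x + 0).
    tol > 0.0 also treats operands within tol of 1 or 0 as trivial,
    an assumed approximation rule, not the one used in the paper.
    Signed-zero corner cases of x + 0 are ignored in this sketch.
    """
    def near(x: float, target: float) -> bool:
        return abs(x - target) <= tol

    if ins.op == "mul":
        if near(ins.a, 1.0):
            return ins.b      # 1 * b == b, no multiplier needed
        if near(ins.b, 1.0):
            return ins.a      # a * 1 == a
    elif ins.op == "add":
        if near(ins.a, 0.0):
            return ins.b      # 0 + b == b, no adder needed
        if near(ins.b, 0.0):
            return ins.a      # a + 0 == a
    return None               # not trivial: must go to an execution unit


if __name__ == "__main__":
    print(trivial_result(Instr("mul", 3.5, 1.0)))             # exact trivial case
    print(trivial_result(Instr("mul", 3.5, 1.001)))           # not trivial exactly
    print(trivial_result(Instr("mul", 3.5, 1.001), tol=0.01)) # trivial after approximation
```

With `tol` set to zero the check only bypasses instructions whose result is unchanged, so output quality is unaffected. Raising `tol` enlarges the pool of bypassable instructions at the cost of a small numerical error, which mirrors the energy versus quality trade-off the abstract describes.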

Updated: 2021-01-04
