Predictive Guardbanding: Program-driven Timing Margin Reduction for GPUs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IF 2.9). Pub Date: 2021-01-01, DOI: 10.1109/tcad.2020.2992684
Jingwen Leng, Alper Buyuktosunoglu, Ramon Bertran, Pradip Bose, Yazhou Zu, Vijay Janapa Reddi

The energy efficiency of GPU architectures has emerged as an essential aspect of computer system design. In this article, we explore the energy benefits of reducing the GPU chip’s voltage to its safe limit, i.e., the $V_{\min }$ point, using predictive software techniques. We perform such a study on several commercial off-the-shelf GPU cards. We find that a voltage guardband of about 20% exists on these GPUs spanning two architectural generations, which, if “eliminated” entirely, can yield up to 25% energy savings on one of the studied GPU cards. Our measurement results reveal program-dependent $V_{\min }$ behavior across the studied applications, and the exact improvement magnitude depends on a program’s available guardband. We make fundamental observations about this program-dependent $V_{\min }$ behavior. We experimentally determine that voltage noise has a more substantial impact on $V_{\min }$ than process and temperature variation do, and that activities during kernel execution cause large voltage droops. Building on these findings, we show how to use a kernel’s microarchitectural performance counters to predict its $V_{\min }$ value accurately. The average and maximum prediction errors are 0.5% and 3%, respectively. Accurate $V_{\min }$ prediction opens up new possibilities for a cross-layer dynamic guardbanding scheme for GPUs, in which software predicts and manages the voltage guardband while a hardware safety net mechanism ensures functional correctness.
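The abstract's core idea — mapping a kernel's performance counters to a predicted $V_{\min}$, with a residual hardware safety margin — can be sketched as a simple regression. The counter names, data, voltage values, and model form below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical sketch: fit a linear model mapping a kernel's
# microarchitectural performance counters to its measured V_min.
# All counter names and voltage numbers are illustrative assumptions.

rng = np.random.default_rng(0)

# Synthetic training data: rows = kernels, cols = normalized counters
# (e.g., IPC, memory intensity, occupancy -- names are assumed).
counters = rng.uniform(0.0, 1.0, size=(50, 3))
true_weights = np.array([60.0, 35.0, 20.0])  # mV per unit counter (made up)
base_vmin = 850.0                            # mV, illustrative floor
vmin = base_vmin + counters @ true_weights + rng.normal(0.0, 1.0, 50)

# Ordinary least squares: V_min ~ base + w . counters
X = np.hstack([np.ones((50, 1)), counters])
coef, *_ = np.linalg.lstsq(X, vmin, rcond=None)

def predict_vmin(c):
    """Predict V_min (mV) for a kernel's normalized counter vector c."""
    return coef[0] + np.asarray(c) @ coef[1:]

# In a cross-layer scheme, software would set the voltage from this
# prediction while a hardware safety net catches under-predictions;
# a small residual margin (illustrative) reduces how often it fires.
pred = predict_vmin([0.5, 0.5, 0.5])
guarded = pred + 5.0
```

The paper reports average/maximum prediction errors of 0.5%/3%; a deployed predictor would likely be trained on measured per-kernel $V_{\min}$ data rather than the synthetic values used here.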
