Zeroploit
ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2020-07-07, DOI: 10.1145/3394284
Ram Rangan¹, Mark W. Stephenson², Aditya Ukarande¹, Shyam Murthy³, Virat Agarwal⁴, Marc Blackstein⁵

In this article, we first characterize register operand value locality in shader programs of modern gaming applications and observe that, at run time, there is a high likelihood that one of the register operands of many multiply, logical-AND, and similar operations is zero. We provide intuition, examples, and a quantitative characterization of how these zeros arise dynamically in such programs. Next, we show that this dynamic behavior can be exploited profitably with a profile-guided code optimization called Zeroploit, which transforms targeted code regions into a zero-(value-)specialized fast path and a default slow path. The fast path benefits from zero-specialization in two ways: (a) the backward slice of the other operand of a given multiply or logical-AND can be skipped dynamically, provided that the other operand's only use is in that instruction, and (b) the forward slice of instructions originating at the given instruction can be zero-specialized, potentially triggering further backward-slice specializations from operations in that forward slice. Such specialization lets the fast path avoid redundant dynamic computations and memory fetches, while the fast-slow versioning transform helps preserve functional correctness. Using an offline value profiler and manually optimized shader programs, we demonstrate that Zeroploit achieves an average speedup of 35.8% for targeted shader programs, amounting to an average frame-rate speedup of 2.8% across a collection of modern gaming applications on an NVIDIA® GeForce RTX™ 2080 GPU.
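The fast/slow versioning idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation (the abstract says the shaders were optimized manually, guided by an offline value profiler). The code region and the helpers `expensive_b`, `region_original`, `region_versioned`, and `zero_fraction` are hypothetical, and 32-bit registers are modeled by masking with `MASK32`. The sketch uses integer AND/multiply, where `0 & x == 0` and `0 * x == 0` hold exactly. For IEEE floating-point multiplies, `0.0 * x` is NaN when `x` is infinite or NaN, so specializing those would need additional guarantees. Operand `b` can be skipped on the fast path only because its sole use is the AND.

```python
MASK32 = 0xFFFFFFFF  # assumption: model 32-bit integer registers


def expensive_b(tex):
    """Hypothetical stand-in for the backward slice of operand b
    (e.g., memory fetches plus ALU work). Counts its own calls so the
    sketch can show when the fast path skips it."""
    expensive_b.calls += 1
    acc = 0
    for x in tex:
        acc = (acc * 31 + x) & MASK32
    return acc


expensive_b.calls = 0


def region_original(mask, c, tex):
    """Hypothetical original region: b's only consumer is the AND with mask."""
    b = expensive_b(tex)       # backward slice of b
    t = mask & b               # zero whenever mask == 0
    u = (t * 3) & MASK32       # forward slice of t
    return (u + c) & MASK32    # forward slice of t


def region_versioned(mask, c, tex):
    """Fast/slow versioning keyed on the profiled-hot value mask == 0."""
    if mask == 0:
        # Fast path: t == 0, so b is dead and its backward slice is skipped;
        # u == 0, so u + c folds to c (forward-slice specialization).
        return c & MASK32
    # Slow (default) path: the unmodified original region.
    return region_original(mask, c, tex)


def zero_fraction(samples):
    """Hypothetical offline value profile: fraction of observed operand
    values that were zero, used to decide whether to specialize a region."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s == 0) / len(samples)


if __name__ == "__main__":
    tex = [7, 11, 13]
    inputs = [(0, 5), (0, 9), (0xF0, 5), (0, 1)]
    print(zero_fraction([m for m, _ in inputs]))  # prints 0.75
    expensive_b.calls = 0
    fast = [region_versioned(m, c, tex) for m, c in inputs]
    print(expensive_b.calls)  # prints 1: only the mask != 0 input ran b's slice
    assert fast == [region_original(m, c, tex) for m, c in inputs]
```

The version check costs one compare per invocation, so in this sketch the transform pays off only when the profile shows the operand is zero often enough for the skipped slices to outweigh that check. The slow path keeps the original code, so results match for every input.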

Updated: 2020-07-07