Idempotence-Based Preemptive GPU Kernel Scheduling for Embedded Systems
IEEE Transactions on Computers (IF 3.6), Pub Date: 2021-03-01, DOI: 10.1109/tc.2020.2988251
Hyeonsu Lee 1 , Hyunjun Kim 1 , Cheolgi Kim 2 , Hwansoo Han 1 , Euiseong Seo 1

Mission-critical embedded systems simultaneously run multiple GPU-computing tasks with different criticality and timeliness requirements. Considerable research effort has been dedicated to supporting the preemptive priority scheduling of GPU kernels. However, hardware-supported preemption leads to lengthy scheduling delays and complicated designs, and most software approaches depend on restructured kernels voluntarily yielding GPU resources. We propose a preemptive GPU-kernel scheduling scheme that harnesses the idempotence property of kernels. The proposed scheme identifies idempotent kernels through a static source analysis. If a kernel is not idempotent, it is transactionized at the operating system level. Both idempotent and transactionized kernels can be aborted at any point during their execution and rolled back to their initial state for reexecution. Therefore, low-priority kernel instances can be preempted in favor of high-priority kernel instances and reexecuted after the GPU becomes available again. Our evaluation using the Rodinia benchmark suite showed that the proposed approach limits the preemption delay to 18 μs in the 99.9th percentile, with an average delay in execution time of less than 10% for high-priority tasks under a heavy load in most cases.
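The abort-and-reexecute idea in the abstract can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `run_kernel`, `vec_add`, `accumulate`, and `preempt_and_rerun` are hypothetical, each "thread" is a plain loop iteration, and a deep-copy snapshot stands in for whatever transactionization mechanism the paper actually uses at the operating system level.

```python
import copy

# Hypothetical simulation: none of these names come from the paper.

def run_kernel(kernel, buffers, n, abort_at=None):
    """Run `kernel` for thread indices 0..n-1; stop at `abort_at` to mimic preemption."""
    for i in range(n):
        if abort_at is not None and i == abort_at:
            return False  # aborted before completing
        kernel(buffers, i)
    return True

def vec_add(buf, i):
    # Idempotent: the inputs a and b are never overwritten,
    # so rerunning from the start gives the same result.
    buf["c"][i] = buf["a"][i] + buf["b"][i]

def accumulate(buf, i):
    # Not idempotent: a is read and then overwritten in place,
    # so a rerun after a partial run adds b twice for some elements.
    buf["a"][i] += buf["b"][i]

def preempt_and_rerun(kernel, buffers, n, abort_at, snapshot=False):
    """Abort the kernel partway, then reexecute it from the beginning.

    With snapshot=True the buffers are restored to their initial state
    first (an assumed stand-in for a transactionized kernel).
    """
    saved = copy.deepcopy(buffers) if snapshot else None
    run_kernel(kernel, buffers, n, abort_at)  # preempted run
    if snapshot:
        buffers.clear()
        buffers.update(saved)  # roll back to the initial state
    run_kernel(kernel, buffers, n)  # full reexecution
    return buffers

def make_buffers():
    return {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": [0, 0, 0, 0]}

idempotent_result = preempt_and_rerun(vec_add, make_buffers(), 4, abort_at=2)["c"]
broken_result = preempt_and_rerun(accumulate, make_buffers(), 4, abort_at=2)["a"]
rolled_back_result = preempt_and_rerun(accumulate, make_buffers(), 4, abort_at=2, snapshot=True)["a"]
```

In this sketch the idempotent kernel can be rerun without any bookkeeping, the in-place kernel produces a wrong result for the elements processed before the abort, and restoring the snapshot makes the rerun correct again, which mirrors the distinction the abstract draws between idempotent and transactionized kernels.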

Updated: 2021-03-01