Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime, International Journal of Parallel Programming

Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime
International Journal of Parallel Programming (IF 0.9), Pub Date: 2018-12-07, DOI: 10.1007/s10766-018-0619-1
Brad Peterson, Alan Humphrey, Dan Sunderland, James Sutherland, Tony Saad, Harish Dasari, Martin Berzins

The Uintah computational framework is used for the parallel solution of partial differential equations on adaptive mesh refinement grids using modern supercomputers. Uintah is structured with an application layer and a separate runtime system. Uintah is based on a distributed directed acyclic graph of computational tasks, with a task scheduler that efficiently schedules and executes these tasks on both CPU cores and on-node accelerators. The runtime system identifies task dependencies, creates a task graph prior to the execution of these tasks, automatically generates MPI message tags, and automatically performs halo transfers for simulation variables. Automating halo transfers in a heterogeneous environment poses significant challenges in two cases: when tasks compute within a few milliseconds, because runtime overhead then affects wall-clock execution time, and when simulation variables require large halos spanning most or all of the computational domain, because task dependencies then become expensive to process. These challenges are magnified at production scale, when application developers require that each compute node perform thousands of different halo transfers among thousands of simulation variables. The principal contributions of this work are to (1) identify and address inefficiencies that arise when mapping tasks onto the GPU in the presence of automated halo transfers, (2) implement new schemes to reduce runtime system overhead, (3) minimize application developer involvement with the runtime, and (4) show overhead reduction results from these improvements.
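To make the term "halo transfer" concrete, the following is a minimal single-process sketch, not Uintah's API or implementation. It assumes a 1D field split into equal patches, each padded with ghost (halo) cells that must be filled from neighboring patches before a stencil task can run. The function names `split_with_halo`, `exchange_halos`, and `smooth` are hypothetical; in a distributed runtime the copies below would instead be MPI messages or host-to-device transfers scheduled by the runtime.

```python
from typing import List


def split_with_halo(field: List[float], num_patches: int, halo: int) -> List[List[float]]:
    """Split a 1D field into equal patches, padding each side with `halo` ghost cells."""
    n = len(field) // num_patches  # interior cells per patch (assumes even division)
    patches = []
    for p in range(num_patches):
        interior = list(field[p * n:(p + 1) * n])
        patches.append([0.0] * halo + interior + [0.0] * halo)
    return patches


def exchange_halos(patches: List[List[float]], halo: int) -> None:
    """Fill each patch's ghost cells from its neighbors' interior edge cells.

    Only ghost cells are written and only interior cells are read, so the
    in-place update is order-independent. Domain boundaries are left as-is.
    Assumes each patch has at least `halo` interior cells.
    """
    for p, patch in enumerate(patches):
        if p > 0:
            left = patches[p - 1]
            # Last `halo` interior cells of the left neighbor.
            patch[:halo] = left[len(left) - 2 * halo:len(left) - halo]
        if p < len(patches) - 1:
            right = patches[p + 1]
            # First `halo` interior cells of the right neighbor.
            patch[len(patch) - halo:] = right[halo:2 * halo]


def smooth(patch: List[float], halo: int) -> List[float]:
    """A 3-point averaging stencil task; it reads one cell beyond each interior cell."""
    return [
        (patch[i - 1] + patch[i] + patch[i + 1]) / 3.0
        for i in range(halo, len(patch) - halo)
    ]


field = [float(i) for i in range(8)]
patches = split_with_halo(field, num_patches=2, halo=1)
exchange_halos(patches, halo=1)   # must complete before the stencil task runs
result = smooth(patches[0], halo=1)
```

The ordering constraint in the last three lines (exchange before stencil) is the kind of dependency the abstract says the runtime derives and enforces automatically, so that application code only declares how many halo cells a task needs.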

Updated: 2018-12-07
