Performance Measurements Within Asynchronous Task-Based Runtime Systems: A Double White Dwarf Merger as an Application
Computing in Science & Engineering (IF 2.1), Pub Date: 2021-04-15, DOI: 10.1109/mcse.2021.3073626
Patrick Diehl 1, Dominic Marcello 1, Parsa Amini 1, Hartmut Kaiser 1, Sagiv Shiber 1, Geoffrey C. Clayton 1, Juhan Frank 1, Gregor Dais 2, Dirk Pfluger 2, David Eder 3, Alice Koniges 3, Kevin Huck 4

Analyzing performance within asynchronous many-task-based runtime systems is challenging because millions of tasks are launched concurrently. Especially for long-term runs, the amount of data collected becomes overwhelming. We study HPX, its performance-counter framework, and the Autonomic Performance Environment for eXascale (APEX) to collect performance data and energy consumption. We added HPX application-specific performance counters to Octo-Tiger, a full 3-D adaptive multigrid astrophysics code. This enables the combined visualization of physical and performance data to highlight bottlenecks with respect to different solvers. We examine the overhead introduced by these measurements, which is around 1% of the overall application runtime. We perform a resolution study for four different levels of refinement and analyze the application's performance with respect to adaptive grid refinement. The measurement overheads are small, enabling the combined use of performance data and physical properties with the goal of improving the code's performance. All runs were obtained on NERSC's Cori, Louisiana Optical Network Infrastructure's QueenBee2, and Indiana University's Big Red 3.

Updated: 2021-06-18