Energy and performance improvements in stencil computations on multi-node HPC systems with different network and communication topologies
Future Generation Computer Systems (IF 7.5), Pub Date: 2020-08-21, DOI: 10.1016/j.future.2020.08.018
Miłosz Ciżnicki, Krzysztof Kurowski, Jan Węglarz

Energy and performance improvements in stencil computations are relevant to both application developers and data center administrators. Stencil computations are the fundamental scheme in many large-scale scientific simulations and workloads. Many research efforts have focused on techniques for estimating the energy usage of HPC systems from specific characteristics of parallel applications. For stencils, we have previously concentrated on detailed estimates of energy consumption and on the energy-aware distribution of stencil computations across heterogeneous processors. However, those studies were restricted to a single heterogeneous computing node. In this paper, we show how scheduling and optimization techniques can be applied to improve the energy use and performance of stencil computations on multi-node HPC systems with different network topologies. We formulate a scheduling model together with a new Tabu Search algorithm, called Task Movement (TM), which takes the communication hierarchies into account, to minimize the overall energy usage and the execution time of stencil computations. Experimental studies show that this algorithm solves the considered problem more efficiently than other, simpler heuristics. We present computational experiments for a reference 7-point stencil computation pattern on three commonly used low-diameter network topologies: Fat-tree, Dragonfly, and Torus. According to our studies, the most promising multi-node HPC architecture for stencil computations is based on the Torus network concept. Finally, we argue that the proposed scheduling model and TM algorithm can be easily adopted within existing high-level parallel execution environments for automatic performance tuning of stencils.
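
For readers unfamiliar with the reference pattern, a 7-point stencil updates each interior cell of a 3D grid from its own value and its six face-adjacent neighbours. The following is a minimal single-node sketch in plain Python, not the authors' implementation: the function name `stencil7` and the coefficients `c0` and `c1` are assumptions chosen for illustration, and the sketch does not model the multi-node distribution, scheduling, or energy measurement discussed in the abstract.

```python
def stencil7(grid, c0=0.25, c1=0.125):
    """Apply one Jacobi-style sweep of a 7-point stencil.

    grid is a nested list indexed as grid[z][y][x].
    c0 weights the centre cell and c1 weights each of the six
    face neighbours (both values are illustrative assumptions).
    Boundary cells are copied through unchanged.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    # Write into a copy so every update reads only old values.
    out = [[row[:] for row in plane] for plane in grid]
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                neighbours = (
                    grid[z - 1][y][x] + grid[z + 1][y][x]
                    + grid[z][y - 1][x] + grid[z][y + 1][x]
                    + grid[z][y][x - 1] + grid[z][y][x + 1]
                )
                out[z][y][x] = c0 * grid[z][y][x] + c1 * neighbours
    return out


def make_grid(n, value=0.0):
    """Build an n x n x n grid filled with a constant value."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]
```

In a multi-node setting, each node would typically hold a sub-block of the grid and exchange boundary (halo) cells with its neighbours before every sweep, which is why the placement of sub-blocks on the network topology affects both execution time and energy use.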

Updated: 2020-08-21