Dynamic workload prediction and distribution in numerical modeling of solidification on multi‐/manycore architectures
Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2020-07-04, DOI: 10.1002/cpe.5905
Kamil Halbiniak 1, Tomasz Olas 1, Lukasz Szustak 1, Adam Kulawik 1, Marco Lapegna 2

This work is part of the broader trend of using modern computing systems to model phase-field phenomena. The main goal of this article is to improve the performance of a parallel application for solidification modeling in which the intensity of computations changes dynamically over successive time steps, as calculations are performed on a carefully selected group of nodes in the grid. A two-step method is proposed to optimize the application for multi-/manycore architectures. In the first step, loop fusion is used to execute all kernels in a single nested loop and to reduce the number of conditional operators. These modifications are vital to implementing the second step, which consists of an algorithm for dynamic workload prediction and load balancing across the cores of a computing platform. Two versions of the algorithm are proposed, using 1D and 2D maps respectively to predict the computational domain within the grid. The proposed optimizations significantly increase the application performance for all tested configurations of computing resources. The highest performance gain is achieved on two Intel Xeon Platinum 8180 CPUs, where the new code based on the 2D map yields a speedup of up to 2.74 times, while applying the proposed method with the 2D map on a single KNL accelerator reduces the execution time by up to 1.91 times.
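
The abstract does not spell out how the 1D map works, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that each grid row carries one activity flag, that the active region can grow by at most `halo` rows per time step, and that rows are handed to workers in contiguous chunks. The names `predict_active_rows`, `balance_rows`, and `halo` are hypothetical.

```python
from typing import List


def predict_active_rows(active: List[bool], halo: int = 1) -> List[bool]:
    """Predict the rows that need computation in the next time step.

    Assumption (not from the paper): the active region spreads by at
    most `halo` rows per step, so the current 1D map is dilated by
    `halo` rows in both directions.
    """
    n = len(active)
    predicted = [False] * n
    for i, flag in enumerate(active):
        if flag:
            for j in range(max(0, i - halo), min(n, i + halo + 1)):
                predicted[j] = True
    return predicted


def balance_rows(predicted: List[bool], workers: int) -> List[List[int]]:
    """Split the predicted-active rows into contiguous, near-equal chunks."""
    rows = [i for i, flag in enumerate(predicted) if flag]
    base, extra = divmod(len(rows), workers)
    chunks: List[List[int]] = []
    start = 0
    for w in range(workers):
        # The first `extra` workers take one additional row each.
        size = base + (1 if w < extra else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks


# Hypothetical 8-row grid in which only rows 2 and 3 are currently active.
active = [False, False, True, True, False, False, False, False]
predicted = predict_active_rows(active)
chunks = balance_rows(predicted, workers=2)
print(predicted)
print(chunks)
```

Dividing only the predicted-active rows, rather than all rows, is what keeps the per-worker load even when most of the grid is idle. A 2D map would presumably track activity per block of rows and columns instead of per row, at the cost of more bookkeeping.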

Updated: 2020-07-04