On the performance of a highly-scalable Computational Fluid Dynamics code on AMD, ARM and Intel processor-based HPC systems
Computer Physics Communications (IF 7.2), Pub Date: 2021-07-22, DOI: 10.1016/j.cpc.2021.108105
Pablo Ouro, Unai Lopez-Novoa, Martyn F. Guest

No area of computing is hungrier for performance than High Performance Computing (HPC), whose demands continue to be a major driver of processor performance, accelerator adoption, and advances in memory, storage, and networking technologies. A key feature of Intel's processor dominance over the past decade has been the extensive adoption of GPUs as coprocessors, whilst more recent developments have seen the increased availability of a number of CPU processors, including novel ARM-based chips. This paper analyses the performance and scalability of a state-of-the-art Computational Fluid Dynamics (CFD) code on two HPC cluster systems: Hawk, equipped with AMD EPYC-Rome (EPYC, 4096 cores) and Intel Skylake (SKL, 8000 cores) processors and an Infiniband EDR interconnect; and Isambard, equipped with ARM-based Marvell ThunderX2 (TX2, 8192 cores) processors and a Cray Aries interconnect. The code Hydro3D was analysed in three benchmark cases of increasing numerical complexity, namely lid-driven cavity flow using 4th-order central differences, the Taylor-Green vortex solved with a 5th-order WENO scheme, and a travelling solitary wave computed using the level-set method and WENO, with problem sizes designed to yield a larger computation-to-communication ratio on single or multiple nodes. Our results show that the EPYC cluster delivers the best code performance for all the setups under consideration. In the first two benchmarks, the SKL cluster demonstrates faster computing times than the TX2 system, whilst in the solitary-wave simulations the TX2 cluster achieves good scalability and performance similar to the EPYC system, both improving on that obtained with the SKL cluster. These results suggest that while the Intel SKL cores deliver the best strong scalability, the associated cluster performance is lower than that of the EPYC system. The TX2 cluster's performance is promising considering its recent addition to the HPC portfolio.
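
For readers unfamiliar with the discretisations named in the abstract, the sketch below illustrates the textbook 4th-order central-difference stencil applied to the classical Taylor-Green vortex initial condition in plain NumPy. It is a minimal, hedged illustration of the standard forms only, not Hydro3D's actual implementation (staggered grids, boundary treatment and WENO reconstruction are omitted); the function name ddx_central4 and the grid parameters are assumptions made for this example.

```python
import numpy as np

def ddx_central4(f, dx, axis=0):
    """Standard 4th-order central difference along `axis`, assuming periodicity:
    f'(x) ~ (-f(x+2h) + 8 f(x+h) - 8 f(x-h) + f(x-2h)) / (12 h)."""
    return (-np.roll(f, -2, axis=axis) + 8.0 * np.roll(f, -1, axis=axis)
            - 8.0 * np.roll(f,  1, axis=axis) + np.roll(f,  2, axis=axis)) / (12.0 * dx)

# Classical Taylor-Green vortex initial velocity on a periodic [0, 2*pi]^3 box
n = 64
dx = 2.0 * np.pi / n
x = np.arange(n) * dx
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
u = np.sin(X) * np.cos(Y) * np.cos(Z)

# du/dx has the exact form cos(X)*cos(Y)*cos(Z); the stencil error shrinks as O(dx^4)
err = np.max(np.abs(ddx_central4(u, dx, axis=0) - np.cos(X) * np.cos(Y) * np.cos(Z)))
print(f"max |error| in du/dx on a {n}^3 grid: {err:.2e}")
```

Doubling n should reduce the reported error by roughly a factor of 16, which is a quick way to confirm the 4th-order behaviour of such a stencil.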

Updated: 2021-08-01