A hierarchical parallel implementation for heterogeneous computing. Application to algebra-based CFD simulations on hybrid supercomputers, Computers & Fluids



Computers & Fluids (IF 2.5), Pub Date: 2021-01-01, DOI: 10.1016/j.compfluid.2020.104768
Xavier Álvarez-Farré, Andrey Gorobets, F. Xavier Trias

Abstract: The quest for new portable implementations of simulation algorithms is motivated by the increasing variety of computing architectures. Moreover, the hybridization of high-performance computing systems imposes additional constraints, since heterogeneous computations are needed to efficiently engage processors and massively-parallel accelerators. This, in turn, involves different parallel paradigms and computing frameworks and requires complex data exchanges between computing units. Typically, simulation codes rely on sophisticated data structures and computing subroutines, so-called kernels, which makes portability very cumbersome. Thus, a natural way to achieve portability is to dramatically reduce the complexity of both data structures and computing kernels. In our algebra-based approach, the scale-resolving simulation of incompressible turbulent flows on unstructured meshes relies on three fundamental kernels: the sparse matrix-vector product, the linear combination of vectors, and the dot product. It is noteworthy that this approach is not limited to a particular kind of numerical method or a set of governing equations. In our code, an auto-balanced multilevel partitioning distributes workload among computing devices of various architectures. The overlap of computations and multistage communications efficiently hides the data-exchange overhead in large-scale supercomputer simulations. In addition to computing on accelerators, special attention is paid to efficiency on manycore processors in multiprocessor nodes with a significant non-uniform memory access (NUMA) factor. Parallel efficiency and performance are studied in detail for different execution modes on various supercomputers using up to 9,600 processor cores and up to 256 graphics processing units. The heterogeneous implementation model described in this work is a general-purpose approach that is well suited for various subroutines in numerical simulation codes.
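To make the three kernels named in the abstract concrete, here is a minimal serial sketch in pure Python. It is not the authors' implementation: the compressed sparse row (CSR) storage, the class name `CSRMatrix`, and the function names `spmv`, `axpy`, and `dot` are assumptions chosen for illustration, and the abstract does not say which sparse format or interfaces the paper uses.

```python
from typing import List, Sequence


class CSRMatrix:
    """Sparse matrix in CSR form (the storage format is an assumption)."""

    def __init__(self, row_ptr: Sequence[int], col_idx: Sequence[int],
                 values: Sequence[float]) -> None:
        self.row_ptr = list(row_ptr)   # start offset of each row in col_idx/values
        self.col_idx = list(col_idx)   # column index of each stored entry
        self.values = list(values)     # value of each stored entry
        self.n_rows = len(self.row_ptr) - 1


def spmv(A: CSRMatrix, x: Sequence[float]) -> List[float]:
    """Sparse matrix-vector product y = A x."""
    y = [0.0] * A.n_rows
    for i in range(A.n_rows):
        acc = 0.0
        for k in range(A.row_ptr[i], A.row_ptr[i + 1]):
            acc += A.values[k] * x[A.col_idx[k]]
        y[i] = acc
    return y


def axpy(a: float, x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Linear combination of vectors: returns a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


# Hypothetical 3x3 example: the tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]].
A = CSRMatrix(row_ptr=[0, 2, 5, 7],
              col_idx=[0, 1, 0, 1, 2, 1, 2],
              values=[2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
x = [1.0, 2.0, 3.0]
b = [1.0, 1.0, 1.0]

Ax = spmv(A, x)            # sparse matrix-vector product
r = axpy(-1.0, Ax, b)      # residual r = b - A x as a linear combination
r_norm_sq = dot(r, r)      # squared residual norm via the dot product
```

The last three lines combine the kernels to form a residual and its squared norm, which is how an iterative linear solver could be written using only these three operations. A portable code would then need to supply just these kernels for each target device.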


Updated: 2021-01-01
