Advances in Engineering Software (IF 4.8), Pub Date: 2021-07-06, DOI: 10.1016/j.advengsoft.2020.102962
J. Bakosi¹, R. Bird¹, F. Gonzalez², C. Junghans¹, W. Li³, H. Luo³, A. Pandare¹, J. Waltz¹
We discuss the implementation of a finite element method, used to numerically solve the Euler equations of compressible flows, using an asynchronous runtime system (RTS). The algorithm is implemented for distributed-memory machines, using stationary unstructured 3D meshes, and combines data- and task-parallelism on top of the Charm++ RTS. Charm++’s execution model is asynchronous by default, allowing arbitrary overlap of computation and communication. Task-parallelism allows parts of an algorithm to be scheduled either independently of each other or with explicit dependencies among them. Built-in automatic load balancing enables continuous redistribution of computational load by migrating work units based on real-time CPU load measurements. The RTS also provides automatic checkpointing, fault tolerance, and resilience against hardware failure, and supports power- and energy-aware computation. We demonstrate scalability up to cells at compute cores and the benefits of automatic load balancing for irregular workloads. The full source code with documentation is available at https://quinoacomputing.org.
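The abstract says that automatic load balancing migrates work units according to measured CPU load. The Python sketch below is offered only as a rough illustration of one generic measurement-based strategy: a greedy "heaviest unit onto the least-loaded processor" reassignment. The function names, the example loads, and the choice of a greedy heuristic are all hypothetical. This is not the Charm++ or Quinoa implementation, and Charm++ provides several balancing strategies of its own.

```python
import heapq
from typing import Dict, List, Tuple


def greedy_rebalance(unit_loads: Dict[str, float], num_pes: int) -> Dict[str, int]:
    """Hypothetical greedy balancer: place work units (e.g. mesh partitions)
    heaviest first, each onto the currently least-loaded processing element (PE)."""
    if num_pes < 1:
        raise ValueError("num_pes must be at least 1")
    # Min-heap of (accumulated load, PE id); the top is the least-loaded PE.
    heap: List[Tuple[float, int]] = [(0.0, pe) for pe in range(num_pes)]
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    # Descending measured load; ties broken by unit name for determinism.
    for unit, load in sorted(unit_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        pe_load, pe = heapq.heappop(heap)
        assignment[unit] = pe
        heapq.heappush(heap, (pe_load + load, pe))
    return assignment


def pe_totals(unit_loads: Dict[str, float], assignment: Dict[str, int],
              num_pes: int) -> List[float]:
    """Sum the measured load placed on each PE."""
    totals = [0.0] * num_pes
    for unit, pe in assignment.items():
        totals[pe] += unit_loads[unit]
    return totals


def migrations(old: Dict[str, int], new: Dict[str, int]) -> List[str]:
    """Work units whose PE changed, i.e. the units that would have to migrate."""
    return sorted(u for u in new if old.get(u) != new[u])


if __name__ == "__main__":
    # Hypothetical loads measured over the previous interval (irregular workload).
    measured = {"p0": 4.0, "p1": 1.0, "p2": 3.0, "p3": 1.0, "p4": 2.0, "p5": 1.0}
    current = {"p0": 0, "p1": 0, "p2": 0, "p3": 1, "p4": 1, "p5": 1}
    print(pe_totals(measured, current, 2))  # [8.0, 4.0]
    new = greedy_rebalance(measured, 2)
    print(pe_totals(measured, new, 2))      # [6.0, 6.0]
    print(migrations(current, new))         # ['p2', 'p3']
```

A purely greedy reassignment ignores where units currently live, so in general it can trigger more migrations than necessary. Refinement-style strategies instead move only a few units off overloaded processors, trading some balance for less data movement.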
Title: Asynchronous distributed-memory task-parallel algorithm for compressible flows on unstructured 3D Eulerian grids