Automatic Code Generation for High-performance Discontinuous Galerkin Methods on Modern Architectures, ACM Transactions on Mathematical Software


Automatic Code Generation for High-performance Discontinuous Galerkin Methods on Modern Architectures
ACM Transactions on Mathematical Software (IF 2.7), Pub Date: 2020-12-09, DOI: 10.1145/3424144
Dominic Kempf, René Heß, Steffen Müthing, Peter Bastian

SIMD vectorization has lately become a key challenge in high-performance computing. However, hand-written explicitly vectorized code often poses a threat to the software’s sustainability. In this publication, we solve this sustainability and performance portability issue by enriching the simulation framework dune-pdelab with a code generation approach. The approach is based on the well-known domain-specific language UFL but combines it with loopy, a more powerful intermediate representation for the computational kernel. Given this flexible tool, we present and implement a new class of vectorization strategies for the assembly of Discontinuous Galerkin methods on hexahedral meshes exploiting the finite element’s tensor product structure. The performance-optimal variant from this class is chosen by the code generator through an auto-tuning approach. The implementation is done within the open source PDE software framework Dune and the discretization module dune-pdelab. The strength of the proposed approach is illustrated with performance measurements for DG schemes for a scalar diffusion reaction equation and the Stokes equation. In our measurements, we utilize both the AVX2 and the AVX512 instruction set, achieving 30% to 40% of the machine’s theoretical peak performance for one matrix-free application of the operator.
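The tensor product structure mentioned in the abstract is commonly exploited through a technique known as sum factorization. The abstract itself does not name the technique or show any kernel, so the following is only a minimal sketch under that assumption. It is not the code produced by the authors' generator and does not use UFL, loopy, or dune-pdelab. The function names, the dictionary-based tensor layout, and the use of a single 1D matrix `A` in all three directions are illustrative choices. The sketch applies a 3D operator of the form A ⊗ A ⊗ A to a coefficient tensor, once directly and once one direction at a time, and checks that both results agree.

```python
import itertools
import random


def apply_naive(A, u):
    """Apply A (x) A (x) A directly: about m**3 * n**3 multiply-adds."""
    m, n = len(A), len(A[0])
    out = {}
    for a, b, c in itertools.product(range(m), repeat=3):
        out[a, b, c] = sum(
            A[a][i] * A[b][j] * A[c][k] * u[i, j, k]
            for i, j, k in itertools.product(range(n), repeat=3)
        )
    return out


def contract_and_rotate(A, t, shape):
    """Contract the first index of t with A and move the new index to the end."""
    n0, n1, n2 = shape
    m = len(A)
    out = {}
    for j, k, a in itertools.product(range(n1), range(n2), range(m)):
        out[j, k, a] = sum(A[a][i] * t[i, j, k] for i in range(n0))
    return out, (n1, n2, m)


def apply_sum_factorized(A, u):
    """Apply the same operator as three successive 1D contractions."""
    n = len(A[0])
    t, shape = u, (n, n, n)
    # After three contract-and-rotate steps the indices are back in (a, b, c) order.
    for _ in range(3):
        t, shape = contract_and_rotate(A, t, shape)
    return t


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 3, 4  # hypothetical sizes: n basis functions, m quadrature points per direction
    A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]
    u = {idx: rng.uniform(-1.0, 1.0) for idx in itertools.product(range(n), repeat=3)}
    ref = apply_naive(A, u)
    fast = apply_sum_factorized(A, u)
    print(max(abs(ref[key] - fast[key]) for key in ref) < 1e-12)
```

When m is close to n, the factorized variant needs on the order of n^4 multiply-adds instead of n^6. The three short inner loops it leaves behind are the kind of kernel for which a choice between several SIMD vectorization strategies arises.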


Updated: 2020-12-09
