Hyperbolic Diffusion in Flux Reconstruction: Optimisation through Kernel Fusion within Tensor-Product Elements
arXiv - CS - Mathematical Software. Pub Date: 2021-07-22, DOI: arxiv-2107.14027
Will Trojak, Rob Watson, Freddie Witherden

Novel methods are presented for the fusion of GPU kernels in the artificial compressibility method (ACM), using tensor-product elements and flux reconstruction. Fusion is made possible by hyperbolising the diffusion terms, which eliminates the expensive algorithmic steps needed to form the viscous stresses. Two fusion approaches are presented, offering differing levels of parallelism; both are found to be necessary because the workload changes as the order of accuracy of the elements increases. Several further optimisations of these approaches are demonstrated, including a generation-time memory manager that maximises resource usage. The fused kernels achieve speedups of 3-4 times, which compares favourably with a theoretical maximum speedup of 4. In three-dimensional test cases, the generated fused kernels reduce total runtime by ${\sim}25\%$, and simulations demonstrate a speedup of $2.3$ times compared with the standard ACM formulation.
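To make the two ideas in the abstract concrete, here is a minimal sketch on a 1D scalar model problem $u_t = \nu u_{xx}$, not the paper's ACM equations. It assumes a first-order hyperbolic relaxation of the kind used in hyperbolic diffusion schemes: an extra variable $q$ relaxes towards $u_x$ over a time $T_r$, so the flux depends only on point values of $(u, q)$ and no gradient of $u$ is ever formed. It then contrasts an "unfused" residual (store all interface fluxes, then read them back) with a "fused" one (recompute each interface flux inside a single per-cell loop). Finite volumes with a Rusanov flux stand in for flux reconstruction on tensor-product elements, and every name and parameter here (`NU`, `L_R`, `T_R`, `residual_fused`, the choice $L_r = 1/(2\pi)$, ...) is a hypothetical illustration, not the authors' code.

```python
import math

# Hypothetical model parameters (not taken from the paper).
NU = 0.1                          # diffusion coefficient nu
L_R = 1.0 / (2.0 * math.pi)       # relaxation length, an assumed choice
T_R = L_R**2 / NU                 # relaxation time T_r
WAVE_SPEED = math.sqrt(NU / T_R)  # flux-Jacobian eigenvalues are +/- this


def physical_flux(u, q):
    """Flux of u_t + (-nu*q)_x = 0 and q_t + (-u/T_r)_x = -q/T_r.

    Only point values of (u, q) are needed; no gradient of u is formed.
    """
    return (-NU * q, -u / T_R)


def rusanov_flux(u_l, q_l, u_r, q_r):
    """Local Lax-Friedrichs (Rusanov) numerical flux at one interface."""
    f_l = physical_flux(u_l, q_l)
    f_r = physical_flux(u_r, q_r)
    a = WAVE_SPEED
    return (0.5 * (f_l[0] + f_r[0]) - 0.5 * a * (u_r - u_l),
            0.5 * (f_l[1] + f_r[1]) - 0.5 * a * (q_r - q_l))


def residual_unfused(u, q, dx):
    """Two passes on a periodic grid, like two separate kernels."""
    n = len(u)
    # Pass 1: write every interface flux F_{i+1/2} to an intermediate array.
    flux = [rusanov_flux(u[i], q[i], u[(i + 1) % n], q[(i + 1) % n])
            for i in range(n)]
    # Pass 2: read the fluxes back; flux[i - 1] wraps to flux[n - 1] at i = 0.
    ru = [-(flux[i][0] - flux[i - 1][0]) / dx for i in range(n)]
    rq = [-(flux[i][1] - flux[i - 1][1]) / dx - q[i] / T_R for i in range(n)]
    return ru, rq


def residual_fused(u, q, dx):
    """One pass: each cell recomputes both of its interface fluxes, so no
    intermediate flux array is stored and read back."""
    n = len(u)
    ru, rq = [0.0] * n, [0.0] * n
    for i in range(n):
        left, right = (i - 1) % n, (i + 1) % n
        f_minus = rusanov_flux(u[left], q[left], u[i], q[i])
        f_plus = rusanov_flux(u[i], q[i], u[right], q[right])
        ru[i] = -(f_plus[0] - f_minus[0]) / dx
        rq[i] = -(f_plus[1] - f_minus[1]) / dx - q[i] / T_R
    return ru, rq


if __name__ == "__main__":
    n = 64
    dx = 1.0 / n
    xs = [i * dx for i in range(n)]
    u = [math.sin(2.0 * math.pi * x) for x in xs]
    q = [2.0 * math.pi * math.cos(2.0 * math.pi * x) for x in xs]  # q = u_x
    ru_a, rq_a = residual_unfused(u, q, dx)
    ru_b, rq_b = residual_fused(u, q, dx)
    print("max |fused - unfused|:",
          max(abs(a - b) for a, b in zip(ru_a + rq_a, ru_b + rq_b)))
    # With q = u_x, R_u approximates nu * u_xx (plus Rusanov dissipation).
    print("R_u at x = 0.25:", ru_b[16], "vs nu*u_xx:", -NU * (2.0 * math.pi) ** 2)
```

The fused loop trades extra arithmetic (each interface flux is evaluated twice) for one fewer intermediate array written to and read back from memory; on a GPU, saving that memory traffic is the usual motivation for fusion. This pure-Python version only checks that both orderings give the same residual.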

Updated: 2021-07-30