A hardware accelerated unstructured overset method to simulate turbulent fluid flow
Journal of Computational Physics (IF 4.1), Pub Date: 2021-07-24, DOI: 10.1016/j.jcp.2021.110574
Wyatt James Horne , Krishnan Mahesh

Hardware acceleration consists of offloading computational work to devices such as graphics processing units (GPUs) to produce an overall speed-up. To produce speed-up effectively, algorithms and numerical methods must be constructed to suit the available hardware. In this work, a numerical method is presented that can effectively use hardware acceleration to simulate incompressible turbulent fluid flow. The method is an unstructured overset method: unstructured meshes are attached to individual bodies and connected throughout the flow domain by an overset assembly process to produce a single-domain solution. The unstructured overset method of Horne and Mahesh [1] and Horne and Mahesh [2] was found capable of scaling to O(10^5) computational cores for O(10^5) moving bodies in turbulent flow fields while producing accurate flow results. This highly scalable method is modified and extended to effectively utilize on-node hardware acceleration. Overset assembly algorithms that use hardware acceleration are presented, based on successful accelerated algorithms in real-time ray tracing and computational geometry. Timing results for core overset assembly operations show a maximum speed-up of O(100×) when using hardware acceleration. A novel method for turbulent fluid flow is presented that uses over-decomposition of the flow domain to produce task parallelism, allowing the different steps of the method to be calculated asynchronously while also overlapping data transfer with computation. A mixed-precision solver is used, which provides a balance between optimal performance and numerical accuracy. A cost-effective and accurate artificial compressibility pressure regularization is used, which has minimal memory complexity and minimizes computational cost while maintaining accuracy. A primal-dual Laplacian operator is introduced that produces accurate results on skewed meshes.
Results for canonical flow cases with overset meshes are shown, illustrating the method's accuracy and numerical properties. Substantial speed-up is demonstrated, with the accelerated method running upwards of 50 times as fast as the non-accelerated method at high cell loadings.
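The abstract does not describe how the mixed-precision solver is built. One widely used pattern, shown here only as a generic illustration and not as the authors' solver, is mixed-precision iterative refinement: the expensive solves run in single precision, while residuals and the accumulated solution are kept in double precision. The function name `mixed_precision_solve` and its `tol`/`max_iter` parameters are hypothetical, and the dense NumPy solve stands in for whatever sparse or matrix-free solver a flow code would actually use.

```python
import numpy as np


def mixed_precision_solve(A, b, tol=1e-12, max_iter=20):
    """Hypothetical sketch: solve A x = b by mixed-precision iterative refinement.

    Solves are performed in float32 (cheap precision); the residual and the
    accumulated solution are kept in float64 (accurate precision).
    """
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    A32 = A64.astype(np.float32)
    b_norm = np.linalg.norm(b64)

    # Initial solve entirely in single precision, promoted to double.
    x = np.linalg.solve(A32, b64.astype(np.float32)).astype(np.float64)

    for _ in range(max_iter):
        # Residual computed in double precision.
        r = b64 - A64 @ x
        if np.linalg.norm(r) <= tol * b_norm:
            break
        # Correction solved in single precision. (A production code would
        # factor A32 once and reuse it; np.linalg.solve refactors each call.)
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    # Diagonally shifted random matrix: well conditioned for this demo.
    A = rng.standard_normal((n, n)) + n * np.eye(n)
    x_true = rng.standard_normal(n)
    b = A @ x_true

    x_single = np.linalg.solve(A.astype(np.float32), b.astype(np.float32))
    x_mixed = mixed_precision_solve(A, b)
    print("float32 solve error:  ", np.max(np.abs(x_single - x_true)))
    print("mixed-precision error:", np.max(np.abs(x_mixed - x_true)))
```

For a well-conditioned system, each correction reduces the error by roughly the single-precision factor, so a few refinement steps typically recover double-precision accuracy while most of the floating-point work is done in the cheaper format.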
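The abstract does not state the form of the artificial compressibility pressure regularization. For orientation only, the classical artificial-compressibility system of Chorin replaces the continuity constraint with an evolution equation for the pressure. Here $\mathbf{u}$ is the velocity, $p$ the kinematic pressure, $\nu$ the kinematic viscosity, and $\beta > 0$ the artificial compressibility parameter; the paper's regularization may differ from this textbook form.

```latex
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}
  = -\nabla p + \nu \nabla^{2}\mathbf{u},
\qquad
\frac{1}{\beta}\,\frac{\partial p}{\partial t} + \nabla\cdot\mathbf{u} = 0 .
```

Because the pressure is advanced explicitly from the local velocity divergence, schemes of this type avoid assembling and solving a global pressure Poisson system; the divergence-free condition $\nabla\cdot\mathbf{u} = 0$ is recovered only as $\partial p/\partial t \to 0$ (for example at steady state or on convergence in pseudo-time) or as $\beta \to \infty$.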



Updated: 2021-08-01