On the performance of GPU accelerated q-LSKUM based meshfree solvers in Fortran, C++, Python, and Julia
arXiv - CS - Performance | Pub Date: 2021-08-16 | DOI: arxiv-2108.07031
Authors: Nischay Ram Mamidi, Kumar Prasun, Dhruv Saxena, Anil Nemili, Bharatkumar Sharma, S. M. Deshpande
This report presents a comprehensive analysis of the performance of
GPU-accelerated meshfree CFD solvers for two-dimensional compressible flows in
Fortran, C++, Python, and Julia. The GPU codes are developed with the CUDA
programming model. The meshfree solver is based on the least squares kinetic upwind
method with entropy variables (q-LSKUM). To assess the computational efficiency
of the GPU solvers and to compare their relative performance, benchmark
calculations are performed on seven levels of point distribution. To analyse
the difference in their run-times, the computationally intensive kernel is
profiled. Various performance metrics are investigated from the profiled data
to determine the cause of the observed variation in run-times. To address some of
the performance-related issues, various optimisation strategies are employed.
The optimised GPU codes are compared with the naive codes, and conclusions are
drawn from their performance.
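
The abstract does not spell out the discretisation, so the following is only a rough illustration of the least-squares idea that meshfree methods of this kind rely on: a spatial derivative at a point is estimated from the values at neighbouring points in a local cloud. The sketch assumes an unweighted 2D least-squares fit and plain Python on the CPU; the function name `ls_gradient` and the data layout are hypothetical and are not taken from the report's GPU codes.

```python
def ls_gradient(point, value, neighbours):
    """Unweighted least-squares estimate of (df/dx, df/dy) at `point`.

    `point` is an (x, y) tuple, `value` is f at that point, and
    `neighbours` is a list of ((x, y), f) pairs from the local point cloud.
    All names here are illustrative assumptions, not the report's API.
    """
    x0, y0 = point
    sxx = syy = sxy = sxf = syf = 0.0
    for (x, y), f in neighbours:
        dx, dy, df = x - x0, y - y0, f - value
        sxx += dx * dx
        syy += dy * dy
        sxy += dx * dy
        sxf += dx * df
        syf += dy * df

    # Solve the 2x2 normal equations
    #   [sxx sxy] [fx]   [sxf]
    #   [sxy syy] [fy] = [syf]
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("neighbours are collinear; gradient is not determined")
    fx = (syy * sxf - sxy * syf) / det
    fy = (sxx * syf - sxy * sxf) / det
    return fx, fy


# Usage: for a linear field f = 2x + 3y the fit recovers the gradient
# up to floating-point rounding.
def f(x, y):
    return 2.0 * x + 3.0 * y

centre = (0.5, 0.5)
cloud = [((x, y), f(x, y)) for x, y in [(0.6, 0.5), (0.4, 0.7), (0.5, 0.3), (0.7, 0.8)]]
fx, fy = ls_gradient(centre, f(*centre), cloud)
```

In a GPU version, a loop like this over each point's neighbours would typically be the body of the per-point kernel, which is the kind of computationally intensive kernel the report profiles.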
Updated: 2021-08-17