A GPU implementation of the PCG method for large-scale image-based finite element analysis in heterogeneous periodic media
Computer Methods in Applied Mechanics and Engineering (IF 7.2), Pub Date: 2022-07-16, DOI: 10.1016/j.cma.2022.115276
Pedro Cortez Fetter Lopes, André Maués Brabo Pereira, Esteban Walter Gonzalez Clua, Ricardo Leiderman

Image-based physics simulations in heterogeneous media at a microscopic scale are a growing trend in various fields within and around Scientific Computing. We consider the scope of Numerical Homogenization, where micro-computed tomography is used to obtain digital models of physical samples, naturally leading to pixel- and voxel-based solutions. In this context, the Finite Element Method (FEM) is commonly employed to solve the governing differential equations via a system of algebraic equations. As image dimensions increase, the memory required to store the matrix associated with the FEM quickly becomes infeasible, even in sparse format. Assembly-free strategies are adopted to reduce memory usage, with the caveat of increased computational cost. The Preconditioned Conjugate Gradient (PCG) method is widely employed to solve large-scale sparse linear systems of this kind, and is well suited to assembly-free implementations. This work focuses on a massively parallel PCG solver applied to finite element analyses of heat conduction and linear elasticity on image-based models. Memory efficiency is one of our main concerns, in an attempt to make personal-use GPUs feasible for large-scale simulations. The resulting solver is validated against an analytical benchmark, and by comparing the results obtained for a microtomographic model of a cast iron sample with experimental values found in the literature. Time and memory metrics are presented and discussed. It is shown that the developed program allows homogenization studies of nearly 500 million degrees of freedom to be conducted on personal computers equipped with CUDA-enabled devices with 8 GB of RAM, taking seconds or a few minutes per system solution with the PCG method. Up to 400× speed-up was observed in comparison to an analogous solver running on a 16-thread CPU. Our GPU implementation makes it possible to conduct, in a matter of minutes, homogenization studies that would take hours, or even days, on personal CPUs.
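
To make the assembly-free idea concrete, the following is a minimal serial sketch in plain Python, not the authors' GPU code. It assumes a deliberately tiny setting: a periodic 1D chain of two-node heat-conduction elements with unit length, where each "pixel" carries one conductivity value. The names `apply_operator`, `pcg`, and `effective_conductivity` are hypothetical, and the Jacobi (diagonal) preconditioner is an assumption chosen for simplicity. The global matrix is never stored: each matrix-vector product is rebuilt from element contributions.

```python
def apply_operator(k, x):
    """Assembly-free product y = A x on a periodic 1D chain (assumed toy model)."""
    n = len(k)
    y = [0.0] * n
    for e in range(n):
        a, b = e, (e + 1) % n            # element e joins node e and its periodic neighbour
        flux = k[e] * (x[a] - x[b])      # element matrix is k[e] * [[1, -1], [-1, 1]]
        y[a] += flux
        y[b] -= flux
    return y


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(k, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient using only apply_operator."""
    n = len(b)
    diag = [k[i] + k[i - 1] for i in range(n)]   # diagonal of A; k[-1] wraps periodically
    x = [0.0] * n
    r = list(b)                                  # residual for the initial guess x = 0
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    stop = tol * tol * dot(b, b)
    for _ in range(max_iter):
        if dot(r, r) <= stop:
            break
        q = apply_operator(k, p)
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def effective_conductivity(k):
    """Homogenized conductivity under a unit macroscopic temperature gradient."""
    n = len(k)
    b = [k[i] - k[i - 1] for i in range(n)]      # load vector from the unit gradient
    u = pcg(k, b)                                # periodic fluctuation field
    fluxes = [k[e] * (1.0 + u[(e + 1) % n] - u[e]) for e in range(n)]
    return sum(fluxes) / n


if __name__ == "__main__":
    print(effective_conductivity([1.0, 2.0, 4.0, 4.0]))
```

For this 1D chain the homogenized value should agree with the harmonic mean of the pixel conductivities (2.0 for the sample input, up to round-off), which gives a quick analytical check. In a GPU setting the loop in `apply_operator` is the natural candidate for parallelization, although how the paper organizes its kernels is not stated in the abstract.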




Updated: 2022-07-19