A multi-GPU implementation of a full-field crystal plasticity solver for efficient modeling of high-resolution microstructures
Computer Physics Communications (IF 7.2), Pub Date: 2020-09-01, DOI: 10.1016/j.cpc.2020.107231
Adnan Eghtesad, Kai Germaschewski, Ricardo A. Lebensohn, Marko Knezevic

Abstract

In a recent publication (Eghtesad et al., 2018), we reported a parallel, message passing interface (MPI)-based domain decomposition implementation of an elasto-viscoplastic fast Fourier transform-based (EVPFFT) micromechanical solver for computationally efficient crystal plasticity modeling of polycrystalline materials. In this paper, we present major extensions of that implementation that exploit graphics processing units (GPUs), which can perform floating-point arithmetic much faster than traditional central processing units (CPUs). In particular, the code can run on a single GPU, on multiple GPUs within one computer, and on a large number of GPUs across the nodes of a supercomputer. To this end, the implementation combines the OpenACC programming model for GPU acceleration with MPI for distributed computing. The FFTs are computed with CUFFT, the efficient Compute Unified Device Architecture (CUDA) FFT library. Finally, to maintain performance portability, OpenACC-CUDA interoperability is used for data transfers between the CPU and the GPUs. The resulting implementations are called ACC-EVPCUFFT (single GPU) and MPI-ACC-EVPCUFFT (multiple GPUs). To evaluate the performance of the framework, the deformation of single-phase copper is simulated; to further demonstrate its utility for resolving fine microstructures, the deformation of a dual-phase steel, DP590, is simulated. The implementations and results are presented and discussed in this paper.
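The multi-GPU design combines MPI domain decomposition with one GPU per MPI rank. The abstract does not state which decomposition is used, so the following is only a minimal sketch of the bookkeeping such a design typically needs, assuming a slab decomposition of an n0 × n1 × n2 Fourier grid along its first axis and a round-robin binding of node-local ranks to GPUs. The names `Slab`, `slab_for_rank` and `gpu_for_rank` are hypothetical and do not come from ACC-EVPCUFFT or MPI-ACC-EVPCUFFT; in an OpenACC code the resulting device index would typically be passed to `acc_set_device_num`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slab:
    start: int  # global index of the first x-plane owned by this rank
    count: int  # number of x-planes owned by this rank


def slab_for_rank(n0: int, nranks: int, rank: int) -> Slab:
    """Hypothetical balanced block split of n0 grid planes over nranks ranks.

    The first n0 % nranks ranks receive one extra plane, so per-rank
    plane counts differ by at most one.
    """
    base, extra = divmod(n0, nranks)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return Slab(start, count)


def gpu_for_rank(local_rank: int, gpus_per_node: int) -> int:
    """Hypothetical round-robin binding of node-local MPI ranks to GPU ids."""
    return local_rank % gpus_per_node


if __name__ == "__main__":
    n0, nranks, gpus_per_node = 10, 4, 2
    # All ranks are assumed to run on a single node here, so the global
    # rank doubles as the node-local rank.
    for r in range(nranks):
        s = slab_for_rank(n0, nranks, r)
        print(f"rank {r}: planes [{s.start}, {s.start + s.count}) "
              f"on GPU {gpu_for_rank(r, gpus_per_node)}")
```

Distributed FFT libraries may choose their own split; the balanced split here keeps per-rank plane counts within one of each other, which limits load imbalance in pointwise work if the cost per grid point is roughly uniform.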

Updated: 2020-09-01