A thread‐block‐wise computational framework for large‐scale hierarchical continuum‐discrete modeling of granular media
International Journal for Numerical Methods in Engineering (IF 2.9) Pub Date: 2020-10-07, DOI: 10.1002/nme.6549
Shiwei Zhao, Jidong Zhao, Weijian Liang

This paper presents a novel, scalable parallel computing framework for large-scale and multiscale simulations of granular media. Key to the new framework is an innovative thread-block-wise representative volume element (RVE) parallelism, inspired by the resemblance between a typical multiscale computational hierarchy and the hierarchical thread structure of graphics processing units (GPUs). To solve a hierarchical multiscale problem, all computation in an RVE is assigned to a single block of threads, so that each RVE runs entirely on the GPU and avoids frequent data exchange with the host CPU. Meanwhile, the thread blocks can run asynchronously, which implicitly guarantees the independence of inter-RVE computations, a defining feature of the hierarchical multiscale structure. The parallel computing algorithms are formulated and implemented in an in-house code, GoDEM, using GPU-specific techniques such as coalesced memory access, shared memory utilization and unified memory. Benchmark and performance tests are conducted against an open-source CPU-based DEM code under three typical loading conditions. The performance of GoDEM is examined with varying thread-block size, GPU register pressure and number of RVEs. The results reveal that increasing GPU occupancy by decreasing register pressure leads to a significant degradation, rather than an improvement, in performance. We further demonstrate that the proposed GPU parallelism framework may achieve a saturated speedup of approximately 350 relative to the single-CPU-core code. To demonstrate its application to multiscale modeling of granular media, the material point method (MPM) is coupled with the DEM powered by the new framework to simulate a typical engineering-scale problem that requires handling tens of millions of particles in total. A speedup of approximately 91 is achieved with the proposed framework, compared with a similar CPU program running on a cluster node with 44 parallel threads. The study offers a viable solution for future large-scale and multiscale modeling of granular media.
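The abstract describes the one-block-per-RVE mapping only at a high level, and GoDEM's actual kernels are not shown here. As a rough illustration, the following is a hypothetical, sequential Python emulation of the idea under stated assumptions: each RVE is a toy 1D assembly of unit-mass, equal-radius particles with linear-spring repulsion on overlap; the "grid" assigns one block per RVE (the role `blockIdx.x` would play in CUDA); threads inside a block share the RVE's particles through a block-stride loop; and a barrier (the role of `__syncthreads()`) separates force computation from time integration. The names `RVE`, `run_block`, `launch` and `make_rves` are illustrative only and do not come from the paper. Because no block reads another RVE's data, blocks can be processed in any order, which is the independence that asynchronous block execution relies on.

```python
"""Hypothetical sequential emulation of thread-block-wise RVE parallelism.

Toy 1D DEM (assumption): unit-mass, equal-radius particles, linear-spring
repulsion on overlap. One "block" per RVE; threads stride over particles.
"""
import random
from dataclasses import dataclass, field


@dataclass
class RVE:
    x: list[float]                      # particle centres (toy 1D model)
    v: list[float]                      # particle velocities
    radius: float = 0.5                 # assumed, same for all particles
    stiffness: float = 1.0e3            # assumed normal contact stiffness
    f: list[float] = field(default_factory=list)  # per-particle forces


def block_stride(tid: int, block_dim: int, n: int) -> range:
    """Particles owned by thread `tid`, like `for (i = threadIdx.x; i < n; i += blockDim.x)`."""
    return range(tid, n, block_dim)


def run_block(rve: RVE, block_dim: int, dt: float, n_steps: int) -> None:
    """All time steps of one RVE run inside its own block; no other RVE is touched."""
    n = len(rve.x)
    for _ in range(n_steps):
        rve.f = [0.0] * n
        # Phase 1: each thread gathers contact forces for the particles it owns,
        # writing only f[i], so no atomics are needed (brute-force O(n^2) search).
        for tid in range(block_dim):
            for i in block_stride(tid, block_dim, n):
                for j in range(n):
                    if j == i:
                        continue
                    gap = abs(rve.x[i] - rve.x[j]) - 2.0 * rve.radius
                    if gap < 0.0:  # overlap -> push i away from j
                        direction = 1.0 if rve.x[i] > rve.x[j] else -1.0
                        rve.f[i] -= rve.stiffness * gap * direction
        # Barrier: a CUDA kernel would call __syncthreads() here; in this
        # sequential emulation, phase 1 has already finished for every thread.
        # Phase 2: each thread integrates its own particles (semi-implicit Euler).
        for tid in range(block_dim):
            for i in block_stride(tid, block_dim, n):
                rve.v[i] += rve.f[i] * dt   # unit mass assumed
                rve.x[i] += rve.v[i] * dt


def launch(rves: list[RVE], block_dim: int, dt: float, n_steps: int, seed: int = 0) -> None:
    """Emulate an asynchronous grid: block b handles rves[b], in arbitrary order."""
    order = list(range(len(rves)))
    random.Random(seed).shuffle(order)
    for b in order:  # b plays the role of blockIdx.x
        run_block(rves[b], block_dim, dt, n_steps)


def make_rves(n_rve: int, n_particles: int) -> list[RVE]:
    """Hypothetical initial state: rows of slightly overlapping particles."""
    return [
        RVE(x=[0.9 * i + 0.01 * r for i in range(n_particles)], v=[0.0] * n_particles)
        for r in range(n_rve)
    ]


if __name__ == "__main__":
    rves = make_rves(n_rve=4, n_particles=8)
    launch(rves, block_dim=32, dt=1.0e-3, n_steps=50)
    print([round(p, 4) for p in rves[0].x])
```

Running the same RVEs with a different block order and a different `block_dim` should give identical final states, since each RVE's update depends only on its own particles and each thread writes only the entries it owns. On a real GPU, the per-RVE state would also stay in device memory between steps instead of returning to the host.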

Updated: 2020-10-07