An efficient parallel algorithm for 3D magnetotelluric modeling with edge-based finite element, Computational Geosciences

Computational Geosciences (IF 2.5), Pub Date: 2020-07-06, DOI: 10.1007/s10596-020-09976-z
Xiaoxiong Zhu, Jie Liu, Yian Cui, Chunye Gong

Interpreting and inverting magnetotelluric data requires a three-dimensional modeling algorithm that is both accurate and efficient. In this paper, the edge-based finite element method on an unstructured mesh is used to solve the 3D magnetotelluric problem. Two boundary conditions, Dirichlet and Neumann, are set for cross-validation and comparison. We propose an efficient parallel algorithm to speed up the computation. The algorithm is based on distributed matrix storage and has three levels of parallelism: the first two are process-level parallelization over frequencies and over the matrix solve, and the third is thread-level parallelization for loop unrolling. The algorithm is validated by several model studies. Scalability tests were performed on two distributed-memory HPC platforms, one built from Intel Xeon E5-2660 microprocessors and the other from Phytium FT2000 Plus microprocessors. On the Intel platform, solving Dublin Test Model-1 with 3,756,373 edges at 21 frequencies takes 365 s on 2520 cores; the speedup and efficiency are 1609 and 60% compared to 100 cores. On the Phytium platform, the scalability test shows that the speedup reaches 11,255 as the core count grows from 256 to 86,016.
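The abstract does not say how processes are mapped to frequencies, so the following is only a minimal sketch of what the first (frequency-level) layer of process parallelism could look like. The function name `assign_frequency_groups`, the contiguous rank grouping, and the round-robin assignment of frequencies are all assumptions made for illustration, not the authors' scheme. In an MPI code, each group would typically become its own sub-communicator (for example via `MPI_Comm_split`), inside which the distributed matrix solve for that group's frequencies would run.

```python
def assign_frequency_groups(n_ranks, n_freqs, n_groups):
    """Map ranks to process groups and frequencies to groups.

    Hypothetical helper: ranks are split into contiguous, equally sized
    groups, and frequency indices are dealt to the groups round-robin.
    Returns (rank_to_group, group_to_freqs).
    """
    if n_groups <= 0 or n_ranks % n_groups != 0:
        raise ValueError("n_ranks must be a positive multiple of n_groups")
    ranks_per_group = n_ranks // n_groups

    # rank_to_group[r] is the index of the group that rank r belongs to.
    rank_to_group = [r // ranks_per_group for r in range(n_ranks)]

    # group_to_freqs[g] lists the frequency indices solved by group g.
    group_to_freqs = [list(range(g, n_freqs, n_groups)) for g in range(n_groups)]
    return rank_to_group, group_to_freqs


if __name__ == "__main__":
    # Core and frequency counts taken from the abstract; using one group
    # per frequency is an assumption.
    rank_to_group, group_to_freqs = assign_frequency_groups(2520, 21, 21)
    print(rank_to_group.count(0))  # ranks in group 0: 120
    print(group_to_freqs[5])       # frequencies handled by group 5: [5]
```

With fewer groups than frequencies, each group simply works through several frequencies in turn. For example, 7 groups would give group 0 the frequency indices 0, 7, and 14.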


Updated: 2020-07-06
