Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor
The Journal of Supercomputing (IF 2.5), Pub Date: 2020-05-19, DOI: 10.1007/s11227-020-03308-9
Wei-Jia He, Ming-Lin Yang, Wu Wang, Xin-Qing Sheng

A many-core parallel implementation of the multilevel fast multipole algorithm (MLFMA), based on the Athread parallel programming model, is presented for China's homegrown many-core SW26010 processor. In the proposed implementation, data access efficiency is improved by using structure-of-arrays data layouts. Adaptive workload distribution strategies are adopted on different MLFMA tree levels to make full use of the computing capability and the scratchpad memory. A double-buffering scheme is designed to overlap communication with computation. The resulting Athread-based implementation can solve real-life problems with over one million unknowns with a remarkable speedup. The capability and efficiency of the proposed method are analyzed through examples of scattering by spheres and by a practical aircraft. Numerical results show that the proposed parallel scheme achieves total speedups of 6.4 to 8.0 relative to running on the CPU master core.
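
The two data-movement ideas named in the abstract, a structure-of-arrays layout and double buffering, can be illustrated with a small sketch. This is not the authors' Athread code: the function names (`load_chunk`, `process_chunk`, `double_buffered_energy`) are hypothetical, a Python thread stands in for an asynchronous transfer into scratchpad memory, and the computation (a sum of squared magnitudes) is a placeholder. Because of Python's global interpreter lock, the sketch shows the control structure of the overlap, not a real speedup.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


def load_chunk(re, im, start, size):
    """Stand-in for an asynchronous copy into a local buffer.

    Structure of arrays: real and imaginary parts live in two separate
    contiguous arrays, so each copy is one contiguous slice.
    """
    return re[start:start + size], im[start:start + size]


def process_chunk(chunk):
    """Placeholder computation: sum of |z|^2 over one chunk."""
    re_c, im_c = chunk
    return sum(r * r + i * i for r, i in zip(re_c, im_c))


def double_buffered_energy(re, im, chunk_size):
    """Process chunk k while the loader thread fetches chunk k+1."""
    total = 0.0
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = None  # transfer issued in the previous iteration
        for start in range(0, len(re), chunk_size):
            # Issue the next transfer before computing on the previous chunk.
            nxt = loader.submit(load_chunk, re, im, start, chunk_size)
            if pending is not None:
                total += process_chunk(pending.result())
            pending = nxt
        if pending is not None:
            # Drain the last outstanding buffer.
            total += process_chunk(pending.result())
    return total


if __name__ == "__main__":
    re = array("d", [1.0, 2.0, 3.0, 4.0, 5.0])
    im = array("d", [0.0, 1.0, 0.0, 1.0, 0.0])
    print(double_buffered_energy(re, im, chunk_size=2))
```

At most two chunks are alive at any time (the one being computed and the one being fetched), which is the property that lets such a scheme fit in a small scratchpad memory.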

Updated: 2020-05-19