Development of element-by-element kernel algorithms in unstructured finite-element solvers for many-core wide-SIMD CPUs: Application to earthquake simulation, Journal of Computational Science


Journal of Computational Science (IF 3.3), Pub Date: 2020-06-26, DOI: 10.1016/j.jocs.2020.101174
Kohei Fujita, Masashi Horikoshi, Tsuyoshi Ichimura, Larry Meadows, Kengo Nakajima, Muneo Hori, Lalith Maddegedara

Acceleration of the element-by-element (EBE) kernel in matrix-vector products is essential for high performance in unstructured implicit finite-element applications. However, attaining high performance with the EBE kernel is not straightforward because it involves random data access with data recurrence. In this paper, we develop methods to circumvent the resulting data races and achieve high performance on many-core CPU architectures with wide SIMD units. The developed EBE kernel attains 16.3% and 16.0% of FP32 peak on the Intel Xeon Phi (Knights Landing) based Oakforest-PACS and the Intel Xeon Platinum (Cascade Lake) based Oakbridge-CX, respectively. This leads to a 2.88-fold speedup over the baseline kernel and a 2.03-fold speedup of the whole finite-element application on Oakforest-PACS. Examples of finite-element earthquake simulations using the developed EBE kernel algorithms are shown. These insights are expected to enable high performance in other unstructured finite-element solvers running on large-scale systems based on many-core wide-SIMD CPUs.
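To make the data-recurrence problem concrete, the following is a minimal sketch of an EBE matrix-vector product in plain Python. It is not the authors' kernel. The toy mesh, the shared local matrix `K_LOCAL`, and the function names are all hypothetical, and element coloring is used only as one common textbook way to avoid write conflicts; the abstract does not state which methods the paper develops.

```python
# Hypothetical toy mesh: 4 triangles over 6 nodes (illustrative connectivity only).
elements = [(0, 1, 2), (1, 3, 2), (2, 3, 4), (3, 5, 4)]
num_nodes = 6

# Assumed 3x3 local matrix shared by every element (a stand-in for a real K_e).
K_LOCAL = [[2.0, -1.0, -1.0],
           [-1.0, 2.0, -1.0],
           [-1.0, -1.0, 2.0]]


def apply_element(nodes, x, y):
    """Gather x at the element's nodes, multiply by K_LOCAL, scatter-add into y."""
    xe = [x[n] for n in nodes]                      # gather: random access into x
    for i, ni in enumerate(nodes):
        y[ni] += sum(K_LOCAL[i][j] * xe[j] for j in range(3))  # scatter-add


def ebe_matvec(elements, x, num_nodes):
    """Serial EBE product: y = sum_e P_e^T K_e P_e x, with no global matrix."""
    y = [0.0] * num_nodes
    for nodes in elements:
        # Neighboring elements share nodes, so they update the same y entries.
        # Running this loop in parallel as written would be a data race.
        apply_element(nodes, x, y)
    return y


def color_elements(elements):
    """Greedy coloring: elements placed in the same color share no node."""
    colors = []  # each entry: (set of nodes already used, list of element ids)
    for e, nodes in enumerate(elements):
        for used, members in colors:
            if used.isdisjoint(nodes):
                used.update(nodes)
                members.append(e)
                break
        else:
            colors.append((set(nodes), [e]))
    return [members for _, members in colors]


def ebe_matvec_colored(elements, x, num_nodes):
    """Same product, processed color by color."""
    y = [0.0] * num_nodes
    for group in color_elements(elements):
        # Elements in one group write to disjoint entries of y, so this inner
        # loop could be split across threads or SIMD lanes without conflicts.
        for e in group:
            apply_element(elements[e], x, y)
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_serial = ebe_matvec(elements, x, num_nodes)
y_colored = ebe_matvec_colored(elements, x, num_nodes)
```

Both functions compute the same vector; the colored version only reorders the element loop so that concurrent updates within a group never target the same node. The cost of this choice is that elements processed together are no longer neighbors in memory, which is one reason the gather step remains a random-access pattern.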


Updated: 2020-06-26
