Accelerating and tuning small matrix multiplications on Sunway TaihuLight: A case study of spectral element CFD Code Nek5000
The International Journal of High Performance Computing Applications ( IF 3.5 ) Pub Date : 2019-10-09 , DOI: 10.1177/1094342019882246
Xianmeng Wang 1 , Zhifeng Zhou 1 , Changjun Hu 1 , Wen Yang 2 , Minfu Zhao 2 , Zhaoshun Wang 1 , Peng Shi 3

Matrix–matrix products of small matrices continue to play an important part in a range of scientific applications. Heterogeneous architectures, which are predicted to be a trend in the exascale supercomputing era, give rise to challenges in porting and optimizing small matrix products. We present a method for accelerating and tuning small matrix multiplications on the Sunway TaihuLight supercomputer, which has been ranked the most powerful supercomputer four times on the TOP500 list. Sunway TaihuLight is equipped with Shen-Wei hybrid manycore processors. We use Nek5000 as a case study to demonstrate our methods. Nek5000 is an open-source computational fluid dynamics (CFD) solver for incompressible flow based on the spectral element method (SEM). The high-order SEM, whose computational kernel is small dense matrix products, is regarded as having the potential to overcome the constraints of standard CFD software. By optimizing with vectorization, we gained about 30% performance improvement on the management processing element (MPE). We also accelerated Nek5000 using the computing processing elements (CPEs); the experimental results suggest that employing 32 CPEs delivers the best performance. We scaled Nek5000 to 16,384 core groups with 540,672 cores, achieving about 30% performance improvement.
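The abstract does not reproduce the authors' kernels, so the C code below is only a rough sketch of the kind of computation being tuned, under stated assumptions. `mxm` is a column-major small matrix product modeled on the argument order of Nek5000's `mxm(a,n1,b,n2,c,n3)` routine (C = A·B with A of size n1×n2). `apply_dr` applies it element by element, as an SEM gradient does with the 1D derivative matrix D. `element_range` and `apply_dr` are hypothetical helpers that mimic splitting elements over a number of workers (for example, 32 CPEs) sequentially in plain C. The Sunway-specific offloading, data transfers, and SIMD intrinsics the paper would rely on are not shown.

```c
#include <stddef.h>

/* Small dense product C = A * B in column-major (Fortran) order,
 * modeled on Nek5000's mxm(a,n1,b,n2,c,n3):
 * A is n1 x n2, B is n2 x n3, C is n1 x n3. */
void mxm(const double *a, int n1, const double *b, int n2,
         double *c, int n3)
{
    for (int j = 0; j < n3; ++j) {
        double *cj = c + (size_t)n1 * j;           /* column j of C */
        for (int i = 0; i < n1; ++i)
            cj[i] = 0.0;
        for (int k = 0; k < n2; ++k) {
            const double bkj = b[k + (size_t)n2 * j];
            const double *ak = a + (size_t)n1 * k;  /* column k of A */
            /* Unit-stride inner loop over i: the part a compiler
               (or hand-written SIMD code) can vectorize. */
            for (int i = 0; i < n1; ++i)
                cj[i] += ak[i] * bkj;
        }
    }
}

/* Hypothetical static partition of nelem spectral elements across
 * nworkers compute cores: worker `id` gets the half-open range
 * [*first, *last). Leftover elements go to the lowest-numbered
 * workers, so per-worker counts differ by at most one. */
void element_range(int nelem, int nworkers, int id, int *first, int *last)
{
    int base = nelem / nworkers;
    int rem  = nelem % nworkers;
    *first = id * base + (id < rem ? id : rem);
    *last  = *first + base + (id < rem ? 1 : 0);
}

/* Hypothetical per-worker task: apply the n x n derivative matrix d
 * along the first index of every element owned by worker `id`.
 * Each element stores n*n*n values, viewed as an n x (n*n)
 * column-major matrix, so one mxm call handles one element. */
void apply_dr(const double *d, int n, const double *u, double *ur,
              int nelem, int nworkers, int id)
{
    int first, last;
    element_range(nelem, nworkers, id, &first, &last);
    const size_t npts = (size_t)n * n * n;
    for (int e = first; e < last; ++e)
        mxm(d, n, u + e * npts, n, ur + e * npts, n * n);
}
```

Keeping the unit-stride loop innermost is one plausible way to make such a kernel amenable to the MPE vectorization the abstract mentions. The block partition keeps per-core work balanced to within one element. Both are illustrative choices, not the paper's documented design.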

Updated: 2019-10-09