Accelerated molecular dynamics simulation of Silicon Crystals on TaihuLight using OpenACC
Parallel Computing (IF 1.4) Pub Date: 2020-07-11, DOI: 10.1016/j.parco.2020.102667
Jianguo Liang, Rong Hua, Hao Zhang, Wenqiang Zhu, You Fu

The Sunway TaihuLight, with a theoretical peak performance of 125 PFlop/s, is now ranked third in the TOP500 list. It provides a high-level programming model named Sunway OpenACC, which extends the OpenACC 2.0 standard with some customized extensions. We assess the performance of this extended programming model and of the SW26010 heterogeneous many-core processor for molecular dynamics (MD) simulation of solid covalent crystals using many-body potentials, such as the Tersoff potentials. Considering the special architecture of the SW26010 processor, we port the MD simulation of silicon crystals using Sunway OpenACC under the guidance of the extended Amdahl’s law. Since the Sunway OpenACC compiler cannot resolve the performance bottleneck of the MD simulation of silicon crystals, we implement two primary optimizations on an intermediate version of the code generated by the compiler: designing a software cache and minimizing the access frequency of the main memory. Experimental results indicate that a single-process many-core speedup of 12.89x can be achieved with these manual optimization strategies. Compared with the execution time of the serial version on an Intel(R) Xeon(R) E5-2620 v4 CPU, a speedup of 8.7x can be achieved.



Updated: 2020-07-11