Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2021-06-22, DOI: 10.1109/tpds.2021.3091408
Ping Gao, Xiaohui Duan, Bertil Schmidt, Wusheng Zhang, Lin Gan, Haohuan Fu, Wei Xue, Weiguo Liu, Guangwen Yang

Molecular dynamics (MD) simulations play an increasingly important role in many areas, ranging from chemical materials to biological molecules. As MD models continue to develop, their potentials are becoming larger and more complex. In this article, we focus on optimizing the computation of interactions in the reactive force field (ReaxFF) potential from LAMMPS. We present our work on refactoring neighbor list building, bond order computation, and the valence angle and torsion angle computations. After redesigning these kernels, we develop a vectorized implementation of the non-bonded interactions that is nearly 100× faster than running on the management processing element (MPE) of the Sunway TaihuLight supercomputer. Furthermore, we implement a three-body-list-free torsion angle computation, and we propose a line-locked software cache method that eliminates write conflicts in the torsion angle and valence angle interactions, yielding an order-of-magnitude speedup on a single Sunway TaihuLight node. In addition, we achieve a speedup of up to 3.5× over the KOKKOS package on an Intel Xeon Gold 6148 core. Running on 1,024 processes, our implementation simulates 21,233,664 atoms on 66,560 cores at 0.032 ns/day, with a weak scaling efficiency of 95.71%.
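The figures for the largest run can be cross-checked with simple arithmetic. The sketch below divides the quoted totals by the process count. Mapping one process to one SW26010 core group (1 MPE plus 64 CPEs) is an assumption inferred from the numbers, not something the abstract states, and `weak_scaling_efficiency` uses one common textbook definition, not necessarily the exact formula behind the 95.71% figure.

```python
# Cross-check of the largest run quoted in the abstract.
# ASSUMPTION (not stated in the abstract): one process per SW26010 core
# group, i.e. 1 management processing element (MPE) + 64 computing
# processing elements (CPEs).
TOTAL_ATOMS = 21_233_664
PROCESSES = 1_024
TOTAL_CORES = 66_560

MPE_PER_CORE_GROUP = 1
CPE_PER_CORE_GROUP = 64

# Exact integer division; a nonzero remainder would mean uneven decomposition.
atoms_per_process, atom_remainder = divmod(TOTAL_ATOMS, PROCESSES)
cores_per_process, core_remainder = divmod(TOTAL_CORES, PROCESSES)


def weak_scaling_efficiency(t_base: float, t_p: float) -> float:
    """One common definition: E = T(base) / T(P), with work per process fixed.

    t_base: wall time per step at the baseline process count.
    t_p:    wall time per step at P processes (same atoms per process).
    """
    return t_base / t_p


if __name__ == "__main__":
    print(f"atoms per process: {atoms_per_process} (remainder {atom_remainder})")
    print(f"cores per process: {cores_per_process} (remainder {core_remainder})")
    print("matches one core group:",
          cores_per_process == MPE_PER_CORE_GROUP + CPE_PER_CORE_GROUP)
```

Both divisions are exact: each process handles 20,736 atoms, and 65 cores per process is consistent with one MPE plus 64 CPEs.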
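The write conflicts in the valence angle and torsion angle interactions arise because each term (atoms i-j-k, or i-j-k-l) adds a force contribution to every atom it involves, so two workers handling different terms can update the same atom at the same time. The abstract does not describe how the line-locked software cache works, so it is not reproduced here. For comparison only, the sketch below shows a different, widely used technique: each worker accumulates into a private force buffer, and the buffers are summed serially at the end, so no two workers ever write the same memory. The names `Term`, `accumulate_forces` and `_accumulate_chunk` are hypothetical, and Python threads are used only to illustrate the ownership rule (under CPython's GIL they give no speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# Hypothetical representation of a bonded term (e.g. a valence angle i-j-k or a
# torsion i-j-k-l): the atom indices plus one (fx, fy, fz) contribution per atom.
Term = Tuple[Sequence[int], Sequence[Tuple[float, float, float]]]


def _accumulate_chunk(chunk: Sequence[Term], n_atoms: int) -> List[List[float]]:
    """Sum one worker's terms into a private buffer that no other worker touches."""
    private = [[0.0, 0.0, 0.0] for _ in range(n_atoms)]
    for atoms, contribs in chunk:
        for atom, f in zip(atoms, contribs):
            for d in range(3):
                private[atom][d] += f[d]
    return private


def accumulate_forces(terms: Sequence[Term], n_atoms: int,
                      n_workers: int = 4) -> List[List[float]]:
    """Conflict-free force accumulation: per-worker buffers plus a final reduction."""
    # Split the terms into contiguous chunks, at most one chunk per worker.
    size = max(1, -(-len(terms) // n_workers))  # ceiling division
    chunks = [terms[s:s + size] for s in range(0, len(terms), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        buffers = list(pool.map(lambda c: _accumulate_chunk(c, n_atoms), chunks))
    # Reduction: the only writes to the shared force array, done serially.
    forces = [[0.0, 0.0, 0.0] for _ in range(n_atoms)]
    for buf in buffers:
        for atom in range(n_atoms):
            for d in range(3):
                forces[atom][d] += buf[atom][d]
    return forces


if __name__ == "__main__":
    # Two angle terms sharing atoms 1 and 2: a naive parallel update would race there.
    terms = [
        ((0, 1, 2), ((1.0, 0.0, 0.0), (-2.0, 0.0, 0.0), (1.0, 0.0, 0.0))),
        ((1, 2, 3), ((0.0, 1.0, 0.0), (0.0, -2.0, 0.0), (0.0, 1.0, 0.0))),
    ]
    print(accumulate_forces(terms, n_atoms=4, n_workers=2))
```

The trade-off is memory: every worker holds a full-size force array, so the cost grows with the number of workers, which is why cache- or ownership-based schemes are often considered when fast local memory is limited.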

Updated: 2021-06-22