Integrating State of the Art Compute, Communication, and Autotuning Strategies to Multiply the Performance of the Application Programm CPMD for Ab Initio Molecular Dynamics Simulations
arXiv - CS - Performance, Pub Date: 2020-03-18, DOI: arxiv-2003.08477
Tobias Klöffel, Gerald Mathias, Bernd Meyer

We present our recent code modernizations of the ab initio molecular dynamics program CPMD (www.cpmd.org), with a special focus on the ultra-soft pseudopotential (USPP) code path. Guided by the internal instrumentation of CPMD, all time-critical routines have been revised to maximize computational throughput and minimize communication overhead. Hybrid MPI+OpenMP parallelization has been added throughout the program wherever it was missing, to improve scaling. For communication-intensive routines, such as the multiple distributed 3D FFTs of the electronic states and the distributed matrix-matrix multiplications related to the $\beta$-projectors of the pseudopotentials, this MPI+OpenMP parallelization now overlaps computation and communication. The necessary partitioning of the workload is optimized by an auto-tuning algorithm. In addition, the largest global MPI_Allreduce operation has been replaced by highly tuned node-local parallelized operations using MPI shared-memory windows to avoid inter-node communication. A batched algorithm for the multiple 3D FFTs improves the throughput of the MPI_Alltoall communication and thus the scalability of the implementation, both for the USPP and for the frequently used norm-conserving pseudopotential code path. The enhanced performance and scalability are demonstrated on a mid-sized benchmark system of 256 water molecules and on further water systems ranging from 32 to 2048 molecules.
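
The trade-off behind splitting the workload into batches, overlapping computation with communication, and tuning the split automatically can be illustrated with a toy cost model. The sketch below is not CPMD's algorithm and uses no MPI. It assumes a simple two-stage pipeline in which each batch is first computed and then communicated, with a latency-plus-bandwidth estimate for each communication call. The function names `pipelined_time` and `autotune`, and all numeric parameters, are hypothetical and chosen only for illustration.

```python
import math


def pipelined_time(n_batches, compute_total, bytes_total, latency, bandwidth):
    """Estimated run time when the work is split into equal batches.

    Assumed model (not taken from the paper): the communication of batch i
    overlaps with the computation of batch i+1, and one communication call
    costs latency + bytes / bandwidth.
    """
    compute = compute_total / n_batches                      # compute time per batch
    comm = latency + bytes_total / (n_batches * bandwidth)   # comm time per batch
    # First batch computes alone, last batch communicates alone,
    # and the slower of the two stages dominates every step in between.
    return compute + (n_batches - 1) * max(compute, comm) + comm


def autotune(candidates, measure):
    """Return (best_candidate, best_cost) by exhaustive search over candidates.

    `measure` maps a candidate to a cost. In a real program it would time an
    actual run; here it can be any callable. Ties go to the first candidate.
    """
    best, best_cost = None, math.inf
    for candidate in candidates:
        cost = measure(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Hypothetical numbers in arbitrary units: 8 units of compute, 8 units of data,
# unit bandwidth, and a per-call latency of 0.5.
best, cost = autotune(
    [1, 2, 4, 8, 16],
    lambda k: pipelined_time(k, 8.0, 8.0, 0.5, 1.0),
)
print(best, cost)
```

With these made-up parameters the search selects 4 batches: a single batch gives no overlap at all, while very many batches pay the per-call latency too often. A measured timing function would replace the model in practice, which is the reason for keeping `measure` as a plain callable.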

Updated: 2020-03-20