Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS

arXiv - CS - Performance, Pub Date: 2020-06-16, DOI: arxiv-2006.09167
Szilárd Páll, Artem Zhmurov, Paul Bauer, Mark Abraham, Magnus Lundborg, Alan Gray, Berk Hess, Erik Lindahl

The introduction of accelerator devices such as graphics processing units (GPUs) has had a profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching and cut-offs. Here, we present the heterogeneous parallelization and acceleration design of molecular dynamics implemented in the GROMACS codebase over the last decade. The setup involves a general cluster-based approach to pair lists and non-bonded pair interactions that utilizes both GPUs and CPU SIMD acceleration efficiently, including the ability to load-balance tasks between CPUs and GPUs. The algorithmic work efficiency is tuned for each type of hardware, and to use accelerators more efficiently we introduce dual pair lists with rolling pruning updates. Combined with new direct GPU-GPU communication as well as GPU integration, this enables excellent performance from single-GPU simulations through strong scaling across multiple GPUs and efficient multi-node parallelization.
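To make the dual pair list idea from the abstract more concrete, here is a minimal Python sketch. It is not the GROMACS implementation: it works on individual particles rather than clusters, ignores periodic boundaries, and prunes the whole list at once instead of in rolling chunks. The function names `build_pair_list` and `prune_pair_list`, and the two cut-off parameters, are hypothetical names chosen for illustration. The assumption is that an outer list is built rarely with a generous cut-off, and a shorter inner list is derived from it more often by discarding pairs that have drifted too far apart.

```python
import itertools
import math


def build_pair_list(positions, outer_cutoff):
    """Build the outer pair list: all pairs (i, j), i < j, within outer_cutoff.

    This is the expensive O(N^2) search in this sketch, so it would be
    run only occasionally. (Hypothetical helper, not a GROMACS function.)
    """
    pairs = []
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) <= outer_cutoff:
            pairs.append((i, j))
    return pairs


def prune_pair_list(outer_pairs, positions, inner_cutoff):
    """Derive the inner pair list by keeping only the outer-list pairs
    that are currently within inner_cutoff.

    This only scans the existing outer list, so it is much cheaper than
    a full search and can be repeated more frequently.
    """
    return [
        (i, j)
        for (i, j) in outer_pairs
        if math.dist(positions[i], positions[j]) <= inner_cutoff
    ]


if __name__ == "__main__":
    # Four particles on a line (illustrative coordinates, arbitrary units).
    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
    outer = build_pair_list(positions, outer_cutoff=3.0)
    inner = prune_pair_list(outer, positions, inner_cutoff=2.0)
    print("outer list:", outer)
    print("inner list:", inner)
```

The design point this is meant to illustrate is the trade-off between search cost and list size: a larger outer cut-off lets the full search run less often, while the cheaper pruning step keeps the list that is actually used for force evaluation short.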


Updated: 2020-10-28
