GPGPU acceleration of all-electron electronic structure theory using localized numeric atom-centered basis functions
Computer Physics Communications (IF 6.3), Pub Date: 2020-09-01, DOI: 10.1016/j.cpc.2020.107314
William P. Huhn, Björn Lange, Victor Wen-zhe Yu, Mina Yoon, Volker Blum

We present an implementation of all-electron density-functional theory for massively parallel GPGPU-based platforms, using localized atom-centered basis functions and real-space integration grids. Special attention is paid to domain decomposition of the problem on non-uniform grids, which enables compute- and memory-parallel execution across thousands of nodes for real-space operations, e.g. the update of the electron density, the integration of the real-space Hamiltonian matrix, and the calculation of Pulay forces. To assess the performance of our GPGPU implementation, we performed benchmarks on three different architectures using a 103-material test set. We find that operations which rely on dense serial linear algebra show dramatic speedups from GPGPU acceleration: in particular, SCF iterations including force and stress calculations exhibit speedups ranging from 4.5 to 6.6. For the architectures and problem types investigated here, this translates to an expected overall speedup of between 3 and 4 for the entire calculation (including non-GPU-accelerated parts), for problems featuring several tens to hundreds of atoms. Additional calculations for a 375-atom Bi$_2$Se$_3$ bilayer show that the present GPGPU strategy scales for large-scale distributed-parallel simulations.
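The gap between the speedup of the accelerated operations (4.5 to 6.6) and the overall speedup (3 to 4) is what one would expect when part of the runtime stays on the CPU. The sketch below illustrates this with Amdahl's law. It is an assumption made here for illustration, not a model taken from the paper: the function names are hypothetical, and pairing 4.5 with 3 and 6.6 with 4 is only one way to read the quoted ranges. The abstract does not report the fraction of runtime that is accelerated, so the resulting fractions are rough estimates, not published results.

```python
def amdahl_overall_speedup(accelerated_fraction: float, accelerated_speedup: float) -> float:
    """Overall speedup when a fraction of the original runtime is sped up.

    accelerated_fraction: share of the original (CPU-only) runtime that is accelerated.
    accelerated_speedup: speedup factor of that accelerated share.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if accelerated_speedup <= 0.0:
        raise ValueError("accelerated_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / accelerated_speedup)


def implied_accelerated_fraction(overall_speedup: float, accelerated_speedup: float) -> float:
    """Invert Amdahl's law to get the runtime fraction that must have been accelerated."""
    if overall_speedup < 1.0 or accelerated_speedup <= 1.0:
        raise ValueError("speedups must be >= 1 (overall) and > 1 (accelerated)")
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / accelerated_speedup)


if __name__ == "__main__":
    # Assumed pairings of the ranges quoted in the abstract (illustrative only).
    for overall, accelerated in [(3.0, 4.5), (4.0, 6.6)]:
        fraction = implied_accelerated_fraction(overall, accelerated)
        print(f"overall {overall:.1f}x, accelerated part {accelerated:.1f}x "
              f"-> implied accelerated fraction {fraction:.3f}")
```

Under these assumptions, both pairings imply that roughly 85 to 90 percent of the original runtime sits in the accelerated operations. That is consistent with the abstract's statement that the overall figure includes non-GPU-accelerated parts.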

Updated: 2020-09-01