Accelerated impurity solver for DMFT and its diagrammatic extensions
Computer Physics Communications (IF 6.3), Pub Date: 2021-06-18, DOI: 10.1016/j.cpc.2021.108075
Corey Melnick, Patrick Sémon, Kwangmin Yu, Nicholas D'Imperio, André-Marie Tremblay, Gabriel Kotliar

We present ComCTQMC, a GPU-accelerated quantum impurity solver. It uses the continuous-time quantum Monte Carlo (CTQMC) algorithm wherein the partition function is expanded in terms of the hybridisation function (CT-HYB). ComCTQMC supports both partition-space and worm-space measurements, and it uses improved estimators and the reduced density matrix to improve observable measurements whenever possible. ComCTQMC efficiently measures all one- and two-particle Green's functions, all static observables that commute with the local Hamiltonian, and the occupation of each impurity orbital. ComCTQMC can solve complex-valued impurities with crystal fields that are hybridized to both fermionic and bosonic baths. Most importantly, ComCTQMC utilizes graphics processing units (GPUs), if available, to dramatically accelerate the CTQMC algorithm when the Hilbert space is sufficiently large. We demonstrate acceleration by a factor of over 600 (100) in a simulation of δ-Pu at 600 K with (without) crystal fields. In easier problems, the GPU offers less impressive acceleration or even decelerates the CTQMC. Here we describe the theory, algorithms, and structure used by ComCTQMC in order to achieve this set of features and level of acceleration.

Program summary

Program Title: ComCTQMC

CPC Library link to program files: https://doi.org/10.17632/x2gzgm8njh.1

Licensing provisions: GPLv3

Programming language: C++/CUDA

Nature of problem: In dynamical mean-field theory (DMFT), the computational bottleneck is the repeated solution of a quantum impurity problem [1]. The continuous-time quantum Monte Carlo (CTQMC) algorithm has emerged as one of the most efficient methods for solving multiorbital impurity problems at moderate-to-high temperatures [2]. However, the low-temperature regime remains inaccessible, particularly for f-shell systems, and the measurement of two-particle correlation functions on an impurity adds a substantial computational burden. The bottleneck of the CTQMC solver is, in turn, the computation of the local trace, which involves multiplying many moderate-to-large matrices. The efficient solution of the impurity, measurement of the two-particle correlation functions, and acceleration of the trace computation are therefore critical.
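To make the cost of the local trace concrete, here is a minimal C++ sketch of the trace of an ordered product of dense matrices. It is an illustration under simplifying assumptions, not ComCTQMC's implementation: the `Matrix` type and the `multiply` and `trace_of_product` helpers are hypothetical names, the matrices are real and all share one dimension, and each matrix stands in for an operator or propagator in the time-ordered product.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense square matrix, stored row-major.
// Each instance stands in for one operator or propagator block.
struct Matrix {
    std::size_t dim;
    std::vector<double> data;  // dim * dim entries, zero-initialised

    explicit Matrix(std::size_t d) : dim(d), data(d * d, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) { return data[i * dim + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * dim + j]; }
};

// Plain O(dim^3) matrix product; this is the step a GPU would accelerate.
Matrix multiply(const Matrix& a, const Matrix& b) {
    Matrix c(a.dim);
    for (std::size_t i = 0; i < a.dim; ++i)
        for (std::size_t k = 0; k < a.dim; ++k)
            for (std::size_t j = 0; j < a.dim; ++j)
                c(i, j) += a(i, k) * b(k, j);
    return c;
}

// Trace of factors[0] * factors[1] * ... ; cost grows as n * dim^3.
// An empty product is treated as the identity, which has no fixed
// dimension here, so we return 0 for that degenerate case.
double trace_of_product(const std::vector<Matrix>& factors) {
    if (factors.empty()) return 0.0;
    Matrix acc = factors.front();
    for (std::size_t n = 1; n < factors.size(); ++n)
        acc = multiply(acc, factors[n]);
    double tr = 0.0;
    for (std::size_t i = 0; i < acc.dim; ++i) tr += acc(i, i);
    return tr;
}
```

With n factors of dimension d, the work scales as n·d³, which is why large Hilbert spaces (such as f-shell impurities) and high expansion orders at low temperature make this step dominate.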

Solution method: ComCTQMC uses the hybridisation expansion of the impurity action to explore partition space [3]. It uses the worm algorithm [4] to explore the union of the partition space with observable spaces, e.g., those of the two-particle correlation functions. It uses improved estimators to more accurately measure the one- and two-particle Green's functions [5]. Identical impurities are solved across all MPI ranks (for ideal weak scaling) and the trace computations of these impurities are distributed to and accelerated by GPUs (when available). The lazy-trace algorithm [6] is used to further reduce the burden of the local trace calculation.
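The idea behind a lazy trace evaluation can be sketched as follows: draw the Metropolis random number first, then reject a move as soon as a cheap upper bound on the new trace shows that the move cannot be accepted. The C++ sketch below is a simplified illustration, not the algorithm of [6] as implemented in ComCTQMC. It assumes real matrices of a single dimension `d`, uses the Frobenius norm for the bound (|Tr M| ≤ √d·‖M‖_F with ‖AB‖_F ≤ ‖A‖_F·‖B‖_F), and reduces the acceptance ratio to |new trace| / |old trace|, leaving out the hybridisation determinant and proposal factors. All names (`Mat`, `trace_bound`, `lazy_metropolis`, `Decision`) are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Mat = std::vector<double>;  // row-major d x d matrix (hypothetical type)

double frobenius_norm(const Mat& m) {
    double sum = 0.0;
    for (double x : m) sum += x * x;
    return std::sqrt(sum);
}

// Cheap upper bound on |Tr(A_1 ... A_n)|: sqrt(d) * prod_i ||A_i||_F.
double trace_bound(const std::vector<Mat>& factors, std::size_t d) {
    double bound = std::sqrt(static_cast<double>(d));
    for (const Mat& m : factors) bound *= frobenius_norm(m);
    return bound;
}

Mat multiply(const Mat& a, const Mat& b, std::size_t d) {
    Mat c(d * d, 0.0);
    for (std::size_t i = 0; i < d; ++i)
        for (std::size_t k = 0; k < d; ++k)
            for (std::size_t j = 0; j < d; ++j)
                c[i * d + j] += a[i * d + k] * b[k * d + j];
    return c;
}

// Expensive exact trace of the ordered product (identity if no factors).
double exact_trace(const std::vector<Mat>& factors, std::size_t d) {
    Mat acc(d * d, 0.0);
    for (std::size_t i = 0; i < d; ++i) acc[i * d + i] = 1.0;
    for (const Mat& m : factors) acc = multiply(acc, m, d);
    double tr = 0.0;
    for (std::size_t i = 0; i < d; ++i) tr += acc[i * d + i];
    return tr;
}

struct Decision {
    bool accepted;        // outcome of the Metropolis test
    bool trace_computed;  // false if the cheap bound already decided
};

// u is a uniform random number in [0, 1) drawn BEFORE the trace is evaluated.
// The move is accepted when |new trace| > u * |old trace|.
Decision lazy_metropolis(const std::vector<Mat>& proposed, std::size_t d,
                         double old_abs_trace, double u) {
    const double threshold = u * old_abs_trace;
    // If even the upper bound cannot exceed the threshold, reject cheaply.
    if (trace_bound(proposed, d) <= threshold) return {false, false};
    const double t = std::abs(exact_trace(proposed, d));
    return {t > threshold, true};
}
```

Because the bound never underestimates the trace, the early rejection gives the same decision as the full test for the same `u`; the saving comes from skipping the matrix products for moves that would be rejected anyway.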

Additional comments including restrictions and unusual features: ComCTQMC solves nearly arbitrary impurities, including those with complex-valued and time-dependent interactions. However, there are two restrictions: (1) The retarded part of the interaction is described by a set of bilinears (each a paired creation and annihilation operator), and these bilinears must commute with the local Hamiltonian and have real quantum numbers; (2) If a local Green's function vanishes, then the corresponding hybridisation function must also vanish.

References

[1] A. Georges, G. Kotliar, W. Krauth, M.J. Rozenberg, Dynamical mean-field theory of strongly correlated fermion systems and the limit of infinite dimensions, Rev. Mod. Phys. 68 (1996) 13.

[2] G. Kotliar, S.Y. Savrasov, K. Haule, V.S. Oudovenko, O. Parcollet, C.A. Marianetti, Electronic structure calculations with dynamical mean-field theory, Rev. Mod. Phys. 78 (2006) 865.

[3] E. Gull, A.J. Millis, A.I. Lichtenstein, A.N. Rubtsov, M. Troyer, P. Werner, Continuous-time Monte Carlo methods for quantum impurity models, Rev. Mod. Phys. 83 (2011) 349.

[4] P. Gunacker, M. Wallerberger, E. Gull, A. Hausoel, G. Sangiovanni, K. Held, Continuous-time quantum Monte Carlo using worm sampling, Phys. Rev. B 92 (2015) 155102.

[5] H. Hafermann, K.R. Patton, P. Werner, Improved estimators for the self-energy and vertex function in hybridization-expansion continuous-time quantum Monte Carlo simulations, Phys. Rev. B 85 (2012) 205106.

[6] P. Sémon, C.-H. Yee, K. Haule, A.-M.S. Tremblay, Lazy skip-lists: An algorithm for fast hybridization-expansion quantum Monte Carlo, Phys. Rev. B 90 (2014) 075149.




Updated: 2021-06-29