Finding the ground state of spin Hamiltonians with reinforcement learning
Nature Machine Intelligence (IF 23.8), Pub Date: 2020-09-07, DOI: 10.1038/s42256-020-0226-x
Kyle Mills, Pooya Ronagh, Isaac Tamblyn

Reinforcement learning (RL) has become a proven method for optimizing a procedure for which success has been defined, but the specific actions needed to achieve it have not. Using a method we call ‘controlled online optimization learning’ (COOL), we apply the so-called ‘black box’ method of RL to simulated annealing (SA), demonstrating that an RL agent based on proximal policy optimization can, through experience alone, arrive at a temperature schedule that surpasses the performance of standard heuristic temperature schedules for two classes of Hamiltonians. When the system is initialized at a cool temperature, the RL agent learns to heat the system to ‘melt’ it and then slowly cool it in an effort to anneal to the ground state; if the system is initialized at a high temperature, the algorithm immediately cools the system. We investigate the performance of our RL-driven SA agent in generalizing to all Hamiltonians of a specific class. When trained on random Hamiltonians of nearest-neighbour spin glasses, the RL agent is able to control the SA process for other Hamiltonians, reaching the ground state with a higher probability than a simple linear annealing schedule. Furthermore, the scaling performance (with respect to system size) of the RL approach is far more favourable, achieving a performance improvement of almost two orders of magnitude on L = 14² systems. We demonstrate the robustness of the RL approach when the system operates in a ‘destructive observation’ mode, an allusion to a quantum system where measurements destroy the state of the system. The success of the RL agent could have far-reaching impacts, from classical optimization, to quantum annealing and to the simulation of physical systems.
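The abstract describes a control loop in which an agent trained with proximal policy optimization observes a simulated-annealing run on a spin-glass Hamiltonian and chooses the temperature for the next sweep. The sketch below is not the authors' code; it only illustrates that control structure for a 2D nearest-neighbour spin glass, with a hypothetical placeholder_policy standing in for the learned PPO agent and all other names chosen for illustration.

# Minimal sketch (assumptions: open boundaries, Gaussian couplings, Metropolis dynamics).
# The temperature at each step is supplied by an external policy; in the paper this
# would be the PPO agent, here a simple linear cooling schedule stands in for it.
import numpy as np

rng = np.random.default_rng(0)
L = 14                                    # linear lattice size (L x L spins)
spins = rng.choice([-1, 1], size=(L, L))  # random initial spin configuration
J_h = rng.normal(size=(L, L))             # couplings between (i, j) and (i, j+1)
J_v = rng.normal(size=(L, L))             # couplings between (i, j) and (i+1, j)

def local_field(s, i, j):
    """Sum of couplings times neighbouring spins (open boundaries)."""
    h = 0.0
    if j + 1 < L:  h += J_h[i, j] * s[i, j + 1]
    if j - 1 >= 0: h += J_h[i, j - 1] * s[i, j - 1]
    if i + 1 < L:  h += J_v[i, j] * s[i + 1, j]
    if i - 1 >= 0: h += J_v[i - 1, j] * s[i - 1, j]
    return h

def metropolis_sweep(s, T):
    """One Metropolis sweep of single-spin flips at temperature T."""
    for _ in range(L * L):
        i, j = rng.integers(L), rng.integers(L)
        dE = 2.0 * s[i, j] * local_field(s, i, j)   # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i, j] *= -1
    return s

def placeholder_policy(step, n_steps):
    """Hypothetical stand-in for the learned PPO policy: linear cooling."""
    return max(2.5 * (1.0 - step / n_steps), 0.05)

n_steps = 200
for step in range(n_steps):
    T = placeholder_policy(step, n_steps)  # the RL agent would choose T here
    spins = metropolis_sweep(spins, T)

In the approach the abstract describes, the policy call above is replaced by an agent that conditions its temperature choice on observations of the spin configuration, which is how the learned heat-then-cool behaviour can emerge.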

A preprint version of the article is available at arXiv.


Updated: 2020-09-08