Learning self-play agents for combinatorial optimization problems
The Knowledge Engineering Review (IF 2.1), Pub Date: 2020-03-23, DOI: 10.1017/s026988892000020x
Ruiyang Xu , Karl Lieberherr

Recent progress in reinforcement learning (RL) using self-play has shown remarkable performance with several board games (e.g., Chess and Go) and video games (e.g., Atari games and Dota2). It is plausible to hypothesize that RL, starting from zero knowledge, might be able to gradually approach a winning strategy after a certain amount of training. In this paper, we explore neural Monte Carlo Tree Search (neural MCTS), an RL algorithm that has been applied successfully by DeepMind to play Go and Chess at a superhuman level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka’s Game-Theoretical Semantics, we propose the Zermelo Gamification to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems. A specially designed neural MCTS algorithm is then introduced to train Zermelo game agents. We use a prototype problem for which the ground-truth policy is efficiently computable to demonstrate that neural MCTS is promising.
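The abstract describes the Zermelo Gamification only at a high level. The following is a minimal, illustrative sketch (not the paper's code) of the underlying idea from Hintikka's Game-Theoretical Semantics: an exists-forall claim becomes a two-move game between a Proposer and an Opponent, and a winning strategy for the Proposer is exactly a solution of the original problem. The domains, predicate, and function names below are hypothetical toy choices; in the paper, exhaustive backward induction of this kind is replaced by neural MCTS self-play when the game tree is too large to enumerate.

```python
# Minimal sketch of the Zermelo Gamification idea (illustrative, not from the paper):
# a claim "there exists x in X such that for all y in Y, phi(x, y)" becomes a game.
# The Proposer (verifier) picks x, the Opponent (falsifier) picks y, and the
# Proposer wins iff phi(x, y) holds. A Proposer winning strategy is a solution x.

from typing import Callable, Optional, Sequence


def proposer_winning_move(
    xs: Sequence[int],
    ys: Sequence[int],
    phi: Callable[[int, int], bool],
) -> Optional[int]:
    """Solve the two-move Zermelo game by backward induction.

    Returns an x that survives every Opponent reply y (a witness for the
    exists-forall claim), or None if the Opponent has a winning strategy.
    """
    for x in xs:  # Proposer's move
        if all(phi(x, y) for y in ys):  # no Opponent reply refutes x
            return x
    return None


if __name__ == "__main__":
    # Toy instance: "there is an x in {0..9} such that for all y in {1..5},
    # x is divisible by y". Here x = 0 is a witness.
    witness = proposer_winning_move(
        xs=range(10),
        ys=range(1, 6),
        phi=lambda x, y: x % y == 0,
    )
    print("Proposer's winning first move (solution):", witness)
```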

Updated: 2020-03-23