Mastering Atari, Go, chess and shogi by planning with a learned model
Nature (IF 50.5) | Pub Date: 2020-12-23 | DOI: 10.1038/s41586-020-03051-4
Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess[1] and Go[2], where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games[3] (the canonical video-game environment for testing artificial-intelligence techniques, in which model-based planning approaches have historically struggled[4]), the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi (canonical environments for high-performance planning), the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm[5] that was supplied with the rules of the game.
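To make the abstract's description concrete, the sketch below illustrates the three learned functions the MuZero paper builds its model from (a representation function h, a dynamics function g and a prediction function f) and how applying them iteratively yields exactly the planning-relevant quantities named above: policy, value and reward. This is a minimal illustration only; the tiny random linear maps, layer sizes and function names stand in for the paper's deep networks, and a real planner would invoke these functions inside Monte Carlo tree search rather than along a fixed action sequence.

```python
# Minimal sketch of MuZero's learned model, assuming tiny random linear
# maps in place of trained deep networks. Only the h/g/f structure and
# the (policy, value, reward) outputs follow the paper; everything else
# here (sizes, names, activations) is an illustrative placeholder.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, STATE_DIM, NUM_ACTIONS = 8, 16, 4

# Random weights standing in for trained network parameters.
W_repr = rng.normal(size=(OBS_DIM, STATE_DIM))
W_dyn = rng.normal(size=(STATE_DIM + NUM_ACTIONS, STATE_DIM))
w_reward = rng.normal(size=STATE_DIM + NUM_ACTIONS)
W_policy = rng.normal(size=(STATE_DIM, NUM_ACTIONS))
w_value = rng.normal(size=STATE_DIM)

def representation(observation):
    # h: encode the raw observation into an initial hidden state s_0.
    return np.tanh(observation @ W_repr)

def dynamics(state, action):
    # g: from a hidden state and an action, predict the next hidden
    # state and the immediate reward.
    x = np.concatenate([state, np.eye(NUM_ACTIONS)[action]])
    return np.tanh(x @ W_dyn), float(x @ w_reward)

def prediction(state):
    # f: from a hidden state, predict the action-selection policy
    # (softmax over logits) and the value.
    logits = state @ W_policy
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()
    return policy, float(state @ w_value)

def unroll(observation, actions):
    # Apply the model iteratively along a hypothetical action sequence,
    # collecting the predictions used for planning. Note the environment
    # itself is never consulted after the initial observation: the model
    # needs no knowledge of the underlying dynamics.
    state = representation(observation)
    trajectory = []
    for a in actions:
        policy, value = prediction(state)
        state, reward = dynamics(state, a)
        trajectory.append((policy, value, reward))
    return trajectory

if __name__ == "__main__":
    obs = rng.normal(size=OBS_DIM)
    for policy, value, reward in unroll(obs, actions=[0, 2, 1]):
        print(f"value={value:+.2f}  reward={reward:+.2f}  policy={np.round(policy, 2)}")
```

Planning then searches over such imagined trajectories; this is the sense in which the model, rather than a perfect simulator, supplies the tree-based search with states, rewards and evaluations.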
