Crawling in Rogue's dungeons with deep reinforcement techniques
IEEE Transactions on Games (IF 2.3), Pub Date: 2020-06-01, DOI: 10.1109/tg.2019.2899159
Andrea Asperti, Daniele Cortesi, Carlo De Pieri, Gianmaria Pedrini, Francesco Sovrano

This paper reports on our extensive experimentation, over the last two years, with deep reinforcement techniques for training an agent to move in the dungeons of the famous Rogue video game. The challenging nature of the problem is tightly related to the procedural, random generation of a new dungeon map at each level, which forbids any form of level-specific learning and forces us to address the navigation problem in its full generality. Other interesting aspects of the game from the point of view of automatic learning are its partially observable nature, since maps are initially not visible and get discovered during exploration, and the problem of sparse rewards, which requires the acquisition of complex, nonreactive behaviors involving memory and planning. In this paper, we build on previous works to make a more systematic comparison of different learning techniques, focusing in particular on Asynchronous Advantage Actor–Critic (A3C) and Actor–Critic with Experience Replay (ACER). In a game like Rogue, the sparsity of rewards is mitigated by the variability of the dungeon configurations (sometimes, by luck, the exit is at hand); if this variability can be tamed, as ACER seems able to do better than other algorithms, the problem of sparse rewards can be overcome without any need for intrinsic motivation.
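To make the actor–critic machinery mentioned above concrete, the following is a minimal, illustrative sketch of the n-step advantage computation at the core of A3C-style updates. It is not the authors' implementation: the rollout values, reward sequence, and action probabilities below are hypothetical toy numbers, chosen only to show how a sparse terminal reward (e.g. reaching the exit) propagates back through the discounted returns into the policy and value losses.

```python
import numpy as np

# Toy rollout data (hypothetical, not taken from the paper's Rogue agent).
rewards = np.array([0.0, 0.0, 0.0, 1.0])      # sparse reward: exit reached only at the last step
values  = np.array([0.05, 0.10, 0.20, 0.60])  # critic estimates V(s_t) along the rollout
bootstrap_value = 0.0                          # V(s_T) = 0 since the episode terminated
gamma = 0.99                                   # discount factor

# n-step discounted returns, accumulated backwards from the bootstrap value.
returns = np.zeros_like(rewards)
running = bootstrap_value
for t in reversed(range(len(rewards))):
    running = rewards[t] + gamma * running
    returns[t] = running

# Advantages A(s_t, a_t) = R_t - V(s_t) weight the policy-gradient term;
# the critic is regressed toward the same n-step returns.
advantages = returns - values

log_pi_taken = np.log(np.array([0.25, 0.30, 0.40, 0.50]))   # log pi(a_t | s_t) of the actions taken
policy_loss = -(log_pi_taken * advantages).sum()             # actor: reinforce advantageous actions
value_loss = 0.5 * ((returns - values) ** 2).sum()           # critic: squared return-prediction error

print("returns:   ", np.round(returns, 3))
print("advantages:", np.round(advantages, 3))
print("policy loss %.3f, value loss %.3f" % (policy_loss, value_loss))
```

ACER extends this scheme with an experience replay buffer and truncated importance-sampling corrections for off-policy data, which, as the abstract notes, appears to help in taming the variability of the randomly generated dungeons.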

Updated: 2020-06-01