Improving RTS Game AI by Supervised Policy Learning, Tactical Search, and Deep Reinforcement Learning
IEEE Computational Intelligence Magazine (IF 10.3), Pub Date: 2019-08-01, DOI: 10.1109/mci.2019.2919363
Nicolas A. Barriga, Marius Stanescu, Felipe Besoain, Michael Buro

Constructing strong AI systems for video games is difficult due to enormous state and action spaces and the lack of good state evaluation functions and high-level action abstractions. In spite of recent research progress in popular video game genres such as Atari 2600 console games and multiplayer online battle arena (MOBA) games, to this day strong human players can still defeat the best AI systems in adversarial video games. In this paper, we propose to use a deep Convolutional Neural Network (CNN) to select among a limited set of abstract action choices in Real-Time Strategy (RTS) games, and to utilize the remaining computation time for game tree search to improve low-level tactics. The CNN is trained by supervised learning on game states labeled by Puppet Search, a strategic search algorithm that uses action abstractions. Replacing Puppet Search by a CNN frees up time that can be used for improving units' tactical behavior while executing the strategic plan. Experiments in the μRTS game show that the combined algorithm results in higher win-rates than either of its two independent components and other state-of-the-art μRTS agents. We then present a case study that investigates how deep Reinforcement Learning (RL) can be used in modern video games, such as Total War: Warhammer, to improve tactical multi-agent AI modules. We use popular RL algorithms such as Deep Q-Networks (DQN) and Asynchronous Advantage Actor-Critic (A3C), basic network architectures, and minimal hyper-parameter tuning to learn complex cooperative behaviors that defeat the highest-difficulty built-in AI in medium-scale scenarios.
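To make the supervised policy-learning idea concrete, the following is a minimal PyTorch sketch of a CNN that maps a gridded RTS game state to a distribution over a small set of abstract strategy choices and is trained on states labeled by a strategic search such as Puppet Search. All shapes, feature-plane counts, and the number of abstract choices are illustrative assumptions, not the paper's exact architecture or setup.

```python
# Minimal sketch: supervised policy learning over abstract action choices.
# Assumed setup: 8x8 μRTS-style map, 25 feature planes, 4 abstract choices.
import torch
import torch.nn as nn

N_PLANES = 25    # assumed feature planes (unit types, HP, ownership, ...)
BOARD = 8        # assumed 8x8 map
N_CHOICES = 4    # assumed number of abstract strategy choices

class PuppetPolicyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(N_PLANES, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(64 * BOARD * BOARD, N_CHOICES)

    def forward(self, x):
        # x: (batch, N_PLANES, BOARD, BOARD) -> logits over abstract choices
        return self.head(self.features(x).flatten(1))

model = PuppetPolicyCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batch standing in for game states labeled by Puppet Search.
states = torch.randn(32, N_PLANES, BOARD, BOARD)
labels = torch.randint(0, N_CHOICES, (32,))   # search's chosen abstract move
loss = loss_fn(model(states), labels)
opt.zero_grad()
loss.backward()
opt.step()
```

At play time, a single forward pass through such a network replaces the strategic search, and the computation time saved can be spent on game tree search over the units' low-level tactical actions, as the abstract describes.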
