Machine Discovery of Comprehensible Strategies for Simple Games Using Meta-interpretive Learning
New Generation Computing (IF 2.0) Pub Date: 2019-04-01, DOI: 10.1007/s00354-019-00054-2
Stephen H. Muggleton, Celine Hocquette

Recently, world-class human players have been outperformed in a number of complex two-person games (Go, Chess, Checkers) by Deep Reinforcement Learning systems. However, the data efficiency of these learning systems is unclear, given that they appear to require far more training games to achieve such performance than any human player could experience in a lifetime. In addition, the resulting learned strategies are not in a form which can be communicated to human players. This contrasts with earlier research in Behavioural Cloning, in which single-agent skills were machine learned in a symbolic language, facilitating their being taught to human beings. In this paper, we consider Machine Discovery of human-comprehensible strategies for simple two-person games (Noughts-and-Crosses and Hexapawn). One advantage of considering simple games is that there is a tractable approach to calculating minimax regret. We use these games to compare Cumulative Minimax Regret for variants of both standard and deep reinforcement learning against two variants of a new Meta-interpretive Learning system called MIGO. In our experiments, the tested variants of both standard and deep reinforcement learning consistently perform worse (higher cumulative minimax regret) than both variants of MIGO on Noughts-and-Crosses and Hexapawn. In addition, MIGO's learned rules are relatively easy to comprehend, and are demonstrated to achieve significant transfer learning in both directions between Noughts-and-Crosses and Hexapawn.
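To make the tractability claim concrete: in a game as small as Noughts-and-Crosses, the full game tree can be evaluated exhaustively, so the minimax regret of a move (the gap between the value of the best available move and the value of the move actually played) can be computed exactly and summed over a game to give a cumulative regret. The sketch below is ours, not the authors' implementation; the function names (`minimax`, `move_regret`) and the string board encoding are illustrative assumptions.

```python
from functools import lru_cache

# The eight winning lines of a 3x3 board, indexed 0..8 row-major.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Exact game value with `player` to move: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    if '.' not in board:
        return 0  # board full: draw
    opponent = 'O' if player == 'X' else 'X'
    # Value of a move is the negation of the opponent's value afterwards.
    return max(-minimax(board[:i] + player + board[i + 1:], opponent)
               for i, cell in enumerate(board) if cell == '.')

def move_regret(board, player, move):
    """Minimax regret of `move`: best achievable value minus achieved value."""
    opponent = 'O' if player == 'X' else 'X'
    values = {i: -minimax(board[:i] + player + board[i + 1:], opponent)
              for i, cell in enumerate(board) if cell == '.'}
    return max(values.values()) - values[move]

if __name__ == '__main__':
    board = 'XX.OO....'                # X to move; square 2 wins immediately
    print(move_regret(board, 'X', 2))  # 0: the winning move carries no regret
    print(move_regret(board, 'X', 8))  # 2: a blunder letting O win at square 5
```

Summing `move_regret` over the moves an agent plays during training yields the kind of cumulative minimax regret curve used to compare the learners; the same exhaustive evaluation applies to Hexapawn, whose state space is similarly small.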

Updated: 2019-04-01