Learning the Game of Go by Scalable Network without Prior Knowledge of Komi
IEEE Transactions on Games ( IF 2.3 ) Pub Date : 2020-06-01 , DOI: 10.1109/tg.2020.2992858
Bohong Yang , Lin Wang , Hong Lu , Youzhao Yang

AlphaGo trains a value network to predict the win rate of the current state on a 19 × 19 board with a komi of 7.5. For most rectangular boards the correct komi is unknown, so the winner of a finished game cannot be determined directly. The usual workaround is to guess a komi from human experience and train the value network with it, which ties the accuracy of the value network to the accuracy of the guess. This article instead uses a board value network to calculate the score of the current state and tries to maximize that score, so no komi needs to be guessed. We also modify the network structure to accept boards of arbitrary size as input. Furthermore, knowledge learned on a small board can be transferred to a large board. We propose an algorithm that can adapt to the bonus rule. Experiments show that our method is effective on a small board and can transfer knowledge to a large board. To better understand the learning process, we visualize the policy and score of some major branches. Finally, we show the solutions that our program obtained on 6 × 6, 6 × 7, and 7 × 8 boards.
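To illustrate why predicting a score removes the need to guess a komi, here is a minimal Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that a board value output assigns each point a value near +1 when Black owns it and near -1 when White owns it; the function names `score_from_board_values` and `winner_for_komi` are hypothetical. Summing the per-point values gives a Black-minus-White score for any rectangular board, and a komi, if one is wanted, can be applied afterwards.

```python
from typing import List


def score_from_board_values(board_values: List[List[float]]) -> float:
    """Sum per-point values into a Black-minus-White score.

    Assumption: each entry is about +1 for a Black point and about -1
    for a White point. Works for any rectangular board size.
    """
    return sum(sum(row) for row in board_values)


def winner_for_komi(score: float, komi: float) -> str:
    """Decide the winner only after the fact, for whatever komi is chosen."""
    margin = score - komi
    if margin > 0:
        return "black"
    if margin < 0:
        return "white"
    return "draw"


# Hypothetical final position on a 2 x 3 board: Black owns 4 points, White 2.
board_values = [
    [1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0],
]
score = score_from_board_values(board_values)

# The same score can be judged under different komi values without retraining.
result_low_komi = winner_for_komi(score, 1.5)
result_high_komi = winner_for_komi(score, 2.5)
```

Under this assumption the training target is the score itself, so a wrong komi guess cannot distort it; the komi matters only when a win or loss has to be declared.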

Updated: 2020-06-01