Self-Adaptive Monte-Carlo Tree Search in General Game Playing, IEEE Transactions on Games

Self-Adaptive Monte-Carlo Tree Search in General Game Playing
IEEE Transactions on Games (IF 1.7), Pub Date: 2020-06-01, DOI: 10.1109/tg.2018.2884768
Chiara F. Sironi, Jialin Liu, Mark H. M. Winands

Many enhancements for Monte Carlo tree search (MCTS) have been applied successfully in general game playing (GGP). MCTS and its enhancements are controlled by multiple parameters that require extensive and time-consuming offline optimization. Moreover, because the games to be played are unknown in advance, offline optimization cannot tune parameters for individual games. This paper proposes a self-adaptive MCTS strategy (SA-MCTS) that integrates into the search a method for automatically tuning search-control parameters online, per game. It presents five allocation strategies that decide how to allocate the available samples to the evaluation of parameter values. Experiments with a 1 s play clock on multiplayer games show that, for all allocation strategies, SA-MCTS tuning two parameters performs at least as well as MCTS tuned offline and not optimized per game. The best-performing allocation strategy is the N-Tuple Bandit Evolutionary Algorithm (NTBEA), which also performs well when tuning four parameters. SA-MCTS can be considered a successful strategy for domains that require parameter tuning for every single problem, and a valid alternative for domains where offline parameter tuning is costly or infeasible.
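
As a rough illustration of what allocating samples to evaluate parameter values can look like, the sketch below spreads a fixed budget of simulations over a few candidate values of one parameter using a plain UCB1 multi-armed bandit. It is not claimed to be one of the paper's five allocation strategies, and it is not NTBEA. The names `ucb1_select`, `tune_online`, and `simulate`, the exploration constant 0.7, and the toy reward function are all assumptions made for this example.

```python
import math
import random


def ucb1_select(counts, totals, exploration=0.7):
    """Return the index of the candidate value to sample next (UCB1)."""
    # Sample every candidate once before applying the UCB1 formula.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total_n = sum(counts)
    scores = [
        totals[i] / counts[i]
        + exploration * math.sqrt(math.log(total_n) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def tune_online(candidates, simulate, num_samples, exploration=0.7):
    """Allocate num_samples simulations across candidate parameter values.

    `simulate(value)` is a hypothetical stand-in for one search simulation
    run with the given parameter value; it should return a reward in [0, 1].
    """
    counts = [0] * len(candidates)
    totals = [0.0] * len(candidates)
    for _ in range(num_samples):
        i = ucb1_select(counts, totals, exploration)
        totals[i] += simulate(candidates[i])
        counts[i] += 1
    # Report the most frequently sampled value and the per-value counts.
    best = max(range(len(candidates)), key=counts.__getitem__)
    return candidates[best], counts


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_simulate(value):
        # Toy assumption: win probability peaks when value is 0.4.
        return 1.0 if rng.random() < max(0.0, 0.8 - abs(value - 0.4)) else 0.0

    print(tune_online([0.1, 0.4, 0.7, 1.0], toy_simulate, 2000))
```

In this sketch each simulation both contributes to the search and acts as one evaluation sample for the parameter value it used, so tuning needs no separate offline phase.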

Updated: 2020-06-01