Heuristic Search Value Iteration for Zero-Sum Stochastic Games
IEEE Transactions on Games (IF 2.3), Pub Date: 2020-06-26, DOI: 10.1109/tg.2020.3005214
Olivier Buffet, Jilles Steeve Dibangoye, Abdallah Saffidine, Vincent Thomas

In sequential decision making, heuristic search algorithms make it possible to exploit both the initial situation and an admissible heuristic to search efficiently for an optimal solution, often for planning purposes. Such algorithms exist for problems with uncertain dynamics, partial observability, multiple criteria, or multiple collaborating agents. In this article, we look at two-player zero-sum stochastic games (zsSGs) with a discounted criterion, with a view to proposing a solution tailored to the fully observable case, whereas solutions have been proposed for particular, though still more general, partially observable cases. This setting requires reasoning about both a lower and an upper bound of the value function, which leads us to propose zsSG-HSVI, an algorithm based on heuristic search value iteration (HSVI) that therefore relies on generating trajectories. We demonstrate that, with each player acting optimistically and using simple heuristic initializations, HSVI's convergence in finite time to an $\epsilon$-optimal solution is preserved. An empirical study of the resulting approach is conducted on benchmark problems of various sizes.
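
The abstract describes zsSG-HSVI only at a high level. The Python sketch below illustrates the general idea under simplifying assumptions made here, and is not the authors' implementation: each player has exactly two actions, the game is given as explicit reward and transition tables, and every one-shot matrix game is solved exactly by scanning breakpoints. The names `solve_2xn`, `q_matrix`, `backup`, `explore`, and `zs_hsvi_sketch`, the dictionary encoding of the game, and the bound initialization from assumed reward limits `rmin`/`rmax` are all hypothetical. It keeps a lower bound `L` and an upper bound `U`, backs both up with a Shapley-style matrix-game step, lets the max player act on the upper-bound game and the min player on the lower-bound game (the "optimistic" pairing), and picks the next state by the largest probability-weighted excess gap, mirroring the rule used in the original HSVI.

```python
"""HSVI-style bound refinement for a tiny two-player zero-sum stochastic game.

Illustrative sketch only (not the authors' zsSG-HSVI): each player has exactly
two actions, the game is given as explicit tables, and every one-shot matrix
game is solved exactly by scanning the breakpoints of a 2-row lower envelope.
"""
from itertools import combinations


def solve_2xn(M):
    """Maximin of a zero-sum matrix game in which the max player has 2 rows.

    Returns (value, p), p being the max player's probability of row 0. The
    guaranteed payoff is concave and piecewise linear in p, so its maximum is
    at p = 0, p = 1, or where the payoff lines of two columns cross.
    """
    n = len(M[0])
    candidates = {0.0, 1.0}
    for j, k in combinations(range(n), 2):
        sj, sk = M[0][j] - M[1][j], M[0][k] - M[1][k]
        if sj != sk:
            p = (M[1][k] - M[1][j]) / (sj - sk)
            if 0.0 <= p <= 1.0:
                candidates.add(p)

    def worst(p):  # min player's best reply to the row mix (p, 1 - p)
        return min(p * M[0][j] + (1 - p) * M[1][j] for j in range(n))

    best = max(candidates, key=worst)
    return worst(best), best


def q_matrix(game, V, s):
    """One-step lookahead matrix Q[a][b] at state s using value estimates V."""
    R, T, g = game["R"], game["T"], game["gamma"]
    return [[R[s][a][b] + g * sum(pr * V[x] for x, pr in T[s][a][b].items())
             for b in range(2)] for a in range(2)]


def backup(game, L, U, s):
    """Back up both bounds at s; returns the optimistic strategy pair (p, q)."""
    QU, QL = q_matrix(game, U, s), q_matrix(game, L, s)
    U[s], p = solve_2xn(QU)  # max player's maximin mix in the upper-bound game
    # Min player's minimax mix in the lower-bound game: solve the negated transpose.
    v, q = solve_2xn([[-QL[a][b] for a in range(2)] for b in range(2)])
    L[s] = -v
    return p, q


def explore(game, L, U, s, t, eps):
    """Generate one trajectory from s at depth t, backing up on the way back."""
    g = game["gamma"]
    p, q = backup(game, L, U, s)
    if U[s] - L[s] <= eps / g ** t:
        return
    probs = {}  # next-state distribution under the optimistic strategy pair
    for a, pa in enumerate((p, 1 - p)):
        for b, qb in enumerate((q, 1 - q)):
            for x, pr in game["T"][s][a][b].items():
                if pa * qb * pr > 0:
                    probs[x] = probs.get(x, 0.0) + pa * qb * pr
    # Pick the successor with the largest probability-weighted excess gap.
    nxt = max(probs, key=lambda x: probs[x] * (U[x] - L[x] - eps / g ** (t + 1)))
    explore(game, L, U, nxt, t + 1, eps)
    backup(game, L, U, s)


def zs_hsvi_sketch(game, s0, eps, rmin, rmax, max_trials=1000):
    """Run trajectories from s0 until the bound gap there is at most eps."""
    g = game["gamma"]
    n = len(game["R"])
    L = [rmin / (1 - g)] * n  # simple initial lower bound
    U = [rmax / (1 - g)] * n  # simple initial upper bound
    for _ in range(max_trials):
        if U[s0] - L[s0] <= eps:
            break
        explore(game, L, U, s0, 0, eps)
    return L, U


if __name__ == "__main__":
    # Hypothetical game: in state 0 the players play matching pennies; a match
    # moves play to state 1, which is absorbing with reward 0.
    game = {
        "gamma": 0.9,
        "R": [[[1, -1], [-1, 1]], [[0, 0], [0, 0]]],
        "T": [[[{1: 1.0}, {0: 1.0}], [{0: 1.0}, {1: 1.0}]],
              [[{1: 1.0}, {1: 1.0}], [{1: 1.0}, {1: 1.0}]]],
    }
    L, U = zs_hsvi_sketch(game, s0=0, eps=0.01, rmin=-1.0, rmax=1.0)
    print(f"V(0) lies in [{L[0]:.3f}, {U[0]:.3f}]")
```

The optimistic pairing is what lets a backup shrink the gap: if $\sigma_1$ is the max player's maximin strategy in the upper-bound matrix and $\sigma_2$ the min player's minimax strategy in the lower-bound matrix, then right after the backup $U(s) - L(s) \le \gamma \sum_{s'} P(s' \mid s, \sigma_1, \sigma_2)\,(U(s') - L(s'))$, so tightening the bounds at visited successors tightens them at $s$. Because trajectories are generated recursively, Python's default recursion limit makes this sketch suitable only for moderate discount factors.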

Updated: 2020-06-26