On Bellman's Optimality Principle for zs-POSGs,arXiv - CS - Computer Science and Game Theory

On Bellman's Optimality Principle for zs-POSGs
arXiv - CS - Computer Science and Game Theory Pub Date: 2020-06-29, DOI: arxiv-2006.16395
Olivier Buffet, Jilles Dibangoye, Aurélien Delage, Abdallah Saffidine, Vincent Thomas

Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how it can apply to (infinite horizon) 2-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking a central planner's viewpoint, which can only reason on a sufficient statistic called occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz-continuity of the value function in occupancy space, one can derive a version of the HSVI algorithm (Heuristic Search Value Iteration) that provably finds an $\epsilon$-Nash equilibrium in finite time.
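To illustrate why Lipschitz continuity is useful for a point-based search such as HSVI, the following is a minimal sketch and not the paper's algorithm. It assumes a function known to be λ-Lipschitz in the L1 norm and a handful of points where its value is known, and derives upper and lower bounds at any other point. The names `upper_bound`, `lower_bound`, `SAMPLES`, and `LAM` are hypothetical and chosen only for this example.

```python
from typing import Sequence, Tuple

# A sample is (point, known value of the function at that point).
Sample = Tuple[Sequence[float], float]


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """L1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))


def upper_bound(x: Sequence[float], samples: Sequence[Sample], lam: float) -> float:
    """Tightest upper bound at x implied by lam-Lipschitz continuity."""
    return min(v + lam * l1_distance(x, p) for p, v in samples)


def lower_bound(x: Sequence[float], samples: Sequence[Sample], lam: float) -> float:
    """Tightest lower bound at x implied by lam-Lipschitz continuity."""
    return max(v - lam * l1_distance(x, p) for p, v in samples)


# Illustrative data (an assumption, not taken from the paper):
# two points of a 2-dimensional simplex with known values.
SAMPLES = [((1.0, 0.0), 2.0), ((0.0, 1.0), 0.0)]
LAM = 1.0

if __name__ == "__main__":
    query = (0.5, 0.5)
    print(upper_bound(query, SAMPLES, LAM), lower_bound(query, SAMPLES, LAM))
```

Each known value constrains the function inside a cone of slope λ around its point, so adding points can only tighten the gap between the two bounds. Shrinking that kind of gap is the general idea behind HSVI-style algorithms, although the paper's actual bounds over occupancy states may be constructed differently.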

Updated: 2020-07-01
