On Bellman's Optimality Principle for zs-POSGs
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-06-29, DOI: arxiv-2006.16395
Olivier Buffet, Jilles Dibangoye, Aurélien Delage, Abdallah Saffidine, Vincent Thomas

Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how it can apply to (infinite horizon) 2-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking a central planner's viewpoint, which can only reason on a sufficient statistic called occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz-continuity of the value function in occupancy space, one can derive a version of the HSVI algorithm (Heuristic Search Value Iteration) that provably finds an $\epsilon$-Nash equilibrium in finite time.
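
The abstract does not spell out how Lipschitz-continuity yields usable bounds, so the following is only a minimal sketch of the general idea, not the authors' operators. It assumes a value function that is Lipschitz with constant `lam` under the L1 norm, and a set of already-visited points with stored upper and lower values. The names `lipschitz_upper_bound`, `lipschitz_lower_bound`, `lam`, and the toy numbers are all hypothetical.

```python
from typing import Sequence, Tuple

# A stored point: (occupancy-like vector, value recorded at that vector).
Point = Tuple[Sequence[float], float]


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """L1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))


def lipschitz_upper_bound(x: Sequence[float], points: Sequence[Point], lam: float) -> float:
    """Tightest upper bound at x implied by stored upper values.

    If V is lam-Lipschitz and V(p) <= v for every stored (p, v),
    then V(x) <= v + lam * ||x - p||_1 for each of them.
    """
    return min(v + lam * l1_distance(x, p) for p, v in points)


def lipschitz_lower_bound(x: Sequence[float], points: Sequence[Point], lam: float) -> float:
    """Tightest lower bound at x implied by stored lower values."""
    return max(v - lam * l1_distance(x, p) for p, v in points)


# Toy data (assumed, not from the paper): two stored points in a 2-D simplex.
upper_points = [((1.0, 0.0), 2.0), ((0.0, 1.0), 1.0)]
lower_points = [((1.0, 0.0), 1.5), ((0.0, 1.0), 0.5)]
query = (0.5, 0.5)

upper = lipschitz_upper_bound(query, upper_points, lam=1.0)
lower = lipschitz_lower_bound(query, lower_points, lam=1.0)
gap = upper - lower  # an HSVI-style search would stop once this gap is at most epsilon
```

In an HSVI-style loop, a gap of this kind at the initial point is what gets compared against $\epsilon$; how the bounds are updated along trajectories is specific to the paper and is not reproduced here.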

Updated: 2020-07-01