Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games
arXiv - CS - Computer Science and Game Theory. Pub Date: 2021-02-17. DOI: arxiv-2102.08903
Yulai Zhao, Yuandong Tian, Jason D. Lee, Simon S. Du

Policy gradient methods are widely used for solving two-player zero-sum games and achieve superhuman performance in practice. However, it remains unclear when they can provably find a near-optimal solution and how many samples and iterations this requires. This paper studies natural extensions of the Natural Policy Gradient algorithm for solving two-player zero-sum games where function approximation is used for generalization across states. We thoroughly characterize the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error. To our knowledge, this is the first quantitative analysis of policy gradient methods with function approximation for two-player zero-sum Markov games.

Updated: 2021-02-18