Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-09-01. DOI: arxiv-2009.00162. Qifan Zhang, Yue Guan, Panagiotis Tsiotras
We explore the use of policy approximation for reducing the computational
cost of learning Nash equilibria in multi-agent reinforcement learning
scenarios. We propose a new algorithm for zero-sum stochastic games in which
each agent simultaneously learns a Nash policy and an entropy-regularized
policy. The two policies help each other towards convergence: the former guides
the latter to the desired Nash equilibrium, while the latter serves as an
efficient approximation of the former. We demonstrate the possibility of using
the proposed algorithm to transfer previous training experiences to different
environments, enabling the agents to adapt quickly to new tasks. We also
provide a dynamic hyper-parameter scheduling scheme for further expedited
convergence. Empirical results on a number of stochastic games show that the
proposed algorithm converges to the Nash equilibrium while exhibiting a major
speed-up over existing algorithms.
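The core idea of coupling an entropy-regularized policy with equilibrium learning can be illustrated on the simplest case, a zero-sum matrix game. The sketch below is not the paper's algorithm; it is a minimal smoothed-fictitious-play loop in which each player's best response is softened into a softmax with temperature `tau` (the entropy regularization), and averaged iterates drive play toward the regularized equilibrium. All function names and step sizes are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy_reg_equilibrium(A, tau=0.5, iters=5000):
    """Smoothed fictitious play for the zero-sum matrix game A.

    The row player maximizes x^T A y, the column player minimizes it.
    Entropy regularization with temperature `tau` turns each exact best
    response into a softmax, and a decaying step size averages the
    iterates toward the regularized (logit) equilibrium.
    Illustrative sketch only, not the algorithm proposed in the paper.
    """
    m, n = A.shape
    # Start the row player away from equilibrium to show convergence.
    x = np.array([0.9] + [0.1 / (m - 1)] * (m - 1))
    y = np.ones(n) / n
    for t in range(iters):
        lr = 1.0 / (t + 2)                  # decaying step size
        x_br = softmax(A @ y / tau)         # soft best response (maximizer)
        y_br = softmax(-A.T @ x / tau)      # soft best response (minimizer)
        x = (1 - lr) * x + lr * x_br
        y = (1 - lr) * y + lr * y_br
    return x, y

# Matching pennies: the (regularized) equilibrium is uniform play.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = entropy_reg_equilibrium(A)
```

For matching pennies both mixed strategies settle near the uniform distribution. The temperature `tau` plays the role of the regularization hyper-parameter the paper schedules dynamically: higher values smooth the best responses more, trading equilibrium accuracy for faster, more stable updates.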
Updated: 2020-09-02