Finite-Sample Analysis for Decentralized Batch Multiagent Reinforcement Learning With Networked Agents
IEEE Transactions on Automatic Control (IF 6.8) Pub Date: 2021-01-05, DOI: 10.1109/tac.2021.3049345
Kaiqing Zhang , Zhuoran Yang , Han Liu , Tong Zhang , Tamer Basar

Despite the increasing interest in multiagent reinforcement learning (MARL) in multiple communities, understanding its theoretical foundation has long been recognized as a challenging problem. In this article, we address this problem by providing a finite-sample analysis for decentralized batch MARL. Specifically, we consider a type of mixed MARL setting with both cooperative and competitive agents, where two teams of agents compete in a zero-sum game setting, while the agents within each team collaborate by communicating over a time-varying network. This setting covers many conventional MARL settings in the literature. We then develop batch MARL algorithms that can be implemented in a decentralized fashion, and quantify the finite-sample errors of the estimated action-value functions. Our error analysis captures how the function class, the number of samples within each iteration, and the number of iterations determine the statistical accuracy of the proposed algorithms. Our results, compared to the finite-sample bounds for single-agent reinforcement learning, involve additional error terms caused by decentralized computation, which is inherent in our decentralized MARL setting. This article provides the first finite-sample analysis for batch MARL, a step toward rigorous theoretical understanding of general MARL algorithms in the finite-sample regime.
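The abstract's decentralized setting relies on agents within a team combining local estimates by communicating over a network. A standard primitive for this is consensus averaging; the sketch below is illustrative only (the topology, uniform weights, and scalar estimates are assumptions, not the paper's algorithm) and shows how repeated local averaging drives all agents toward the network-wide mean of their action-value estimates.

```python
# Minimal sketch (not the paper's algorithm): consensus averaging over a
# fixed ring network. Each agent repeatedly replaces its local estimate
# with the uniform average of its own value and its neighbors' values.
# On a regular graph these uniform weights form a doubly stochastic
# mixing matrix, so the estimates converge to the network-wide average.

def consensus_step(values, neighbors):
    """One communication round: each agent i averages its value with
    the values of the agents in neighbors[i]."""
    new_values = []
    for i, v in enumerate(values):
        group = [v] + [values[j] for j in neighbors[i]]
        new_values.append(sum(group) / len(group))
    return new_values

# Four agents on a ring, each holding a local scalar estimate
# (standing in for a local action-value statistic).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
values = [1.0, 3.0, 5.0, 7.0]
for _ in range(50):
    values = consensus_step(values, neighbors)
# All agents approach the global average 4.0.
```

In the time-varying setting described in the abstract, the neighbor sets would change between rounds; convergence then rests on standard joint-connectivity assumptions rather than on any single fixed graph.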

Updated: 2021-01-05