Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods
arXiv - CS - Multiagent Systems. Pub Date: 2020-11-29. DOI: arxiv-2011.14393
Vida Fathi, Jalal Arabneydi, Amir G. Aghdam

In this paper, we study the global convergence of model-based and model-free policy gradient descent and natural policy gradient descent algorithms for linear quadratic deep structured teams. In such systems, agents are partitioned into a few sub-populations wherein the agents in each sub-population are coupled in the dynamics and cost function through a set of linear regressions of the states and actions of all agents. Every agent observes its local state and the linear regressions of states, called deep states. For a sufficiently small risk factor and/or sufficiently large population, we prove that model-based policy gradient methods globally converge to the optimal solution. Given an arbitrary number of agents, we develop model-free policy gradient and natural policy gradient algorithms for the special case of risk-neutral cost function. The proposed algorithms are scalable in the number of agents because the dimension of their policy space is independent of the number of agents in each sub-population. Simulations are provided to verify the theoretical results.
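To make the scalability argument concrete, the sketch below (not the authors' implementation) sets up a single sub-population of n agents whose dynamics and cost are coupled only through the empirical mean of states and actions (playing the role of the deep state and deep action), and runs a model-free, zeroth-order policy gradient on a shared linear gain. All coefficients (a, b, a_bar, b_bar, q, r, q_bar, r_bar), the horizon, and the smoothed two-point gradient estimator are illustrative assumptions; the point to observe is that the policy parameters theta = (k, k_bar) are shared by every agent, so their number does not grow with n.

```python
# Minimal sketch, under assumed dynamics/cost coefficients, of a model-free
# policy gradient for a one-sub-population linear quadratic deep structured team.
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model coefficients (assumed, not taken from the paper).
n = 100                       # number of agents in the sub-population
a, b = 0.9, 0.5               # local dynamics:  x_i' = a x_i + b u_i + ...
a_bar, b_bar = 0.1, 0.2       # coupling through the deep state / deep action
q, r = 1.0, 0.1               # local state / action cost weights
q_bar, r_bar = 0.5, 0.05      # deep-state / deep-action cost weights
T = 50                        # rollout horizon

def rollout_cost(theta, noise_scale=0.1):
    """Average per-step cost of the shared linear policy u_i = -k x_i - k_bar x_bar."""
    k, k_bar = theta
    x = rng.normal(size=n)                      # initial local states
    total = 0.0
    for _ in range(T):
        x_bar = x.mean()                        # deep state (empirical mean of states)
        u = -k * x - k_bar * x_bar              # identical gains for all agents
        u_bar = u.mean()                        # deep action (empirical mean of actions)
        total += np.mean(q * x**2 + r * u**2) + q_bar * x_bar**2 + r_bar * u_bar**2
        x = a * x + b * u + a_bar * x_bar + b_bar * u_bar \
            + noise_scale * rng.normal(size=n)
    return total / T

def zeroth_order_grad(theta, radius=0.05, samples=20):
    """Smoothed two-point gradient estimate built from cost evaluations only."""
    g = np.zeros_like(theta)
    for _ in range(samples):
        d = rng.normal(size=theta.shape)
        d /= np.linalg.norm(d)
        g += (rollout_cost(theta + radius * d)
              - rollout_cost(theta - radius * d)) / (2 * radius) * d
    return g / samples

theta = np.zeros(2)                             # shared policy parameters (k, k_bar)
for it in range(200):
    theta -= 0.02 * zeroth_order_grad(theta)    # plain gradient descent step
    if it % 50 == 0:
        print(f"iter {it:3d}  cost {rollout_cost(theta):.4f}  theta {theta}")
```

Because every agent applies the same two gains (k, k_bar), increasing n only changes the cost of simulating a rollout, not the dimension of the optimization variable; a natural policy gradient variant would additionally precondition the estimated gradient, which is omitted here for brevity.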

Updated: 2020-12-01