Scalable Reinforcement Learning Policies for Multi-Agent Control
arXiv - CS - Multiagent Systems Pub Date : 2020-11-16 , DOI: arxiv-2011.08055
Christopher D. Hsu, Heejin Jeong, George J. Pappas, and Pratik Chaudhari

This paper develops a stochastic Multi-Agent Reinforcement Learning (MARL) method to learn control policies that can handle an arbitrary number of external agents; our policies can be executed for tasks consisting of 1000 pursuers and 1000 evaders. We model pursuers as agents with limited on-board sensing and formulate the problem as a decentralized, partially-observable Markov Decision Process. An attention mechanism is used to build a permutation- and input-size-invariant embedding of the observations, from which a stochastic policy and value function are learned using entropy-regularized off-policy techniques. Simulation experiments on a large number of problems show that our control policies scale dramatically and display cooperative behavior despite being executed in a decentralized fashion; our methods offer a simple solution to classical multi-agent problems using techniques in reinforcement learning.
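The key architectural idea in the abstract — an embedding that is invariant both to the ordering and to the number of observed agents — can be sketched with a single attention-pooling layer. The sketch below is illustrative only and assumes hypothetical weight names (`W_q`, `W_k`, `W_v`) and a mean-pooled query; the paper's actual network may differ in depth, pooling choice, and parameterization.

```python
import numpy as np

def attention_embedding(obs, W_q, W_k, W_v):
    """Pool a variable number of neighbor observations into a fixed-size
    embedding via dot-product attention.

    obs : (n, d) array, one row per observed external agent (n may vary).
    Returns an (h,) vector that is unchanged if the rows of obs are permuted.
    """
    q = obs.mean(axis=0) @ W_q            # pooled query, shape (h,); order-invariant
    K = obs @ W_k                         # per-agent keys, shape (n, h)
    V = obs @ W_v                         # per-agent values, shape (n, h)
    scores = K @ q / np.sqrt(K.shape[1])  # scaled dot-product attention scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                          # softmax attention weights over the n agents
    return w @ V                          # weighted sum: shape (h,), independent of n
```

Because the query is a symmetric (mean) pooling of the rows and the output is a weighted sum over rows, shuffling the observed agents permutes `scores` and `V` identically and leaves the result unchanged; varying `n` only changes how many terms are summed, so the embedding size stays fixed.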

Updated: 2020-11-17