Lyapunov-Based Reinforcement Learning for Decentralized Multi-Agent Control
arXiv - CS - Systems and Control. Pub Date: 2020-09-20, DOI: arxiv-2009.09361
Qingrui Zhang, Hao Dong, Wei Pan

Decentralized multi-agent control has broad applications, ranging from multi-robot cooperation to distributed sensor networks. In decentralized multi-agent control, the systems are complex, with unknown or highly uncertain dynamics, so traditional model-based control methods can hardly be applied. Compared with model-based control, deep reinforcement learning (DRL) is a promising way to learn the controller or policy from data without knowing the system dynamics. However, directly applying DRL to decentralized multi-agent control is challenging, because interactions among agents make the learning environment non-stationary. More importantly, existing multi-agent reinforcement learning (MARL) algorithms cannot guarantee the closed-loop stability of a multi-agent system from a control-theoretic perspective, so the learned control policies are likely to produce abnormal or dangerous behaviors in real applications. Without a stability guarantee, applying existing MARL algorithms to real multi-agent systems such as UAVs, robots, and power systems is therefore a serious concern. In this paper, we propose a new MARL algorithm for decentralized multi-agent control with a stability guarantee. The new algorithm, termed multi-agent soft actor-critic (MASAC), is developed under the well-known framework of "centralized training with decentralized execution". Closed-loop stability is guaranteed by introducing a stability constraint into the policy-improvement step of MASAC; the constraint is designed based on Lyapunov's method from control theory. We present a multi-agent navigation example to demonstrate the effectiveness of the proposed MASAC algorithm.
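The abstract does not give the exact form of the stability constraint, but the idea of penalizing violations of a Lyapunov decrease condition during policy improvement can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example, assuming a soft-constraint (penalty) treatment of the condition L(s') - L(s) <= -alpha * L(s); the names PolicyNet, LyapunovNet, q_fn, alpha, and beta are illustrative and are not taken from the paper.

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Decentralized Gaussian policy: maps a local observation to an action."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * act_dim))

    def forward(self, obs):
        mu, log_std = self.net(obs).chunk(2, dim=-1)
        std = log_std.clamp(-5, 2).exp()
        dist = torch.distributions.Normal(mu, std)
        a = dist.rsample()                      # reparameterized sample
        logp = dist.log_prob(a).sum(-1)         # tanh correction omitted in this sketch
        return torch.tanh(a), logp

class LyapunovNet(nn.Module):
    """Candidate Lyapunov function L(s) >= 0 over the (joint) state."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s):
        return torch.nn.functional.softplus(self.net(s)).squeeze(-1)

def policy_loss(policy, lyapunov, q_fn, obs, state, next_state,
                temperature=0.2, beta=1.0, alpha=0.1):
    """SAC-style actor loss plus a hinge penalty on violations of the
    (assumed) Lyapunov decrease condition L(s') - L(s) <= -alpha * L(s)."""
    action, logp = policy(obs)
    sac_term = (temperature * logp - q_fn(obs, action)).mean()
    violation = lyapunov(next_state) - (1 - alpha) * lyapunov(state)
    stability_penalty = torch.relu(violation).mean()
    return sac_term + beta * stability_penalty

In the centralized-training-with-decentralized-execution setting described in the abstract, q_fn and the Lyapunov candidate would typically be centralized (conditioned on the joint state) and used only during training, while each PolicyNet acts on its local observation at execution time. The paper's MASAC may enforce the constraint differently, e.g., as a hard constraint in the policy-improvement step rather than the penalty used here.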

Updated: 2020-09-22