Safe Multi-Agent Reinforcement Learning via Shielding
arXiv - CS - Formal Languages and Automata Theory. Pub Date: 2021-01-27, DOI: arxiv-2101.11196
Ingy Elsayed-Aly, Suda Bharadwaj, Christopher Amato, Rüdiger Ehlers, Ufuk Topcu, Lu Feng

Multi-agent reinforcement learning (MARL) has been increasingly used in a wide range of safety-critical applications, which require guaranteed safety (e.g., no unsafe states are ever visited) during the learning process. Unfortunately, current MARL methods do not have safety guarantees. Therefore, we present two shielding approaches for safe MARL. In centralized shielding, we synthesize a single shield to monitor all agents' joint actions and correct any unsafe action if necessary. In factored shielding, we synthesize multiple shields based on a factorization of the joint state space observed by all agents; the set of shields monitors agents concurrently, and each shield is only responsible for a subset of agents at each step. Experimental results show that both approaches can guarantee the safety of agents during learning without compromising the quality of learned policies; moreover, factored shielding is more scalable in the number of agents than centralized shielding.
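The shielding loop the abstract describes (monitor the proposed joint action, override it when it would violate the safety specification) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' method: the grid world, the CentralizedShield class, and the "stay" fallback are hypothetical stand-ins, and the paper's shields are synthesized from formal safety specifications rather than hand-written checks like these.

```python
# Minimal sketch of a centralized shield for a grid world: it checks the
# agents' proposed joint action and substitutes a safe fallback if the
# action would enter a forbidden cell or cause a collision. All names
# here are illustrative, not from the paper.

MOVES = {"stay": (0, 0), "up": (0, 1), "down": (0, -1),
         "left": (-1, 0), "right": (1, 0)}


def step(pos, action):
    dx, dy = MOVES[action]
    return (pos[0] + dx, pos[1] + dy)


class CentralizedShield:
    def __init__(self, unsafe_cells):
        self.unsafe_cells = set(unsafe_cells)

    def is_safe(self, positions, joint_action):
        nxt = [step(p, a) for p, a in zip(positions, joint_action)]
        if any(p in self.unsafe_cells for p in nxt):
            return False                      # an agent enters a forbidden cell
        return len(set(nxt)) == len(nxt)      # no two agents may collide

    def correct(self, positions, joint_action):
        """Pass through a safe joint action; otherwise return a fallback."""
        if self.is_safe(positions, joint_action):
            return joint_action
        # Assumed-safe fallback: every agent stays in place this step.
        return tuple("stay" for _ in joint_action)


if __name__ == "__main__":
    shield = CentralizedShield(unsafe_cells={(1, 1)})
    positions = [(0, 1), (2, 1)]
    proposed = ("right", "left")   # both agents would move into (1, 1)
    print(shield.correct(positions, proposed))   # -> ('stay', 'stay')
```

Factored shielding, in this sketch's terms, would replace the single CentralizedShield with several such objects, each synthesized for one part of the factored state space and responsible only for the agents currently in that part.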

Updated: 2021-01-28