Lyapunov-based uncertainty-aware safe reinforcement learning
arXiv - CS - Artificial Intelligence. Pub Date: 2021-07-29, DOI: arxiv-2107.13944
Ashkan B. Jeddi, Nariman L. Dehghani, Abdollah Shafieezadeh

Reinforcement learning (RL) has shown promising performance in learning optimal policies for a variety of sequential decision-making tasks. However, in many real-world RL problems, besides optimizing the main objective, the agent is expected to satisfy a certain level of safety (e.g., avoiding collisions in autonomous driving). While RL problems are commonly formalized as Markov decision processes (MDPs), safety constraints are incorporated via constrained Markov decision processes (CMDPs). Although recent advances in safe RL have enabled learning safe policies in CMDPs, these safety requirements should be satisfied during both training and deployment. Furthermore, it has been shown that in memory-based and partially observable environments, these methods fail to maintain safety on unseen, out-of-distribution observations. To address these limitations, we propose a Lyapunov-based uncertainty-aware safe RL model. The introduced model adopts a Lyapunov function that converts trajectory-based constraints into a set of local linear constraints. Furthermore, to ensure the safety of the agent in highly uncertain environments, an uncertainty quantification method is developed that identifies risk-averse actions by estimating the probability of constraint violations. Moreover, a Transformer model is integrated to provide the agent with memory, allowing it to process information over long time horizons via the self-attention mechanism. The proposed model is evaluated on grid-world navigation tasks where safety is defined as avoiding static and dynamic obstacles in fully and partially observable environments. The results of these experiments show a significant improvement in the agent's performance, both in achieving optimality and in satisfying safety constraints.
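To make the risk-averse action selection described in the abstract concrete, the following Python sketch shows one plausible way to combine per-action value estimates with a Monte-Carlo estimate of the constraint-violation probability (e.g., from an ensemble or dropout samples) and pick the best-valued action whose estimated risk stays below a tolerance. The function name, the tolerance parameter, and the sampling scheme are illustrative assumptions, not the authors' implementation.

import numpy as np

def select_risk_averse_action(q_values, violation_samples, epsilon=0.05):
    """Pick the highest-value action whose estimated probability of
    violating the (local) safety constraint stays below epsilon.

    q_values          : (n_actions,) array of action-value estimates.
    violation_samples : (n_samples, n_actions) array of sampled constraint
                        costs, e.g. from an ensemble or dropout passes;
                        a positive entry means the local constraint is violated.
    epsilon           : tolerated probability of constraint violation
                        (hypothetical fixed threshold for illustration).
    """
    # Monte-Carlo estimate of P(constraint violated | action)
    p_violate = (violation_samples > 0.0).mean(axis=0)

    safe_mask = p_violate <= epsilon
    if safe_mask.any():
        # Among actions deemed safe, act greedily on the value estimate.
        safe_idx = np.flatnonzero(safe_mask)
        return int(safe_idx[np.argmax(q_values[safe_idx])])
    # If no action is deemed safe, fall back to the least risky one.
    return int(np.argmin(p_violate))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = np.array([1.0, 0.8, 0.3])
    # Hypothetical sampled constraint costs for 3 actions over 200 draws.
    samples = rng.normal(loc=[0.2, -0.5, -0.8], scale=0.3, size=(200, 3))
    print(select_risk_averse_action(q, samples))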

Updated: 2021-07-30