Lyapunov-based uncertainty-aware safe reinforcement learning
arXiv - CS - Artificial Intelligence. Pub Date: 2021-07-29. DOI: arxiv-2107.13944. Ashkan B. Jeddi, Nariman L. Dehghani, Abdollah Shafieezadeh
Reinforcement learning (RL) has shown promising performance in learning
optimal policies for a variety of sequential decision-making tasks. However, in
many real-world RL problems, besides optimizing the main objectives, the agent
is expected to satisfy a certain level of safety (e.g., avoiding collisions in
autonomous driving). While RL problems are commonly formalized as Markov
decision processes (MDPs), safety constraints are incorporated via constrained
Markov decision processes (CMDPs). Although recent advances in safe RL have
enabled learning safe policies in CMDPs, these safety requirements must be
satisfied not only at deployment but also throughout training. Furthermore,
in memory-based and partially observable environments, these methods have
been shown to fail to maintain safety under unseen, out-of-distribution
observations.
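To make the CMDP setting concrete, here is a minimal sketch (not the paper's code; the corridor environment, cost signal, and budget are illustrative assumptions): a CMDP augments the usual MDP reward with a per-step cost, and a policy is safe when its cumulative trajectory cost stays within a budget.

```python
# Illustrative CMDP sketch: alongside the reward, each step emits a cost
# c(s, a); safety means keeping the episode's total cost within a budget d.

def rollout_cost_and_reward(policy, step_fn, start_state, horizon):
    """Run one episode; return (total_reward, total_cost)."""
    state, total_reward, total_cost = start_state, 0.0, 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward, cost = step_fn(state, action)
        total_reward += reward
        total_cost += cost
    return total_reward, total_cost

# Toy 1-D corridor: states 0..4, the goal is state 4, state 2 is hazardous.
def step_fn(state, action):                  # action in {-1, +1}
    next_state = min(4, max(0, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    cost = 1.0 if next_state == 2 else 0.0   # constraint signal, not reward
    return next_state, reward, cost

policy = lambda s: 1                         # always move right: crosses state 2
ret, cost = rollout_cost_and_reward(policy, step_fn, 0, 6)
budget = 0.0
print(ret, cost, cost <= budget)             # prints: 3.0 1.0 False
```

The key design point is that the cost is kept separate from the reward rather than folded in as a penalty, which is what distinguishes a CMDP from reward shaping.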
To address these limitations, we propose a Lyapunov-based uncertainty-aware
safe RL model. The introduced model adopts a Lyapunov function that converts
trajectory-based constraints to a set of local linear constraints. Furthermore,
to ensure the safety of the agent in highly uncertain environments, an
uncertainty quantification method is developed that enables identifying
risk-averse actions through estimating the probability of constraint
violations. Moreover, a Transformer model is integrated to provide the agent
with memory to process long time horizons of information via the self-attention
mechanism. The proposed model is evaluated in grid-world navigation tasks where
safety is defined as avoiding static and dynamic obstacles in fully and
partially observable environments. The results of these experiments show a
significant improvement in the performance of the agent both in achieving
optimality and satisfying safety constraints.
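The uncertainty-aware idea described above can be sketched as follows (the function names, the ensemble-based probability estimate, and the thresholds are assumptions for illustration, not the paper's API): an ensemble of cost critics yields several estimates of each action's constraint cost, their disagreement gives a rough probability that the budget would be violated, and risky actions are masked out before maximizing value.

```python
import numpy as np

rng = np.random.default_rng(0)

def risk_averse_action(q_values, cost_samples, d, delta):
    """q_values: (n_actions,) value estimates.
    cost_samples: (n_models, n_actions) ensemble cost predictions.
    Returns the highest-value action with P(cost > d) <= delta,
    falling back to the least risky action if none qualifies."""
    p_violate = (cost_samples > d).mean(axis=0)   # per-action violation prob.
    safe = p_violate <= delta
    if not safe.any():
        return int(np.argmin(p_violate))          # least risky fallback
    q_masked = np.where(safe, q_values, -np.inf)  # mask unsafe actions
    return int(np.argmax(q_masked))

q = np.array([1.0, 2.0, 0.5])                     # action 1 looks best by value
costs = rng.normal(loc=[0.1, 0.9, 0.1], scale=0.1, size=(50, 3))
a = risk_averse_action(q, costs, d=0.5, delta=0.1)
# Action 1's cost samples cluster around 0.9 > d, so it is masked and the
# agent settles for the safest high-value alternative, action 0.
```

Masking before the argmax, rather than penalizing risky actions in the value itself, keeps the safety decision explicit and tunable via delta.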
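The self-attention memory mechanism can be illustrated with a minimal single-head sketch (the random weights are placeholders, not the paper's trained model): in a partially observable task, the agent's recent observations form a sequence, and attention lets each step weigh every earlier step when forming a memory-augmented representation.

```python
import numpy as np

def self_attention(history, w_q, w_k, w_v):
    """history: (T, d) past observation embeddings -> (T, d) contextualized."""
    q, k, v = history @ w_q, history @ w_k, history @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])           # (T, T) pairwise relevance
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ v                               # each step blends all steps

rng = np.random.default_rng(1)
d = 4
history = rng.normal(size=(6, d))                    # six past observations
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
context = self_attention(history, w_q, w_k, w_v)
assert context.shape == (6, d)                       # one memory vector per step
```

Unlike a recurrent memory, every past observation remains directly reachable through the attention weights, which is what allows long time horizons of information to be processed.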
Updated: 2021-07-30