Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions
arXiv - CS - Formal Languages and Automata Theory. Pub Date: 2021-09-07, DOI: arxiv-2109.02791
Mingyu Cai, Cristian-Ioan Vasile

Reinforcement learning (RL) is a promising approach but has seen limited success in real-world applications, because ensuring safe exploration and facilitating adequate exploitation are challenging when controlling robotic systems with unknown models and measurement uncertainties. The learning problem becomes even more intractable for complex tasks over continuous spaces (state space and action space). In this paper, we propose a learning-based control framework consisting of several aspects: (1) linear temporal logic (LTL) is leveraged to specify complex tasks over infinite horizons, which are translated into a novel automaton structure; (2) we propose an innovative reward scheme for the RL agent with the formal guarantee that globally optimal policies maximize the probability of satisfying the LTL specifications; (3) building on a reward-shaping technique, we develop a modular policy-gradient architecture that exploits the automaton structure to decompose the overall task and improve the performance of the learned controllers; (4) by incorporating Gaussian Processes (GPs) to estimate the uncertain dynamical system, we synthesize a model-based safeguard using Exponential Control Barrier Functions (ECBFs) to address problems with high-order relative degrees. In addition, we utilize the properties of LTL automata and ECBFs to construct a guiding process that further improves exploration efficiency. Finally, we demonstrate the effectiveness of the framework in several robotic environments, and show that this ECBF-based modular deep RL algorithm achieves near-perfect success rates and guards safety with high-probability confidence during training.
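
To make aspects (1)-(3) concrete: LTL-derived rewards are commonly realized by running a task automaton alongside the environment (a product construction) and rewarding accepting transitions. Below is a minimal sketch of that idea, assuming a hand-coded three-state automaton for the task "eventually reach the goal while always avoiding obstacles" (F goal ∧ G ¬obstacle); the automaton, labels, and reward values are illustrative assumptions, and the paper's novel automaton structure and formally guaranteed reward scheme are richer than this.

```python
# Minimal sketch of tracking an LTL task automaton alongside an RL agent.
# States: 0 = task in progress, 1 = accepting (goal reached), 2 = trap (unsafe).
TRANSITIONS = {
    (0, "goal"): 1,
    (0, "obstacle"): 2,
    (0, "none"): 0,
    (1, "obstacle"): 2,  # safety must keep holding after the goal is reached
    (1, "none"): 1,
    (2, "none"): 2,      # the trap state is absorbing
}
ACCEPTING, TRAP = {1}, {2}

def automaton_step(q, ap):
    """Advance the automaton on atomic proposition `ap` and emit a shaped
    reward: positive on reaching acceptance, negative on entering the trap.
    In a full agent, `ap` would come from a labeling function over states."""
    q_next = TRANSITIONS.get((q, ap), TRANSITIONS[(q, "none")])
    if q_next in ACCEPTING and q not in ACCEPTING:
        reward = 1.0
    elif q_next in TRAP and q not in TRAP:
        reward = -1.0
    else:
        reward = 0.0
    return q_next, reward

# Example rollout labels: wander twice, then reach the goal.
q = 0
for ap in ["none", "none", "goal"]:
    q, r = automaton_step(q, ap)
    print(q, r)  # -> 0 0.0, 0 0.0, 1 1.0
```

In a modular architecture of the kind the abstract describes, each automaton state would select its own sub-policy, which is how the automaton decomposes the overall task.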
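
For aspect (4), the unknown part of the dynamics can be regressed with a GP whose posterior variance supplies a confidence bound for the safety layer. Here is a minimal sketch using scikit-learn, with toy data standing in for state-action samples; the kernel choice and data are illustrative assumptions, not the paper's setup.

```python
# Sketch: estimate an uncertain dynamics term with a Gaussian process.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))               # (state, action) samples
Y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)   # toy unknown dynamics term

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, Y)

# The posterior mean estimates the uncertain dynamics; the posterior
# standard deviation gives the confidence bound a safeguard can consume.
x_query = np.array([[0.2, -0.4, 0.1]])
mean, std = gp.predict(x_query, return_std=True)
print(f"predicted dynamics term: {mean[0]:.3f} +/- {2 * std[0]:.3f}")
```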
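
The ECBF safeguard itself is typically realized as a quadratic program that minimally modifies the RL action so that the barrier condition holds. The sketch below assumes a double-integrator system (position p, velocity v, acceleration input u), where the barrier h(x) = p_max - p has relative degree 2 and the ECBF condition h_ddot + k1*h_dot + k2*h >= 0 is linear in u; the gains k1, k2 are illustrative pole-placement choices, not values from the paper.

```python
# Sketch: ECBF-style QP safety filter for a double integrator.
import cvxpy as cp

def safe_action(u_rl, p, v, p_max=1.0, k1=3.0, k2=2.0, u_limit=5.0):
    h = p_max - p   # barrier value (safe set: p <= p_max)
    h_dot = -v      # first derivative of h along the dynamics
    u = cp.Variable()
    # With p_ddot = u we have h_ddot = -u, so the ECBF condition
    # h_ddot + k1*h_dot + k2*h >= 0 becomes an affine constraint on u.
    constraints = [-u + k1 * h_dot + k2 * h >= 0,
                   cp.abs(u) <= u_limit]
    # Stay as close as possible to the RL agent's proposed action.
    cp.Problem(cp.Minimize(cp.square(u - u_rl)), constraints).solve()
    return float(u.value)

# Near the boundary, a large push forward is overridden with braking:
print(safe_action(u_rl=4.0, p=0.9, v=0.5))  # -> -1.3
```

The filter leaves actions untouched whenever the constraint is inactive and otherwise returns the closest safe action, which is what lets the RL agent explore freely away from the safety boundary.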

Updated: 2021-09-08