Sim-to-Lab-to-Real: Safe reinforcement learning with shielding and generalization guarantees
Artificial Intelligence (IF 5.1), Pub Date: 2022-10-26, DOI: 10.1016/j.artint.2022.103811
Kai-Chieh Hsu, Allen Z. Ren, Duy P. Nguyen, Anirudha Majumdar, Jaime F. Fisac

Safety is a critical component of autonomous systems and remains a key obstacle to deploying learning-based policies in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed safety-aware policy distribution. To improve safety, we apply a dual policy setup, where a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the Safety Bellman Equation based on Hamilton-Jacobi (HJ) reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. Additionally, inheriting from the HJ reachability analysis, the bound accounts for the expectation over the worst-case safety in each environment. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments with varying degrees of photorealism. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
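The supervisory control ("shielding") scheme described in the abstract can be sketched in a few lines: the backup policy's safety critic evaluates the action proposed by the performance policy, and overrides it when the predicted next state is judged unsafe. The sketch below is illustrative only and is not the authors' implementation; the toy critic (a unit-radius "danger" ball around the origin), the function names, and the zero threshold are all assumptions made for the example.

```python
import math

def norm(v):
    return math.hypot(v[0], v[1])

def safety_critic(state, action):
    # Stand-in for a learned safety value function (trained via the Safety
    # Bellman Equation in the paper); here, positive values flag a predicted
    # entry into a unit-radius "danger" ball at the origin. Toy model only.
    nxt = (state[0] + action[0], state[1] + action[1])
    return 1.0 - norm(nxt)

def backup_action(state):
    # Stand-in for the backup (safety) policy: steer radially away from the
    # obstacle at the origin.
    n = norm(state) or 1e-8
    return (state[0] / n, state[1] / n)

def shielded_step(state, proposed_action, threshold=0.0):
    # Supervisory control: pass the performance policy's action through
    # unless the safety critic predicts it leads to an unsafe state.
    if safety_critic(state, proposed_action) > threshold:
        return backup_action(state), True   # shield intervened
    return proposed_action, False           # action passed through

state = (1.5, 0.0)
a_unsafe, hit1 = shielded_step(state, (-1.0, 0.0))  # heads toward obstacle
a_safe, hit2 = shielded_step(state, (0.5, 0.0))     # heads away
```

In the paper's pipeline this shield is active during Lab-stage exploration, so the performance policy can keep learning from task reward while the backup policy bounds how badly an exploratory action can end.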




Updated: 2022-10-26