Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones
IEEE Robotics and Automation Letters (IF 4.6), Pub Date: 2021-03-31, DOI: 10.1109/lra.2021.3070252
Ashwin Balakrishna, Brijen Thananjeyan, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, Ken Goldberg

Safety remains a central obstacle preventing widespread use of RL in the real world: learning new tasks in uncertain environments requires extensive exploration, but safety requires limiting exploration. We propose Recovery RL, an algorithm which navigates this tradeoff by (1) leveraging offline data to learn about constraint violating zones before policy learning and (2) separating the goals of improving task performance and constraint satisfaction across two policies: a task policy that only optimizes the task reward and a recovery policy that guides the agent to safety when constraint violation is likely. We evaluate Recovery RL on 6 simulation domains, including two contact-rich manipulation tasks and an image-based navigation task, and an image-based obstacle avoidance task on a physical robot. We compare Recovery RL to 5 prior safe RL methods which jointly optimize for task performance and safety via constrained optimization or reward shaping and find that Recovery RL outperforms the next best prior method across all domains. Results suggest that Recovery RL trades off constraint violations and task successes 2–20 times more efficiently in simulation domains and 3 times more efficiently in physical experiments. See https://tinyurl.com/rl-recovery for videos and supplementary material.
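The abstract's core mechanism, a task policy whose actions are overridden by a recovery policy whenever a constraint violation looks likely, can be illustrated with a minimal sketch. The Python below is not the authors' released code: the policies, the toy safety critic, the obstacle/goal positions, and the risk threshold EPS_RISK are all assumed placeholders standing in for learned components, shown only to make the action-selection rule concrete.

import numpy as np

# Illustrative placeholders only; in Recovery RL these would be learned
# from offline data and online experience, not hand-coded.
GOAL = np.array([1.0, 1.0])       # assumed task goal position
OBSTACLE = np.array([0.5, 0.5])   # assumed center of a constraint-violating zone
EPS_RISK = 0.4                    # assumed risk threshold

def task_policy(state):
    """Stand-in task policy: step greedily toward the goal."""
    return np.clip(GOAL - state, -0.1, 0.1)

def recovery_policy(state):
    """Stand-in recovery policy: step away from the unsafe zone."""
    return np.clip(state - OBSTACLE, -0.1, 0.1)

def safety_critic(state, action):
    """Stand-in safety critic: estimated chance of a future constraint
    violation, here just a decreasing function of distance to the obstacle."""
    return float(np.exp(-5.0 * np.linalg.norm(state + action - OBSTACLE)))

def select_action(state):
    """Recovery RL-style action filter: execute the task action unless the
    safety critic deems it too risky, in which case defer to recovery."""
    a_task = task_policy(state)
    if safety_critic(state, a_task) > EPS_RISK:
        return recovery_policy(state)
    return a_task

if __name__ == "__main__":
    state = np.array([0.45, 0.4])   # starting near the unsafe zone
    print("chosen action:", select_action(state))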

Updated: 2021-04-27