Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning
arXiv - CS - Systems and Control. Pub Date: 2020-05-21, DOI: arxiv-2005.10872
Michelle A. Lee, Carlos Florensa, Jonathan Tremblay, Nathan Ratliff, Animesh Garg, Fabio Ramos, Dieter Fox

Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline, while requiring minimal interactions with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable, and regions where it may have flaws or not be well defined. In these uncertain regions, we show that a locally learned policy can be used directly with raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing peg insertion. Videos are available at https://sites.google.com/view/guapo-rl
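
The region-splitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the perception system reports a Gaussian estimate of the target position (mean and covariance), treats everything within a fixed Mahalanobis distance of that estimate as the uncertain region, and uses a straight-line controller as a stand-in for the model-based policy. The names `in_uncertain_region`, `model_based_action`, `select_action`, and the `threshold` and `step` parameters are all hypothetical.

```python
import numpy as np


def in_uncertain_region(position, target_mean, target_cov, threshold=3.0):
    """Return True if `position` lies within `threshold` Mahalanobis units
    of the estimated target (assumed definition of the uncertain region)."""
    diff = np.asarray(position, dtype=float) - np.asarray(target_mean, dtype=float)
    cov = np.asarray(target_cov, dtype=float)
    d2 = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return bool(d2 <= threshold ** 2)


def model_based_action(position, target_mean, step=0.05):
    """Hypothetical model-based controller: step straight toward the estimate."""
    direction = np.asarray(target_mean, dtype=float) - np.asarray(position, dtype=float)
    dist = np.linalg.norm(direction)
    if dist < 1e-9:
        return np.zeros_like(direction)
    return direction / dist * min(step, dist)


def select_action(position, raw_observation, target_mean, target_cov,
                  learned_policy, threshold=3.0):
    """Use the learned policy on raw observations inside the uncertain region,
    and the model-based controller everywhere else."""
    if in_uncertain_region(position, target_mean, target_cov, threshold):
        return "learned", learned_policy(raw_observation)
    return "model_based", model_based_action(position, target_mean)


if __name__ == "__main__":
    mean = np.zeros(3)
    cov = np.eye(3) * 0.01  # assumed 10 cm standard deviation per axis

    def zero_policy(obs):
        # Placeholder for a learned policy; always outputs a zero action.
        return np.zeros(3)

    far = select_action(np.array([1.0, 0.0, 0.0]), None, mean, cov, zero_policy)
    near = select_action(np.array([0.1, 0.0, 0.0]), None, mean, cov, zero_policy)
    print(far[0], near[0])
```

In this sketch the model-based controller only has to bring the robot close enough to the estimated target; the learned policy takes over where the estimate is too uncertain to trust, which is one way the learned component could be confined to a small part of the space.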

Updated: 2020-05-27