Fighting Failures with FIRE: Failure Identification to Reduce Expert Burden in Intervention-Based Learning
arXiv - CS - Robotics. Pub Date: 2020-07-01, DOI: arxiv-2007.00245
Trevor Ablett, Filip Marić, Jonathan Kelly

Supervised imitation learning, also known as behavioral cloning, suffers from distribution drift, leading to failures during policy execution. One approach to mitigate this issue is to allow an expert to correct the agent's actions during task execution, based on the expert's determination that the agent has reached a "point of no return." The agent's policy is then retrained using this new corrective data. This approach alone can enable high-performance agents to be learned, but at a substantial cost: the expert must vigilantly observe execution until the policy reaches a specified level of success, and even at that point, there is no guarantee that the policy will always succeed. To address these limitations, we present FIRE (Failure Identification to Reduce Expert burden in intervention-based learning), a system that can predict when a running policy will fail, halt its execution, and request a correction from the expert. Unlike existing approaches that learn only from expert data, our approach learns from both expert and non-expert data, akin to adversarial learning. We demonstrate experimentally, for a series of challenging manipulation tasks, that our method is able to recognize state-action pairs that lead to failures. This permits seamless integration into an intervention-based learning system, where we show an order-of-magnitude gain in sample efficiency compared with a state-of-the-art inverse reinforcement learning method, and dramatically improved performance over an equivalent amount of data learned with behavioral cloning.
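The abstract describes a discriminator-style failure detector: a classifier trained on both expert and non-expert state-action pairs that halts a running policy and requests an expert correction when the current pair looks failure-bound. The sketch below is a hypothetical, minimal illustration of that idea using a simple logistic-regression discriminator on toy data; the feature layout, `should_intervene` helper, and threshold are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4-D state-action features: expert pairs cluster near +1,
# non-expert (failure-bound) pairs near -1.
expert = rng.normal(loc=1.0, scale=0.5, size=(200, 4))
nonexpert = rng.normal(loc=-1.0, scale=0.5, size=(200, 4))
X = np.vstack([expert, nonexpert])
y = np.concatenate([np.ones(200), np.zeros(200)])  # 1 = expert, 0 = non-expert

# Train a logistic-regression discriminator by gradient descent,
# separating expert from non-expert state-action pairs.
w = np.zeros(4)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

def should_intervene(state_action, threshold=0.5):
    """Halt the policy and request an expert correction when the
    predicted probability of being expert-like drops below `threshold`."""
    p_expert = 1.0 / (1.0 + np.exp(-(state_action @ w + b)))
    return p_expert < threshold

# A clearly non-expert-like pair triggers an intervention request;
# an expert-like pair lets the policy keep running.
print(should_intervene(np.array([-1.0, -1.0, -1.0, -1.0])))  # True
print(should_intervene(np.array([1.0, 1.0, 1.0, 1.0])))      # False
```

In an intervention-based learning loop, such a detector would be queried at every control step; corrections gathered after a halt would be added to the expert dataset and the policy (and detector) retrained.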

Updated: 2020-08-11