Expert Intervention Learning
Autonomous Robots (IF 3.7), Pub Date: 2021-10-19, DOI: 10.1007/s10514-021-10006-9
Jonathan Spencer¹, Mung Chiang¹, Peter Ramadge¹, Sanjiban Choudhury², Matthew Barnes², Matthew Schmittle², Sidd Srinivasa²

Scalable robot learning from human-robot interaction is critical if robots are to solve a multitude of tasks in the real world. Current approaches to imitation learning suffer from one of two drawbacks. Some rely solely on off-policy human demonstrations, which in some cases leads to a mismatch between the training and test distributions. Others burden the human with labeling every state the learner visits, which makes them impractical in many applications. We argue that learning interactively from expert interventions enjoys the best of both worlds. Our key insight is that any amount of expert feedback, whether by intervention or non-intervention, provides information about the quality of the current state, the quality of the action, or both. We formalize this as a constraint on the learner's value function, which we can learn efficiently using no-regret online learning techniques. We call our approach Expert Intervention Learning (EIL) and evaluate it on a driving task, both in the real world and in simulation, with a human expert. EIL learns collision avoidance from scratch with just a few hundred samples (about one minute) of expert control.
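The key idea above is that both intervention and non-intervention constrain a value function, and that this function can be fit with a no-regret online learner. The toy sketch below illustrates that idea. It is not the authors' EIL implementation. The linear Q-function, the three hinge-penalty constraint forms, the threshold `tau`, the `margin`, the `Feedback` encoding, and the feature vectors are all hypothetical choices made for illustration. Online subgradient descent with step size eta0/sqrt(t) stands in as one standard no-regret algorithm for convex losses. The penalties are convex in `w` because Q is linear in `w`.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feedback:
    """One piece of expert feedback (hypothetical encoding, not from the paper)."""
    kind: str                   # "ok": no intervention, "bad": expert intervened, "expert": expert action observed
    learner_feat: List[float]   # phi(s, a_learner), a hypothetical feature vector
    expert_feat: Optional[List[float]] = None  # phi(s, a_expert), used only when kind == "expert"


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def subgradient(w, fb, tau, margin):
    """Subgradient w.r.t. w of the hinge penalty for one constraint (zero vector if satisfied)."""
    zero = [0.0] * len(w)
    q_l = dot(w, fb.learner_feat)
    if fb.kind == "ok":      # non-intervention: want Q(s, a_L) >= tau
        return [-x for x in fb.learner_feat] if q_l < tau else zero
    if fb.kind == "bad":     # intervention: want Q(s, a_L) <= tau - margin
        return list(fb.learner_feat) if q_l > tau - margin else zero
    if fb.kind == "expert":  # expert action should beat learner action by at least margin
        q_e = dot(w, fb.expert_feat)
        if q_e - q_l < margin:
            return [xl - xe for xl, xe in zip(fb.learner_feat, fb.expert_feat)]
        return zero
    raise ValueError(f"unknown feedback kind: {fb.kind}")


def online_gradient_descent(stream, dim, tau=0.0, margin=1.0, eta0=0.5, passes=1):
    """Online subgradient descent with step eta0 / sqrt(t), a standard no-regret
    method for convex Lipschitz losses. Replaying the stream (passes > 1) is
    only for this toy demo."""
    w = [0.0] * dim
    t = 0
    for _ in range(passes):
        for fb in stream:
            t += 1
            g = subgradient(w, fb, tau, margin)
            eta = eta0 / math.sqrt(t)
            w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


def violations(w, stream, tau=0.0, margin=1.0):
    """Count constraints whose hinge penalty is still active."""
    return sum(1 for fb in stream if any(g != 0.0 for g in subgradient(w, fb, tau, margin)))


# Toy stream with 2-D hypothetical features.
stream = [
    Feedback("ok", [1.0, 0.0]),
    Feedback("bad", [0.0, 1.0]),
    Feedback("expert", [0.0, 1.0], expert_feat=[1.0, 0.0]),
]
w = online_gradient_descent(stream, dim=2, passes=5)
print(violations(w, stream))  # → 0
```

With these settings all three constraints are satisfied by the end of the third pass, and the learned weights come out near [0.289, -1.043]. The "ok" and "bad" cases show how non-intervention and intervention each yield a one-sided constraint on Q, while the "expert" case adds a relative constraint between the expert's and the learner's actions.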

Updated: 2021-10-20