Feature Expansive Reward Learning: Rethinking Human Input
arXiv - CS - Human-Computer Interaction. Pub Date: 2020-06-23. DOI: arxiv-2006.13208. Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan
In collaborative human-robot scenarios, when a person is not satisfied with
how a robot performs a task, they can intervene to correct it. Reward learning
methods enable the robot to adapt its reward function online based on such
human input. However, this online adaptation requires low sample complexity
algorithms which rely on simple functions of handcrafted features. In practice,
pre-specifying an exhaustive set of features the person might care about is
impossible; what should the robot do when the human correction cannot be
explained by the features it already has access to? Recent progress in deep
Inverse Reinforcement Learning (IRL) suggests that the robot could fall back on
demonstrations: ask the human for demonstrations of the task, and recover a
reward defined over not just the known features, but also the raw state space.
Our insight is that rather than implicitly learning about the missing
feature(s) from task demonstrations, the robot should instead ask for data that
explicitly teaches it about what it is missing. We introduce a new type of
human input, in which the person guides the robot from areas of the state space
where the feature she is teaching is highly expressed to states where it is
not. We propose an algorithm for learning the feature from the raw state space
and integrating it into the reward function. By focusing the human input on the
missing feature, our method decreases sample complexity and improves
generalization of the learned reward over the above deep IRL baseline. We show
this in experiments with a 7DOF robot manipulator. Finally, we discuss our
method's potential implications for deep reward learning more broadly: taking a
divide-and-conquer approach that focuses on important features separately
before learning from demonstrations can improve generalization in tasks where
such features are easy for the human to teach.
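The abstract describes the mechanism only at a high level: collect "feature traces" in which the person guides the robot from states where the missing feature is highly expressed to states where it is not, fit a feature function over the raw state space from the ordering those traces imply, and fold the learned feature back into a simple reward so that low-sample-complexity online adaptation still applies. Below is a minimal sketch of that pipeline, assuming PyTorch and a Bradley-Terry-style ordering loss over states within each trace; all names (FeatureNet, trace_loss, phi_known) are illustrative and not taken from the paper's code.

```python
# Minimal sketch (not the authors' released implementation) of learning a
# missing feature from human feature traces, then appending it to a set of
# handcrafted features under a linear reward.
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Maps a raw state to a scalar feature value in [0, 1]."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def trace_loss(feature, trace):
    """Ordering loss over one trace (a (T, state_dim) tensor): earlier states
    should score higher than later ones, since the person moves from states
    where the feature is highly expressed to states where it is not.
    Modeled here as Bradley-Terry comparisons under a logistic link."""
    vals = feature(trace)                                   # (T,)
    i, j = torch.triu_indices(len(vals), len(vals), offset=1)
    return -torch.log(torch.sigmoid(vals[i] - vals[j])).mean()

def learn_feature(traces, state_dim, epochs=200, lr=1e-3):
    """Fit the feature network from a list of human feature traces."""
    feature = FeatureNet(state_dim)
    opt = torch.optim.Adam(feature.parameters(), lr=lr)
    for _ in range(epochs):
        for trace in traces:
            opt.zero_grad()
            loss = trace_loss(feature, trace)
            loss.backward()
            opt.step()
    return feature

def reward(state, feature, phi_known, w):
    """Linear reward over the known handcrafted features plus the learned
    one; phi_known(state) returns the handcrafted feature vector."""
    phi = torch.cat([phi_known(state), feature(state).unsqueeze(-1)])
    return w @ phi
```

Keeping the reward linear in the expanded feature vector is what preserves the cheap online weight updates from corrections that the abstract emphasizes; only the feature itself is learned with a deep network.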
Updated: 2020-06-24