Feature Expansive Reward Learning: Rethinking Human Input
arXiv - CS - Human-Computer Interaction. Pub Date: 2020-06-23. DOI: arxiv-2006.13208. Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan
In collaborative human-robot scenarios, when a person is not satisfied with
how a robot performs a task, they can intervene to correct it. Reward learning
methods enable the robot to adapt its reward function online based on such
human input. However, this online adaptation requires low sample complexity
algorithms which rely on simple functions of handcrafted features. In practice,
pre-specifying an exhaustive set of features the person might care about is
impossible; what should the robot do when the human correction cannot be
explained by the features it already has access to? Recent progress in deep
Inverse Reinforcement Learning (IRL) suggests that the robot could fall back on
demonstrations: ask the human for demonstrations of the task, and recover a
reward defined over not just the known features, but also the raw state space.
Our insight is that rather than implicitly learning about the missing
feature(s) from task demonstrations, the robot should instead ask for data that
explicitly teaches it about what it is missing. We introduce a new type of
human input, in which the person guides the robot from areas of the state space
where the feature she is teaching is highly expressed to states where it is
not. We propose an algorithm for learning the feature from the raw state space
and integrating it into the reward function. By focusing the human input on the
missing feature, our method decreases sample complexity and improves
generalization of the learned reward over the above deep IRL baseline. We show
this in experiments with a 7DOF robot manipulator. Finally, we discuss our
method's potential implications for deep reward learning more broadly: taking a
divide-and-conquer approach that focuses on important features separately
before learning from demonstrations can improve generalization in tasks where
such features are easy for the human to teach.
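The abstract describes the mechanism only at a high level: collect "feature traces" in which the person guides the robot from states where the missing feature is highly expressed to states where it is not, fit a feature function over the raw state space from the ordering those traces imply, and fold the learned feature back into a simple reward so that low-sample-complexity online adaptation still applies. Below is a minimal sketch of that pipeline, assuming PyTorch and a Bradley-Terry-style ordering loss over states within each trace; all names (FeatureNet, trace_loss, phi_known) are illustrative and not taken from the paper's code.

```python
# Minimal sketch (not the authors' released implementation) of learning a
# missing feature from human feature traces, then appending it to a set of
# handcrafted features under a linear reward.
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Maps a raw state to a scalar feature value in [0, 1]."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def trace_loss(feature, trace):
    """Ordering loss over one trace (a (T, state_dim) tensor): earlier states
    should score higher than later ones, since the person moves from states
    where the feature is highly expressed to states where it is not.
    Modeled here as Bradley-Terry comparisons under a logistic link."""
    vals = feature(trace)                                   # (T,)
    i, j = torch.triu_indices(len(vals), len(vals), offset=1)
    return -torch.log(torch.sigmoid(vals[i] - vals[j])).mean()

def learn_feature(traces, state_dim, epochs=200, lr=1e-3):
    """Fit the feature network from a list of human feature traces."""
    feature = FeatureNet(state_dim)
    opt = torch.optim.Adam(feature.parameters(), lr=lr)
    for _ in range(epochs):
        for trace in traces:
            opt.zero_grad()
            loss = trace_loss(feature, trace)
            loss.backward()
            opt.step()
    return feature

def reward(state, feature, phi_known, w):
    """Linear reward over the known handcrafted features plus the learned
    one; phi_known(state) returns the handcrafted feature vector."""
    phi = torch.cat([phi_known(state), feature(state).unsqueeze(-1)])
    return w @ phi
```

Keeping the reward linear in the expanded feature vector is what preserves the cheap online weight updates from corrections that the abstract emphasizes; only the feature itself is learned with a deep network.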
Updated: 2020-06-24