Affordance-based robot object retrieval
Autonomous Robots (IF 3.5) Pub Date: 2021-08-30, DOI: 10.1007/s10514-021-10008-7
Thao Nguyen 1, Nakul Gopalan 1,2, Roma Patel 1, Matt Corsaro 1, Ellie Pavlick 1, Stefanie Tellex 1

Natural language object retrieval is a highly useful yet challenging task for robots in human-centric environments. Previous work has primarily focused on commands specifying the desired object’s type such as “scissors” and/or visual attributes such as “red,” thus limiting the robot to only known object classes. We develop a model to retrieve objects based on descriptions of their usage. The model takes in a language command containing a verb, for example “Hand me something to cut,” and RGB images of candidate objects; and outputs the object that best satisfies the task specified by the verb. Our model directly predicts an object’s appearance from the object’s use specified by a verb phrase, without needing an object’s class label. Based on contextual information present in the language commands, our model can generalize to unseen object classes and unknown nouns in the commands. Our model correctly selects objects out of sets of five candidates to fulfill natural language commands, and achieves a mean reciprocal rank of 77.4% on a held-out test set of unseen ImageNet object classes and 69.1% on unseen object classes and unknown nouns. Our model also achieves a mean reciprocal rank of 71.8% on unseen YCB object classes, which have a different image distribution from ImageNet. We demonstrate our model on a KUKA LBR iiwa robot arm, enabling the robot to retrieve objects based on natural language descriptions of their usage (Video recordings of the robot demonstrations can be found at https://youtu.be/WMAdGhMmXEQ). We also present a new dataset of 655 verb-object pairs denoting object usage over 50 verbs and 216 object classes (The dataset and code for the project can be found at https://github.com/Thaonguyen3095/affordance-language).
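The abstract reports mean reciprocal rank (MRR) over sets of five candidate objects. As a minimal illustration of how that metric is computed (the candidate names and rankings below are hypothetical, not from the paper's data), MRR averages the reciprocal of the correct object's 1-based rank across queries:

```python
def mean_reciprocal_rank(ranked_lists, correct_items):
    """Average of 1/rank of the correct item, using 1-based ranks."""
    total = 0.0
    for ranking, correct in zip(ranked_lists, correct_items):
        rank = ranking.index(correct) + 1  # position of the correct object
        total += 1.0 / rank
    return total / len(ranked_lists)

# Three hypothetical queries, five candidates each (mirroring the
# five-candidate retrieval setup described in the abstract).
rankings = [
    ["knife", "scissors", "mug", "ball", "pen"],   # correct at rank 1
    ["mug", "knife", "ball", "pen", "scissors"],   # correct at rank 2
    ["ball", "pen", "mug", "scissors", "knife"],   # correct at rank 5
]
correct = ["knife", "knife", "knife"]
print(round(mean_reciprocal_rank(rankings, correct), 3))  # → 0.567
```

A reported MRR of 77.4% thus means the correct object was ranked at or near the top of the five candidates for most commands.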




Updated: 2021-08-31