Learning Visually Guided Latent Actions for Assistive Teleoperation, arXiv - CS - Robotics

Learning Visually Guided Latent Actions for Assistive Teleoperation
arXiv - CS - Robotics, Pub Date: 2021-05-02, DOI: arxiv-2105.00580
Siddharth Karamcheti, Albert J. Zhai, Dylan P. Losey, Dorsa Sadigh

It is challenging for humans, particularly those living with physical disabilities, to control high-dimensional, dexterous robots. Prior work explores learning embedding functions that map a human's low-dimensional inputs (e.g., via a joystick) to complex, high-dimensional robot actions for assistive teleoperation; however, a central problem is that there are many more high-dimensional actions than available low-dimensional inputs. To extract the correct action and maximally assist their human controller, robots must reason over their context: for example, pressing a joystick down when interacting with a coffee cup indicates a different action than when interacting with a knife. In this work, we develop assistive robots that condition their latent embeddings on visual inputs. We explore a spectrum of visual encoders and show that incorporating object detectors pretrained on small amounts of cheap, easy-to-collect structured data enables (i) accurate and robust recognition of the current context and (ii) generalization of control embeddings to new objects and tasks. In user studies with a high-dimensional physical robot arm, participants leverage this approach to perform new tasks with unseen objects. Our results indicate that structured visual representations improve few-shot performance and are subjectively preferred by users.
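The central idea, that the same low-dimensional input should decode to different high-dimensional actions depending on visual context, can be illustrated with a toy sketch. This is not the authors' implementation: the learned, visually conditioned embeddings are replaced here by hand-written, per-object linear maps, and the object detector is a stub that simply picks the first known label from a list. A 2-D joystick and a 7-D action (e.g., joint velocities) are assumed, and the names `DECODERS`, `detect_context`, `decode`, and `assistive_step` are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]

# Hypothetical per-context decoders: each maps a 2-D joystick input to a
# 7-D action. In the paper such mappings are learned; these numbers are
# made up purely for illustration.
DECODERS: Dict[str, Matrix] = {
    "cup": [[1.0, 0.0], [0.0, 0.5], [0.0, 0.0], [0.2, 0.0],
            [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    "knife": [[0.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.0],
              [1.0, 0.0], [0.0, 0.0], [0.0, 0.3]],
}


def detect_context(image_labels: Sequence[str]) -> str:
    """Stub for a pretrained object detector: return the first known object class."""
    for label in image_labels:
        if label in DECODERS:
            return label
    raise ValueError("no known object in view")


def decode(z: Sequence[float], context: str) -> List[float]:
    """Map a low-dimensional input z to a high-dimensional action, conditioned on context."""
    weights = DECODERS[context]
    # Matrix-vector product: one output dimension per row of the context's map.
    return [sum(w * z_j for w, z_j in zip(row, z)) for row in weights]


def assistive_step(joystick: Sequence[float], image_labels: Sequence[str]) -> List[float]:
    """Recognize the context from the scene, then decode the joystick input under it."""
    return decode(joystick, detect_context(image_labels))


if __name__ == "__main__":
    down = [0.0, -1.0]  # joystick pressed down
    print(assistive_step(down, ["table", "cup"]))    # [0.0, -0.5, 0.0, 0.0, 0.0, -1.0, 0.0]
    print(assistive_step(down, ["table", "knife"]))  # [0.0, 0.0, -1.0, 0.0, 0.0, 0.0, -0.3]
```

The same "down" press produces different 7-D actions near the cup and near the knife, which is the ambiguity the abstract describes; a learned model would replace both the lookup table and the label stub.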

Updated: 2021-05-04