Gaze-Informed Multi-Objective Imitation Learning from Human Demonstrations
arXiv - CS - Human-Computer Interaction. Pub Date: 2021-02-25, DOI: arxiv-2102.13008
Ritwik Bera, Vinicius G. Goecks, Gregory M. Gremillion, Vernon J. Lawhern, John Valasek, Nicholas R. Waytowich

In the field of human-robot interaction, teaching learning agents from human demonstrations via supervised learning has been widely studied and successfully applied to multiple domains such as self-driving cars and robot manipulation. However, the majority of the work on learning from human demonstrations utilizes only behavioral information from the demonstrator, i.e., what actions were taken, and ignores other useful information. In particular, eye gaze information can give valuable insight into where the demonstrator is allocating their visual attention, and leveraging such information has the potential to improve agent performance. Previous approaches have only studied the utilization of attention in simple, synchronous environments, limiting their applicability to real-world domains. This work proposes a novel imitation learning architecture that learns concurrently from human action demonstrations and eye-tracking data to solve tasks where human gaze information provides important context. The proposed method is applied to a visual navigation task, in which an unmanned quadrotor is trained to search for and navigate to a target vehicle in a real-world, photorealistic simulated environment. Compared to a baseline imitation learning architecture, results show that the proposed gaze-augmented imitation learning model is able to learn policies that achieve significantly higher task completion rates, with more efficient paths, while simultaneously learning to predict human visual attention. This research aims to highlight the importance of multimodal learning of visual attention information from additional human input modalities and encourages the community to adopt such modalities when training agents from human demonstrations to perform visuomotor tasks.
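The abstract describes the approach only at a high level. As a rough illustration, one way such a gaze-augmented imitation learner could be wired up is sketched below: a shared convolutional encoder feeding a behavioral-cloning action head and an auxiliary gaze-heatmap head, trained with a weighted multi-objective loss. The class name GazeInformedPolicy, the layer sizes, and the gaze_weight term are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeInformedPolicy(nn.Module):
    """Hypothetical multi-objective policy: action prediction + gaze prediction."""

    def __init__(self, action_dim: int = 4):
        super().__init__()
        # Shared visual encoder over RGB observations.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
        )
        # Action head: regresses continuous control commands (behavioral cloning).
        self.action_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, action_dim),
        )
        # Gaze head: predicts a coarse spatial attention map as an auxiliary task.
        self.gaze_head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, obs):
        feat = self.encoder(obs)
        return self.action_head(feat), self.gaze_head(feat)

def multi_objective_loss(model, obs, demo_action, gaze_map, gaze_weight=0.5):
    """Weighted sum of behavioral-cloning and gaze-prediction losses.

    gaze_map is assumed to be a recorded gaze heatmap normalized to [0, 1].
    """
    pred_action, gaze_logits = model(obs)
    bc_loss = F.mse_loss(pred_action, demo_action)
    # Resize the recorded gaze heatmap to the predicted map's resolution.
    target = F.interpolate(gaze_map, size=gaze_logits.shape[-2:],
                           mode="bilinear", align_corners=False)
    gaze_loss = F.binary_cross_entropy_with_logits(gaze_logits, target)
    return bc_loss + gaze_weight * gaze_loss

# Example usage with random tensors standing in for a recorded demonstration.
model = GazeInformedPolicy()
obs = torch.randn(8, 3, 128, 128)        # batch of RGB observations
demo_action = torch.randn(8, 4)          # demonstrated control commands
gaze_map = torch.rand(8, 1, 128, 128)    # normalized gaze heatmaps in [0, 1]
loss = multi_objective_loss(model, obs, demo_action, gaze_map)
loss.backward()

The weighting between the behavioral-cloning term and the gaze term would in practice be a tuning choice; the sketch simply fixes it at 0.5 for concreteness.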

Updated: 2021-02-26