Using Eye Gaze to Enhance Generalization of Imitation Networks to Unseen Environments, IEEE Transactions on Neural Networks and Learning Systems

IEEE Transactions on Neural Networks and Learning Systems (IF 10.2), Pub Date: 2020-06-01, DOI: 10.1109/tnnls.2020.2996386
Congcong Liu, Yuying Chen, Ming Liu, Bertram E. Shi

Vision-based autonomous driving through imitation learning mimics the behavior of human drivers by mapping driver-view images to driving actions. This article shows that performance can be enhanced by using eye gaze. Previous research has shown that observing an expert's gaze patterns can benefit novice human learners; we show here that neural networks can also benefit. We trained a conditional generative adversarial network to accurately estimate human gaze maps from driver-view images. We describe two approaches to integrating gaze information into imitation networks: eye gaze as an additional input, and gaze-modulated dropout. Both significantly enhance generalization to unseen environments compared with a baseline vanilla network without gaze, but gaze-modulated dropout performs better. We evaluated performance quantitatively on both single images and in closed-loop tests, showing that gaze-modulated dropout yields the lowest prediction error, the highest success rate in overtaking cars, the longest distance between infractions, the lowest epistemic uncertainty, and improved data efficiency. Using Grad-CAM, we show that gaze-modulated dropout enables the network to concentrate on task-relevant areas of the image.
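The abstract names the two gaze-integration approaches but does not give the exact modulation rule. The NumPy sketch below illustrates both ideas under stated assumptions, and it is not the authors' implementation. It assumes that gaze-modulated dropout sets the per-location drop probability to `p_max * (1 - g)`, where `g` is the min-max normalized gaze map, so heavily gazed regions are dropped less often. It also assumes inverted-dropout rescaling, and that "gaze as an additional input" means stacking the gaze map as a fourth image channel. The function names `gaze_modulated_dropout` and `gaze_as_extra_input`, the parameter `p_max`, and the linear rule are all hypothetical.

```python
import numpy as np


def gaze_modulated_dropout(features, gaze_map, p_max=0.5, rng=None, training=True):
    """Drop feature-map units with a probability that decreases where gaze is high.

    Hypothetical sketch: the drop probability at each spatial location is
    p_max * (1 - g), where g is the gaze map min-max normalized to [0, 1].
    features: array of shape (C, H, W); gaze_map: array of shape (H, W),
    assumed to be already resized to the feature-map resolution.
    """
    if not 0.0 <= p_max < 1.0:
        raise ValueError("p_max must lie in [0, 1)")
    if not training:
        # In this sketch, dropout is only applied during training.
        return features
    rng = np.random.default_rng() if rng is None else rng

    # Min-max normalize the gaze map; a flat map gives uniform dropout at rate p_max.
    g = gaze_map - gaze_map.min()
    peak = g.max()
    g = g / peak if peak > 0 else np.zeros_like(g, dtype=float)

    keep_prob = 1.0 - p_max * (1.0 - g)  # shape (H, W), values in [1 - p_max, 1]
    # One Bernoulli draw per unit; keep_prob broadcasts across the channel axis.
    mask = rng.random(features.shape) < keep_prob
    # Inverted-dropout rescaling keeps each unit's expected value unchanged.
    return np.where(mask, features / keep_prob, 0.0)


def gaze_as_extra_input(image, gaze_map):
    """Alternative integration: stack the gaze map as an extra input channel.

    image: (3, H, W) RGB array; gaze_map: (H, W). Returns a (4, H, W) array.
    """
    return np.concatenate([image, gaze_map[np.newaxis, ...]], axis=0)


# Usage: gaze concentrated in the central 2x2 patch of a 4x4 feature map.
rng = np.random.default_rng(0)
features = np.ones((8, 4, 4))
gaze = np.zeros((4, 4))
gaze[1:3, 1:3] = 1.0
out = gaze_modulated_dropout(features, gaze, p_max=0.5, rng=rng)
# Units under the gaze peak are always kept (keep_prob = 1); elsewhere each
# unit is either dropped (0.0) or kept and rescaled by 1 / 0.5 (2.0).
stacked = gaze_as_extra_input(np.zeros((3, 4, 4)), gaze)
```

Calling the function with `training=False` returns the features unchanged, which follows the usual convention that dropout is active only during training. The additional-input variant changes only the network's first layer, which must accept four channels. The dropout variant leaves the input untouched and instead biases which intermediate features survive training.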

Updated: 2020-06-01