Learning Image Representations Tied to Egomotion from Unlabeled Video
International Journal of Computer Vision (IF 11.6) Pub Date: 2017-03-04, DOI: 10.1007/s11263-017-1001-2
Dinesh Jayaraman, Kristen Grauman

Understanding how images of objects and scenes behave in response to specific egomotions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected from the physical source of their images. We propose a new “embodied” visual learning paradigm, exploiting proprioceptive motor signals to train visual representations from egocentric video with no manual supervision. Specifically, we enforce that our learned features exhibit equivariance, i.e., they respond predictably to transformations associated with distinct egomotions. With three datasets, we show that our unsupervised feature learning approach significantly outperforms previous approaches on visual recognition and next-best-view prediction tasks. In the most challenging test, we show that features learned from video captured on an autonomous driving platform improve large-scale scene recognition in static images from a disjoint domain.
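To make the equivariance objective concrete, below is a minimal sketch of one way to implement it, assuming egomotions binned into a few discrete classes and one learned linear map per class that carries the "before" feature to the "after" feature, in the spirit of the paper's formulation. The network architecture, dimensions, and all identifiers are illustrative stand-ins, not the authors' released code.

    import torch
    import torch.nn as nn

    class EquivariantFeatures(nn.Module):
        """Feature extractor z(.) plus one learned linear map M_g per
        egomotion class g, trained so that M_g z(x_t) ~= z(x_{t+1})
        when egomotion g occurred between frames x_t and x_{t+1}."""
        def __init__(self, feat_dim: int = 128, num_motion_classes: int = 6):
            super().__init__()
            # Small convnet standing in for the paper's feature network.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
            # One linear equivariance map per discrete egomotion class.
            self.motion_maps = nn.ModuleList(
                nn.Linear(feat_dim, feat_dim, bias=False)
                for _ in range(num_motion_classes)
            )

        def forward(self, frame_a, frame_b, motion_class):
            za = self.encoder(frame_a)
            zb = self.encoder(frame_b)
            # Apply the map belonging to each pair's egomotion class.
            preds = torch.stack(
                [self.motion_maps[g](z) for z, g in zip(za, motion_class.tolist())]
            )
            # Equivariance loss: the mapped "before" feature should
            # predict the "after" feature.
            return nn.functional.mse_loss(preds, zb)

    # Toy usage on random tensors standing in for egocentric frame pairs.
    model = EquivariantFeatures()
    a = torch.randn(4, 3, 64, 64)   # frames at time t
    b = torch.randn(4, 3, 64, 64)   # frames at time t+1
    g = torch.randint(0, 6, (4,))   # binned egomotion between the two frames
    loss = model(a, b, g)
    loss.backward()

In practice such an equivariance term would be trained jointly with a recognition or contrastive objective; the proprioceptive motor signals supply the motion labels g for free, so no manual annotation is needed.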
