NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2017-06-01, DOI: 10.1109/tpami.2017.2711011
Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, Josef Sivic

We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following four principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the “Vector of Locally Aggregated Descriptors” image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we create a new weakly supervised ranking loss, which enables end-to-end learning of the architecture's parameters from images depicting the same places over time downloaded from Google Street View Time Machine. Third, we develop an efficient training procedure which can be applied on very large-scale weakly labelled tasks. Finally, we show that the proposed architecture and training procedure significantly outperform non-learnt image representations and off-the-shelf CNN descriptors on challenging place recognition and image retrieval benchmarks.
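
The abstract names the NetVLAD layer but does not spell out its computation. As a rough illustration, the sketch below follows the commonly described formulation: each local descriptor is softly assigned to K cluster centres with a softmax, the assignment-weighted residuals are summed per cluster, and the result is intra-normalised and then L2-normalised. The function name, parameter names, and shapes are illustrative assumptions, and this is a NumPy forward pass only, not the authors' implementation or training code.

```python
import numpy as np

def netvlad_pool(x, centers, w, b, eps=1e-12):
    """Aggregate N local D-dim descriptors into one (K*D,) vector.

    x:       (N, D) local descriptors (e.g. one per CNN feature-map location)
    centers: (K, D) cluster centres
    w, b:    (K, D) and (K,) soft-assignment parameters
    All names here are hypothetical, chosen for this sketch.
    """
    logits = x @ w.T + b                              # (N, K)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)                 # softmax over clusters

    residuals = x[:, None, :] - centers[None, :, :]   # (N, K, D)
    v = (a[:, :, None] * residuals).sum(axis=0)       # (K, D) weighted residual sums

    v /= np.linalg.norm(v, axis=1, keepdims=True) + eps  # intra-normalisation per cluster
    v = v.ravel()
    return v / (np.linalg.norm(v) + eps)              # final L2 normalisation

# Usage with random placeholder data (not real image features).
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((50, 8))
centres = rng.standard_normal((4, 8))
weights = rng.standard_normal((4, 8))
biases = rng.standard_normal(4)
vlad = netvlad_pool(descriptors, centres, weights, biases)
```

Because every step is differentiable, the same computation can be expressed in an autodiff framework, which is consistent with the abstract's claim that the layer is trainable via backpropagation.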

Updated: 2018-05-05