Landmark Image Retrieval by Jointing Feature Refinement and Multimodal Classifier Learning
IEEE Transactions on Cybernetics ( IF 9.4 ) Pub Date : 2017-06-20 , DOI: 10.1109/tcyb.2017.2712798
Xiaoming Zhang , Senzhang Wang , Zhoujun Li , Shuai Ma

Landmark retrieval returns a set of images whose landmarks are similar to those of the query images. Existing studies on landmark retrieval focus on exploiting the geometries of landmarks for visual similarity matching. However, the visual content of social images varies widely within many landmarks, and some images share common patterns across different landmarks. On the other hand, it has been observed that social images usually contain multimodal content, i.e., visual content and text tags, and each landmark has unique characteristics in both its visual content and its text content. Therefore, approaches based on similarity matching may not be effective in this setting. In this paper, we investigate whether the geographical correlation between the visual content and the text content can be exploited for landmark retrieval. In particular, we propose an effective multimodal landmark classification paradigm that leverages the multimodal content of social images for landmark retrieval, integrating feature refinement and landmark classifier learning over multimodal content in a joint model. The geo-tagged images are automatically labeled for classifier learning. Visual features are refined by low-rank matrix recovery, and a multimodal classifier with group sparsity is learned from the automatically labeled images. Finally, candidate images are ranked by combining the classification result with a measure of semantic consistency between the visual content and the text content. Experiments on real-world datasets demonstrate the superiority of the proposed approach over existing methods.
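
The abstract names low-rank matrix recovery and group sparsity but does not give the optimization details. As a rough illustration only, the sketch below shows two standard building blocks often used for such objectives: singular value thresholding (the proximal operator of the nuclear norm) and row-wise group shrinkage (the proximal operator of the L2,1 norm). This is not the authors' algorithm; the function names `svt` and `row_group_shrink`, the parameters `tau` and `lam`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def svt(X, tau):
    """Singular value thresholding: shrink every singular value of X by tau.

    Generic low-rank building block (assumed for illustration, not taken
    from the paper).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # small singular values become zero
    return (U * s_shrunk) @ Vt


def row_group_shrink(W, lam):
    """Row-wise group soft-thresholding: rows with L2 norm below lam vanish.

    Treats each row of W as one group (an assumption for this sketch).
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale


# Synthetic "visual feature" matrix: rank-3 signal plus small dense noise.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 20))
noisy = low_rank + 0.1 * rng.normal(size=(50, 20))

refined = svt(noisy, tau=3.0)
refined_rank = np.linalg.matrix_rank(refined)

# Toy weight matrix: the weak second row (one feature group) is removed.
W = np.array([[3.0, 4.0], [0.1, 0.1]])
W_sparse = row_group_shrink(W, lam=1.0)
```

In this toy setting, thresholding suppresses the noise directions so that `refined` has much lower rank than `noisy`, and the shrinkage step zeroes out whole rows of `W` rather than individual entries, which is the behaviour that group sparsity is meant to encourage.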

Updated: 2017-06-20