Exploiting Textual Queries for Dynamically Visual Disambiguation
Pattern Recognition (IF 7.5) Pub Date: 2021-02-01, DOI: 10.1016/j.patcog.2020.107620
Zeren Sun, Yazhou Yao, Jimin Xiao, Lei Zhang, Jian Zhang, Zhenmin Tang

Abstract Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of current webly supervised models is the problem of visual polysemy. In this work, we present a novel framework that resolves visual polysemy by dynamically matching candidate text queries with retrieved images. Specifically, our proposed framework consists of three major steps: we first discover candidate text queries, then dynamically select among them according to the keyword-based image search results, and finally employ the proposed saliency-guided deep multi-instance learning (MIL) network to remove outliers and learn classification models for visual disambiguation. Compared to existing methods, our proposed approach can identify the correct visual senses, adapt to dynamic changes in the search results, remove outliers, and jointly learn the classification models. Extensive experiments and ablation studies on the CMU-Poly-30 and MIT-ISD datasets demonstrate the effectiveness of our proposed approach.
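The abstract does not give implementation details for the query-selection step; as an illustrative sketch only, dynamically selecting text queries by matching them against keyword-based search results could look like the following, assuming each image is represented by a feature vector (e.g., from a CNN) and each candidate query has a few probe images retrieved for it. The function names, the mean-similarity score, and the threshold rule are all hypothetical, not the paper's actual method.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def select_queries(keyword_feats, query_feats, threshold=0.3):
    """Score each candidate text query by the mean cosine similarity
    between its probe images and the keyword-based search results.
    Queries scoring above the threshold are kept as valid visual
    senses; the rest are discarded as spurious.

    keyword_feats: (N, d) features of images retrieved for the keyword.
    query_feats:   dict mapping query string -> (M, d) probe features.
    """
    scores = {q: float(cosine_sim(feats, keyword_feats).mean())
              for q, feats in query_feats.items()}
    return {q: s for q, s in scores.items() if s >= threshold}

if __name__ == "__main__":
    # Toy 3-d features: the keyword "bass" returns a mix of two senses.
    kw = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                   [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])
    candidates = {
        "bass (fish)":   np.array([[1.0, 0.0, 0.0], [0.95, 0.05, 0.0]]),
        "bass (guitar)": np.array([[0.0, 1.0, 0.0]]),
        "bass (beer)":   np.array([[0.0, 0.0, 1.0]]),  # spurious sense
    }
    print(select_queries(kw, candidates))
```

Because selection is recomputed from the current search results, the kept query set can change as the retrieved images change, which mirrors the "adapt to dynamic changes in the search results" property claimed above.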

Updated: 2021-02-01