Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
arXiv - CS - Multimedia, Pub Date: 2020-09-03, DOI: arxiv-2009.01449
Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Wei Liu, Shih-Fu Chang

The prevailing framework for referring expression grounding is a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals. Existing two-stage solutions mostly focus on the grounding step, which aims to align the expression with the proposals. In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on detection confidence (i.e., expression-agnostic), while hoping that the proposals contain all the instances mentioned in the expression (i.e., expression-aware). Due to this mismatch, current two-stage methods suffer a severe performance drop when ground-truth proposals are replaced with detected ones. To this end, we propose Ref-NMS, the first method to yield expression-aware proposals at the first stage. Ref-NMS regards all nouns in the expression as critical objects and introduces a lightweight module that predicts a score for how well each box aligns with a critical object. These scores can guide the NMS operation to filter out boxes irrelevant to the expression, which increases the recall of critical objects and results in significantly improved grounding performance. Since Ref-NMS is agnostic to the grounding step, it can be easily integrated into any state-of-the-art two-stage method. Extensive ablation studies on several backbones, benchmarks, and tasks consistently demonstrate the superiority of Ref-NMS.
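To make the suppression step concrete, here is a minimal Python sketch of expression-aware NMS. It assumes each box already carries a detection confidence and a relatedness score for the expression (the output of the lightweight module, which is not modeled here). The fusion rule (multiplying the two scores), the class-agnostic greedy NMS, the thresholds, and the names `iou`, `nms`, and `expression_aware_nms` are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_thresh: float = 0.5) -> List[int]:
    """Greedy class-agnostic NMS: visit boxes by descending score and keep a
    box only if it does not overlap an already-kept box above iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


def expression_aware_nms(boxes: Sequence[Box], det_conf: Sequence[float],
                         relatedness: Sequence[float], iou_thresh: float = 0.5,
                         min_score: float = 0.05) -> List[int]:
    """Hypothetical Ref-NMS-style step: rank boxes by detection confidence
    times expression relatedness (an assumed fusion rule), drop low scorers,
    then run ordinary NMS on the fused scores. Returns original indices."""
    fused = [c * r for c, r in zip(det_conf, relatedness)]
    idx = [i for i, s in enumerate(fused) if s >= min_score]
    kept = nms([boxes[i] for i in idx], [fused[i] for i in idx], iou_thresh)
    return [idx[k] for k in kept]


# Toy example for an expression like "the dog next to the car" (made-up numbers).
boxes = [
    (0, 0, 10, 10),    # 0: a person, confidently detected but not mentioned
    (1, 1, 11, 11),    # 1: the dog, overlapping box 0 (IoU ~0.68)
    (20, 20, 30, 30),  # 2: the car
]
det_conf = [0.9, 0.7, 0.8]      # expression-agnostic detector scores
relatedness = [0.1, 0.9, 0.8]   # assumed outputs of the relatedness module

print(nms(boxes, det_conf))                                # [0, 2]
print(expression_aware_nms(boxes, det_conf, relatedness))  # [2, 1]
```

In this toy case, plain NMS keeps the confident but irrelevant box 0 and suppresses the overlapping dog box; once relatedness enters the ranking, the dog box survives instead. That is the recall effect the abstract describes, illustrated here only on invented numbers.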

Updated: 2020-09-04