Attribute-Guided Attention for Referring Expression Generation and Comprehension
IEEE Transactions on Image Processing (IF 10.8) | Pub Date: 2020-03-12 | DOI: 10.1109/tip.2020.2979010
Jingyu Liu, Wei Wang, Liang Wang, Ming-Hsuan Yang

A referring expression is a special kind of verbal expression whose goal is to identify a particular object in a scene. Referring expression generation and comprehension are two inverse tasks in this field. Given the critical role that visual attributes play in distinguishing the referred object from other objects, we propose an attribute-guided attention model to address both tasks. In our framework, attributes collected from referring expressions serve as explicit supervision signals for the generation and comprehension modules. The attributes predicted online for a visual object benefit both tasks in two ways. First, attributes can be embedded directly into the generation and comprehension modules as additional visual representations that distinguish the referred object. Second, since attributes have counterparts in both the visual and the textual space, an attribute-guided attention module is proposed as a bridge linking the corresponding parts of visual representations and textual expressions. The attention weights learned on both visual features and word embeddings validate this motivation. We experiment on three standard datasets commonly used in this field: RefCOCO, RefCOCO+, and RefCOCOg. Both quantitative and qualitative results demonstrate the effectiveness of the proposed framework: it improves significantly over baseline methods and compares favorably to state-of-the-art results. A further ablation study and analysis demonstrate the contribution of each module, which may provide useful insights to the community.
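The bridging idea in the abstract — a predicted attribute acting as a shared query that scores both visual regions and expression words — can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, dimensions, and dot-product scoring are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attribute_guided_attention(attr_emb, visual_feats, word_embs):
    """Hypothetical sketch of attribute-guided attention.

    attr_emb     : (d,)   embedding of a predicted attribute (e.g. "red")
    visual_feats : (r, d) features of r candidate image regions
    word_embs    : (w, d) embeddings of the w expression words

    The attribute embedding serves as a query in both modalities,
    producing attention weights that link visual and textual counterparts.
    """
    vis_w = softmax(visual_feats @ attr_emb)  # weights over regions
    txt_w = softmax(word_embs @ attr_emb)     # weights over words
    vis_ctx = vis_w @ visual_feats            # attended visual context
    txt_ctx = txt_w @ word_embs               # attended textual context
    return vis_w, txt_w, vis_ctx, txt_ctx
```

Because the same attribute query scores both spaces, a region and a word that match the attribute receive high weight simultaneously, which is the "bridging" role the abstract assigns to the attention module.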

Updated: 2020-04-22