Similarity Reasoning and Filtration for Image-Text Matching
arXiv - CS - Multimedia. Pub Date: 2021-01-05, DOI: arxiv-2101.01368. Haiwen Diao, Ying Zhang, Lin Ma, Huchuan Lu
Image-text matching plays a critical role in bridging vision and
language, and great progress has been made by exploiting the global alignment
between image and sentence, or local alignments between regions and words.
However, how to make the most of these alignments to infer more accurate
matching scores is still underexplored. In this paper, we propose a novel
Similarity Graph Reasoning and Attention Filtration (SGRAF) network for
image-text matching. Specifically, the vector-based similarity representations
are firstly learned to characterize the local and global alignments in a more
comprehensive manner, and then the Similarity Graph Reasoning (SGR) module
relying on one graph convolutional neural network is introduced to infer
relation-aware similarities with both the local and global alignments. The
Similarity Attention Filtration (SAF) module is further developed to integrate
these alignments effectively by selectively attending to the significant and
representative alignments while discarding the interference of
non-meaningful ones. We demonstrate the superiority of the proposed
method by achieving state-of-the-art performance on the Flickr30K and MSCOCO
datasets, and show the good interpretability of the SGR and SAF modules through
extensive qualitative experiments and analyses.
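The two modules described above can be illustrated with a minimal sketch. This is a hedged toy implementation, not the authors' code: all shapes, weight matrices (`w_edge`, `w_node`, `w_attn`, `v_score`), and function names are assumptions made for illustration. It shows the two ideas in miniature — graph reasoning propagates information between similarity vectors via learned edge weights, and attention filtration aggregates alignments weighted by their estimated significance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n local region-word alignments plus 1 global
# image-sentence alignment, each represented as a d-dimensional
# vector-based similarity representation (shapes are assumptions).
n, d = 5, 8
sim_vectors = rng.standard_normal((n + 1, d))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sgr_step(nodes, w_edge, w_node):
    """One round of similarity graph reasoning: compute edge weights
    from pairwise node affinities, then propagate neighbor information
    to each node to produce relation-aware similarity vectors."""
    affinity = nodes @ w_edge @ nodes.T           # (n+1, n+1) pairwise scores
    adj = softmax(affinity, axis=-1)              # normalized edge weights
    return np.tanh(adj @ nodes @ w_node)          # updated node vectors

def saf_score(nodes, w_attn, v_score):
    """Attention filtration: weight each alignment vector by a learned
    significance score (suppressing non-meaningful alignments),
    aggregate, and map the result to a scalar matching score."""
    attn = softmax((nodes @ w_attn).squeeze(-1))  # significance per alignment
    fused = attn @ nodes                          # filtered aggregate, shape (d,)
    return float(fused @ v_score)                 # scalar image-text score

# Randomly initialized stand-ins for learned parameters.
w_edge = rng.standard_normal((d, d))
w_node = rng.standard_normal((d, d))
w_attn = rng.standard_normal((d, 1))
v_score = rng.standard_normal(d)

reasoned = sgr_step(sim_vectors, w_edge, w_node)
score = saf_score(reasoned, w_attn, v_score)
```

In the paper the two modules are trained jointly and their outputs combined; here they are simply chained to keep the data flow visible.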
Updated: 2021-01-06