Re-ranking Image-text Matching by Adaptive Metric Fusion
Pattern Recognition (IF 7.5), Pub Date: 2020-08-01, DOI: 10.1016/j.patcog.2020.107351
Kai Niu, Yan Huang, Liang Wang

Abstract: Image-text matching has drawn much attention recently with the rapid growth of multi-modal data. Many effective approaches have been proposed for this challenging problem, but little effort has been devoted to re-ranking methods. Compared with uni-modal re-ranking, the major difficulty in designing a cross-modal re-ranking method is modality heterogeneity, which lies mainly in two aspects: the different visual and textual feature spaces, and the different distributions in the two inverse retrieval directions. In this paper, we propose a heuristic re-ranking method called Adaptive Metric Fusion (AMF) for image-text matching. The method obtains a better metric by adaptively fusing metrics based on two modules: 1) Cross-modal Reciprocal Encoding, which considers ranks in inverse directions to comprehensively evaluate a metric. Sentence retrieval and image retrieval have different distribution characteristics and galleries in different modalities, so it is necessary to exploit them simultaneously for appropriate metric fusion. 2) Query Replacement Gap, which quantifies the gap between cross-modal and uni-modal similarities to alleviate the influence of the different visual and textual feature spaces on the fused metric. The proposed re-ranking method can be implemented in an unsupervised way, without any human interaction or annotated data, and can be easily applied to any initial ranking result. Extensive experiments and analysis validate the effectiveness of our method on the large-scale MS-COCO and Flickr30K datasets.
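
The abstract gives no formulas, so the following is only a rough illustration of the general idea of weighting several similarity metrics by their ranks in both retrieval directions. It is not the authors' AMF formulation. The function names (`ranks_desc`, `reciprocal_weight`, `fuse_metrics`), the weighting rule `1 / (r_i2t + r_t2i - 1)`, and the assumption that all input similarity matrices share one scale (for example, cosine similarities) are hypothetical choices made for this sketch.

```python
import numpy as np


def ranks_desc(scores: np.ndarray, axis: int) -> np.ndarray:
    """Return 1-based ranks along `axis`; the highest score gets rank 1."""
    # A double argsort converts scores into rank positions.
    return np.argsort(np.argsort(-scores, axis=axis), axis=axis) + 1


def reciprocal_weight(sim: np.ndarray) -> np.ndarray:
    """Hypothetical per-pair confidence of one metric (an assumption, not AMF).

    `sim[i, j]` is the similarity between image i and text j.
    A pair scores highest when it is ranked first in both directions.
    """
    r_i2t = ranks_desc(sim, axis=1)  # rank of text j among all texts for image i
    r_t2i = ranks_desc(sim, axis=0)  # rank of image i among all images for text j
    return 1.0 / (r_i2t + r_t2i - 1)


def fuse_metrics(sims: list[np.ndarray]) -> np.ndarray:
    """Fuse similarity matrices with per-pair weights normalised across metrics."""
    stacked = np.stack(sims)                                  # (n_metrics, n_img, n_txt)
    weights = np.stack([reciprocal_weight(s) for s in sims])
    weights = weights / weights.sum(axis=0, keepdims=True)    # weights sum to 1 per pair
    return (weights * stacked).sum(axis=0)


# Toy example: two metrics over 2 images and 2 texts (made-up numbers).
sim_a = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
sim_b = np.array([[0.4, 0.6],
                  [0.3, 0.7]])
fused = fuse_metrics([sim_a, sim_b])
reranked_texts = np.argsort(-fused, axis=1)  # re-ranked text indices per image
```

In this toy input, `sim_b` alone would retrieve text 1 for image 0, while the fused matrix ranks text 0 first, because `sim_a` ranks that pair first in both directions and so receives the larger weight for it. Like the method described above, the sketch needs no labels and operates only on initial similarity scores.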

Updated: 2020-08-01
