Deep Top-k Ranking for Image-Sentence Matching, IEEE Transactions on Multimedia



IEEE Transactions on Multimedia (IF 8.4), Pub Date: 2020-03-01, DOI: 10.1109/tmm.2019.2931352
Lingling Zhang, Minnan Luo, Jun Liu, Xiaojun Chang, Yi Yang, Alexander G. Hauptmann

Image–sentence matching is a challenging task because of the heterogeneity gap between different modalities. Ranking-based methods have achieved excellent performance on this task in past decades. Given an image query, these methods typically assume that the correctly matched image–sentence pair must rank ahead of all mismatched ones. However, this assumption may be too strict and prone to overfitting, especially when some sentences in a massive database are similar to and easily confused with one another. In this paper, we relax the traditional ranking loss and propose a novel deep multi-modal network with a top-$k$ ranking loss to mitigate the data ambiguity problem. With this strategy, query results are not penalized unless the ground truth falls outside the top-$k$ query results. Because the original top-$k$ ranking loss is non-smooth and non-convex, we approximate it with a tight convex upper bound and then optimize the deep multi-modal network with standard back-propagation. Finally, we apply the method to three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO. Empirical results on the R@K metrics (K = 1, 5, 10) show that our method achieves performance comparable to state-of-the-art methods.
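To make the idea in the abstract concrete, the following is a minimal Python sketch of a top-$k$ zero-one ranking loss, one convex upper bound on it, and the R@K metric. The abstract does not state the exact form of the authors' bound, so the surrogate here is an assumption: it is a generic top-k hinge loss (the average of the k largest margin violations), not necessarily the bound used in the paper. All function names, the `margin` parameter, and the example scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def rank_of_ground_truth(scores: Sequence[float], gt_index: int) -> int:
    """1-based rank of the ground truth when candidates are sorted by descending score."""
    gt_score = scores[gt_index]
    # Count the candidates that score strictly higher than the ground truth.
    higher = sum(1 for j, s in enumerate(scores) if j != gt_index and s > gt_score)
    return 1 + higher


def topk_zero_one_loss(scores: Sequence[float], gt_index: int, k: int) -> int:
    """Non-smooth, non-convex loss: penalize only if the ground truth is outside the top k."""
    return 0 if rank_of_ground_truth(scores, gt_index) <= k else 1


def topk_hinge_loss(scores: Sequence[float], gt_index: int, k: int,
                    margin: float = 1.0) -> float:
    """Assumed convex surrogate: average of the k largest margin violations, clipped at 0.

    This is a generic top-k hinge loss used for illustration; it is not
    claimed to be the bound derived in the paper.
    """
    gt_score = scores[gt_index]
    # The ground-truth entry contributes 0; every other candidate contributes
    # margin + (its score - ground-truth score).
    violations = [
        0.0 if j == gt_index else margin + s - gt_score
        for j, s in enumerate(scores)
    ]
    top = sorted(violations, reverse=True)[:k]
    return max(0.0, sum(top) / k)


def recall_at_k(gt_ranks: Sequence[int], k: int) -> float:
    """R@K: fraction of queries whose ground truth is ranked within the top K."""
    return sum(1 for r in gt_ranks if r <= k) / len(gt_ranks)


if __name__ == "__main__":
    # Hypothetical similarity scores of one image query against four sentences;
    # the sentence at index 2 is the ground truth.
    scores = [0.9, 0.8, 0.7, 0.1]
    for k in (1, 3):
        print(k, topk_zero_one_loss(scores, 2, k), topk_hinge_loss(scores, 2, k))
    # Hypothetical ground-truth ranks for three queries.
    print(recall_at_k([1, 3, 7], 5))
```

In this sketch the ground truth is ranked third, so the zero-one loss is 1 for k = 1 and 0 for k = 3, which mirrors the relaxation described in the abstract. With a margin of at least 1, the hinge surrogate is never smaller than the zero-one loss, and because it is convex in the scores it can be minimized with gradient-based training.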


Updated: 2020-03-01
