Learning to Embed Semantic Similarity for Joint Image-Text Retrieval,IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2021-12-02, DOI: 10.1109/tpami.2021.3132163
Noam Malali, Yosi Keller

We present a deep learning approach for learning joint semantic embeddings of images and captions in a Euclidean space, such that semantic similarity is approximated by the $L_2$ distances in the embedding space. To that end, we introduce a metric learning scheme that uses multitask learning to learn the embedding of identical semantic concepts with a center loss. By introducing a differentiable quantization scheme into the end-to-end trainable network, we derive a semantic embedding of semantically similar concepts in Euclidean space. We also propose a novel metric learning formulation using an adaptive margin hinge loss that is refined during the training phase. The proposed scheme was applied to the MS-COCO, Flickr30K and Flickr8K datasets, and was shown to compare favorably with contemporary state-of-the-art approaches.
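The abstract names two loss terms without giving their formulas, so the following is only a minimal NumPy sketch of the generic versions of those ideas, not the authors' formulation. It assumes a bidirectional hinge ranking loss computed on $L_2$ distances with a fixed margin (the paper's margin is adaptive, and its update rule is not stated here) and the standard center loss. The function names, the `margin` default and the `centers` array are hypothetical, and the differentiable quantization step is not sketched.

```python
import numpy as np


def pairwise_l2(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def hinge_ranking_loss(img, txt, margin=0.2):
    """Bidirectional hinge loss on L2 distances.

    Row i of img is assumed to match row i of txt. The margin is a fixed
    constant here (an assumption); the paper refines it during training.
    """
    d = pairwise_l2(img, txt)        # d[i, j] = ||img_i - txt_j||
    pos = np.diag(d)                 # distances of the matching pairs
    # image -> text: the matching caption should be closer than other captions
    cost_i2t = np.maximum(0.0, margin + pos[:, None] - d)
    # text -> image: the matching image should be closer than other images
    cost_t2i = np.maximum(0.0, margin + pos[None, :] - d)
    mask = 1.0 - np.eye(len(d))      # exclude the matching pairs themselves
    return ((cost_i2t + cost_t2i) * mask).sum() / len(d)


def center_loss(emb, labels, centers):
    """Half the mean squared distance of each embedding to its concept center."""
    return 0.5 * ((emb - centers[labels]) ** 2).sum(axis=1).mean()
```

In this sketch the two terms would be summed (possibly with a weighting factor) to form a multitask objective; how the paper actually combines them is not specified in the abstract.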

Updated: 2021-12-02
