Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval
Neural Networks (IF 7.8), Pub Date: 2020-11-28, DOI: 10.1016/j.neunet.2020.11.011
Qingrong Cheng, Xiaodong Gu

Information retrieval across different modalities has become a significant problem with many promising applications. However, the inconsistent feature representations of different multimedia data types create a "heterogeneity gap" among modalities, which is a central challenge in cross-modal retrieval. To bridge this gap, popular methods attempt to project the original data into a common representation space, which demands strong fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link the different modalities. The GRL approach consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN finds a latent space for each modality in which cosine similarity is a suitable measure of sample similarity. Then, we build a cross-modal graph that reconstructs the original data and their relationships. Finally, we discard the latent-space features and instead embed the graph vertices directly into a common representation space. In this way, the proposed method sidesteps the difficult common-space projection by using the cross-modal graph as an intermediary that bridges the "heterogeneity gap" among modalities, a simple but effective strategy. Extensive experimental results on six widely used datasets show that the proposed GRL outperforms state-of-the-art cross-modal retrieval methods.



Updated: 2020-12-01