Bridging multimedia heterogeneity gap via Graph Representation Learning for cross-modal retrieval
Neural Networks (IF 7.8), Pub Date: 2020-11-28, DOI: 10.1016/j.neunet.2020.11.011
Qingrong Cheng, Xiaodong Gu

Information retrieval across different modalities has become a significant problem with many promising applications. However, the inconsistent feature representations of different multimedia data types create a "heterogeneity gap" among modalities, which is a central challenge in cross-modal retrieval. To bridge this gap, popular methods attempt to project the original data into a common representation space, which demands strong fitting ability from the model. To address this issue, we propose a novel Graph Representation Learning (GRL) method that does not project the original features into an aligned representation space but instead adopts a cross-modal graph to link the different modalities. The GRL approach consists of two subnetworks: a Feature Transfer Learning Network (FTLN) and a Graph Representation Learning Network (GRLN). First, the FTLN finds a latent space for each modality in which cosine similarity is a suitable measure of sample similarity. Then, we build a cross-modal graph that reconstructs the original data and their relationships. Finally, we discard the latent-space features and instead embed the graph vertices directly into a common representation space. In this way, the proposed method sidesteps the difficult common-space projection by using the cross-modal graph as an intermediary that bridges the "heterogeneity gap" among modalities, a simple but effective strategy. Extensive experimental results on six widely used datasets show that the proposed GRL outperforms state-of-the-art cross-modal retrieval methods.



Updated: 2020-12-01