Modality-Specific and Shared Generative Adversarial Network for Cross-modal Retrieval
Pattern Recognition (IF 7.5), Pub Date: 2020-08-01, DOI: 10.1016/j.patcog.2020.107335
Fei Wu, Xiao-Yuan Jing, Zhiyong Wu, Yimu Ji, Xiwei Dong, Xiaokai Luo, Qinghua Huang, Ruchuan Wang

Abstract: Cross-modal retrieval aims to achieve accurate and flexible retrieval across different data modalities, e.g., image and text. The field has made significant progress in recent years, especially since generative adversarial networks (GANs) were introduced, yet much room for improvement remains: how to jointly extract and effectively exploit both modality-specific (complementary) and modality-shared (correlated) features has not been well studied. In this paper, we propose an approach named Modality-Specific and Shared Generative Adversarial Network (MS2GAN) for cross-modal retrieval. The architecture consists of two sub-networks that learn modality-specific features for each modality, followed by a common sub-network that learns modality-shared features across modalities. Network training is guided by an adversarial scheme between a generative model and a discriminative model: the generative model learns to predict the semantic labels of features, to model inter- and intra-modal similarity using label information, and to keep the modality-specific and modality-shared features distinct, while the discriminative model learns to classify which modality a feature comes from. The learned modality-specific and modality-shared representations are used jointly for retrieval. Experiments on three widely used benchmark multi-modal datasets demonstrate that MS2GAN outperforms state-of-the-art related methods.
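The abstract describes the training objective only in prose; the following minimal PyTorch sketch makes the structure concrete. All module names, feature dimensions, loss forms, and the orthogonality-style difference penalty are illustrative assumptions inferred from the description above, not the authors' released implementation.

```python
# Minimal sketch of the MS2GAN idea, assuming pre-extracted image/text
# features (e.g., 4096-d CNN features, 300-d text vectors). All sizes
# and loss forms here are hypothetical, inferred from the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

IMG_DIM, TXT_DIM, HID, OUT, NUM_CLASSES = 4096, 300, 1024, 512, 10

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, HID), nn.ReLU(),
                         nn.Linear(HID, out_dim))

# Two modality-specific sub-networks (complementary information).
img_specific, txt_specific = mlp(IMG_DIM, OUT), mlp(TXT_DIM, OUT)

# A common sub-network (correlated information): per-modality input
# projections feed one weight-shared trunk.
img_proj, txt_proj = nn.Linear(IMG_DIM, HID), nn.Linear(TXT_DIM, HID)
shared_trunk = mlp(HID, OUT)

# Generative side also predicts semantic labels of the features.
label_clf = nn.Linear(2 * OUT, NUM_CLASSES)

# Discriminative model: classifies the modality a shared feature came from.
modality_disc = nn.Sequential(nn.Linear(OUT, 128), nn.ReLU(),
                              nn.Linear(128, 2))  # 0 = image, 1 = text

def encode(x, specific, proj):
    """Return (modality-specific, modality-shared) features."""
    return specific(x), shared_trunk(proj(x))

def generator_loss(img, txt, labels):
    img_sp, img_sh = encode(img, img_specific, img_proj)
    txt_sp, txt_sh = encode(txt, txt_specific, txt_proj)
    # 1) Semantic loss: predict labels from the joint representation.
    sem = (F.cross_entropy(label_clf(torch.cat([img_sp, img_sh], 1)), labels)
         + F.cross_entropy(label_clf(torch.cat([txt_sp, txt_sh], 1)), labels))
    # 2) Inter-modal similarity: pull paired shared features together
    #    (intra-modal terms built from label info are omitted for brevity).
    inter = F.mse_loss(img_sh, txt_sh)
    # 3) Difference constraint between specific and shared features;
    #    an orthogonality penalty is one common choice, assumed here.
    diff = ((img_sp * img_sh).sum(1).pow(2).mean()
          + (txt_sp * txt_sh).sum(1).pow(2).mean())
    # 4) Adversarial term: shared features should fool the discriminator,
    #    implemented here by flipping the modality labels.
    n = img.size(0)
    adv = (F.cross_entropy(modality_disc(img_sh), torch.ones(n).long())
         + F.cross_entropy(modality_disc(txt_sh), torch.zeros(n).long()))
    return sem + inter + diff + adv  # relative weighting is a tuning choice

def discriminator_loss(img, txt):
    n = img.size(0)
    _, img_sh = encode(img, img_specific, img_proj)
    _, txt_sh = encode(txt, txt_specific, txt_proj)
    return (F.cross_entropy(modality_disc(img_sh.detach()), torch.zeros(n).long())
          + F.cross_entropy(modality_disc(txt_sh.detach()), torch.ones(n).long()))
```

At retrieval time, one would concatenate the specific and shared features of a query and rank candidates from the other modality by cosine similarity, matching the abstract's statement that both representations are used jointly for retrieval.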

Updated: 2020-08-01