Learning Cross-Modal Common Representations by Private-Shared Subspaces Separation
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 8-11-2020, DOI: 10.1109/tcyb.2020.3009004
Xing Xu, Kaiyi Lin, Lianli Gao, Huimin Lu, Heng Tao Shen, Xuelong Li

Due to the inconsistent distributions and representations of different modalities (e.g., images and texts), it is very challenging to correlate such heterogeneous data. A standard solution is to construct one common subspace, where the common representations of different modalities are generated to bridge the heterogeneity gap. Existing methods based on common representation learning mostly adopt a less effective two-stage paradigm: first, generating separate representations for each modality by exploiting the modality-specific properties as the complementary information, and then capturing the cross-modal correlation in the separate representations for common representation learning. Moreover, these methods usually neglect that there may exist interference in the modality-specific properties, that is, the unrelated objects and background regions in images or the noisy words and incorrect sentences in the text. In this article, we hypothesize that explicitly modeling the interference within each modality can improve the quality of common representation learning. To this end, we propose a novel model, private-shared subspaces separation (P3S), to explicitly learn different representations that are partitioned into two kinds of subspaces: 1) the common representations that capture the cross-modal correlation in a shared subspace and 2) the private representations that model the interference within each modality in two private subspaces. By employing the orthogonality constraints between the shared subspace and the private subspaces during the one-stage joint learning procedure, our model is able to learn more effective common representations for different modalities in the shared subspace by fully excluding the interference within each modality. Extensive experiments conducted on cross-modal retrieval verify the advantages of our P3S method compared with 15 state-of-the-art methods on four widely used cross-modal datasets.
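The shared/private split with an orthogonality constraint is the core mechanism the abstract describes. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the encoder sizes, the squared-Frobenius orthogonality penalty, the MSE alignment term standing in for the paper's cross-modal correlation objective, and the weight lam are all illustrative assumptions.

```python
# Minimal sketch of private-shared subspace separation (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrivateSharedEncoders(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, common_dim=256):
        super().__init__()
        # Shared-subspace projections: one per modality, outputs to be aligned.
        self.img_shared = nn.Linear(img_dim, common_dim)
        self.txt_shared = nn.Linear(txt_dim, common_dim)
        # Private-subspace projections: model modality-specific interference.
        self.img_private = nn.Linear(img_dim, common_dim)
        self.txt_private = nn.Linear(txt_dim, common_dim)

    def forward(self, img_feat, txt_feat):
        return (self.img_shared(img_feat), self.txt_shared(txt_feat),
                self.img_private(img_feat), self.txt_private(txt_feat))

def orthogonality_loss(shared, private):
    # Push the shared and private representations of one modality apart:
    # squared Frobenius norm of S^T P after L2-normalizing each row.
    s = F.normalize(shared, dim=1)
    p = F.normalize(private, dim=1)
    return (s.t() @ p).pow(2).sum()

def joint_loss(model, img_feat, txt_feat, lam=0.1):
    img_s, txt_s, img_p, txt_p = model(img_feat, txt_feat)
    # Cross-modal correlation in the shared subspace; a simple MSE alignment
    # term is used here as a stand-in for the paper's correlation objective.
    align = F.mse_loss(img_s, txt_s)
    # Orthogonality constraints separate the shared subspace from both
    # private subspaces, excluding modality-specific interference.
    ortho = orthogonality_loss(img_s, img_p) + orthogonality_loss(txt_s, txt_p)
    return align + lam * ortho

# Usage with random stand-in features for a batch of 8 image-text pairs.
model = PrivateSharedEncoders()
loss = joint_loss(model, torch.randn(8, 2048), torch.randn(8, 300))
loss.backward()
```

Because both terms are optimized jointly, this corresponds to the one-stage learning procedure described above rather than a two-stage pipeline that first builds per-modality representations and only then correlates them.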

Updated: 2024-08-22