Cross-modal Discriminant Adversarial Network
Pattern Recognition (IF 7.5), Pub Date: 2021-04-01, DOI: 10.1016/j.patcog.2020.107734
Peng Hu, Xi Peng, Hongyuan Zhu, Jie Lin, Liangli Zhen, Wei Wang, Dezhong Peng

Abstract Cross-modal retrieval aims at retrieving relevant points across different modalities, such as retrieving images via texts. One key challenge of cross-modal retrieval is narrowing the heterogeneous gap across diverse modalities. To overcome this challenge, we propose a novel method termed Cross-modal discriminant Adversarial Network (CAN). Taking bi-modal data as a showcase, CAN consists of two parallel modality-specific generators, two modality-specific discriminators, and a Cross-modal Discriminant Mechanism (CDM). Specifically, the generators project diverse modalities into a latent cross-modal discriminant space. Meanwhile, the discriminators compete against the generators to alleviate the heterogeneous discrepancy in this space: the generators try to generate unified features to confuse the discriminators, while the discriminators aim to identify which modality each generated feature came from. To further remove redundancy and preserve discrimination, we propose the CDM, which projects the generated results into a single common space, accompanied by a novel eigenvalue-based loss. Thanks to this loss, CDM can push as much discriminative power as possible into all latent directions. To demonstrate the effectiveness of CAN, comprehensive experiments are conducted on four multimedia datasets, comparing it against 15 state-of-the-art approaches.
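
To make the adversarial interplay concrete, below is a minimal PyTorch sketch of the bi-modal setup described above. Everything here is an illustrative assumption rather than the paper's implementation: the layer sizes, the 4096-d image and 300-d text input dimensions, and the exact wiring of the two modality-specific discriminators are invented for the example, and the eigenvalue-based CDM loss is omitted because the abstract does not specify its form.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Modality-specific generator: maps one modality into the latent
    cross-modal discriminant space (latent size chosen arbitrarily)."""
    def __init__(self, in_dim, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Modality-specific discriminator: outputs a logit guessing whether a
    latent feature was generated from its own modality."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, z):
        return self.net(z)

# Two parallel generators and two discriminators, one pair per modality.
g_img, g_txt = Generator(in_dim=4096), Generator(in_dim=300)
d_img, d_txt = Discriminator(), Discriminator()
bce = nn.BCEWithLogitsLoss()

img, txt = torch.randn(8, 4096), torch.randn(8, 300)   # toy batch
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)
z_img, z_txt = g_img(img), g_txt(txt)

# Discriminator step: each discriminator learns to tell its own modality's
# latents (label 1) apart from the other modality's (label 0).
d_loss = (bce(d_img(z_img.detach()), ones) + bce(d_img(z_txt.detach()), zeros)
          + bce(d_txt(z_txt.detach()), ones) + bce(d_txt(z_img.detach()), zeros))

# Generator step: the generators try to make cross-modal latents look like
# in-modality ones, i.e. to fool both discriminators with unified features.
g_loss = bce(d_img(z_txt), ones) + bce(d_txt(z_img), ones)
```

The detach() calls mirror the usual alternating GAN-style update: the discriminators are trained first on frozen latents, after which the generators are updated to fool them, which is the competition the abstract describes.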

Updated: 2021-04-01