Self-attention and adversary learning deep hashing network for cross-modal retrieval
Computers & Electrical Engineering ( IF 4.0 ) Pub Date : 2021-07-12 , DOI: 10.1016/j.compeleceng.2021.107262
Shubai Chen 1 , Song Wu 1 , Li Wang 2 , Zhenyang Yu 1

Multi-modal information retrieval is among the prevailing forms of daily human–computer interaction. Deep cross-modal hashing methods have recently received increasing attention because of their superior search performance and efficiency. However, effectively exploring high-level semantic correlations and preserving representation consistency remain challenging due to the heterogeneity of different modalities. In this paper, a Self-Attention and Adversary Learning Deep Hashing Network (SAALDH) is designed for large-scale cross-modal retrieval. Specifically, hash representations from different layers of the deep network are integrated, and the significance of each position in the integrated hash representation is enhanced by a novel self-attention mechanism. Meanwhile, an adversarial learning mechanism is adopted to further preserve the consistency of hash representations during hash-function learning. Moreover, a novel batch semi-hard selection strategy is designed for the triplet loss to avoid local optima during the optimization of SAALDH. Experimental results on two large-scale image–text datasets demonstrate the effectiveness and efficiency of the proposed SAALDH, which outperforms several state-of-the-art methods. The source code of SAALDH is available at http://github.com/SWU-CS-MediaLab/SAALDH.
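The self-attention step described above (re-weighting each position of the integrated hash representation by its relevance to every other position) follows the general scaled dot-product pattern. The sketch below is illustrative only and is not taken from the SAALDH code; the function name and the use of the representation itself as query, key, and value are assumptions for the example:

```python
import numpy as np

def self_attention(H):
    """Scaled dot-product self-attention over a stack of hash
    representations H (n_positions x dim).

    Each output row is a weighted sum of all rows of H, with weights
    given by a row-wise softmax over scaled inner products, so salient
    positions contribute more to the attended representation.
    Illustrative sketch, not the SAALDH implementation.
    """
    d = H.shape[1]
    scores = H @ H.T / np.sqrt(d)            # pairwise affinities
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    A = e / e.sum(axis=1, keepdims=True)     # row-stochastic weights
    return A @ H, A
```

Each row of the attention matrix `A` sums to one, so the attended output stays in the span of the input representations.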
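The "batch semi-hard selection" for the triplet loss mentioned in the abstract is a known mining strategy: for each anchor–positive pair, choose a negative that is farther than the positive but still inside the margin, which avoids the collapsed gradients of trivially easy or maximally hard negatives. A minimal numpy sketch of that general strategy (function names and the fallback choice are assumptions, not the paper's exact formulation):

```python
import numpy as np

def pairwise_sq_dists(x):
    # Squared Euclidean distances between all rows of x.
    sq = np.sum(x ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.maximum(d, 0.0)

def batch_semi_hard_triplet_loss(emb, labels, margin=1.0):
    """Batch semi-hard triplet mining, illustrative sketch.

    For each anchor a and positive p, pick the hardest *semi-hard*
    negative n: d(a,p) < d(a,n) < d(a,p) + margin. If no such negative
    exists in the batch, fall back to the easiest (farthest) negative.
    """
    d = pairwise_sq_dists(emb)
    n = len(labels)
    losses = []
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            negs = d[a][labels != labels[a]]
            semi = negs[(negs > d[a, p]) & (negs < d[a, p] + margin)]
            d_an = semi.min() if semi.size else negs.max()
            losses.append(max(0.0, d[a, p] - d_an + margin))
    return float(np.mean(losses)) if losses else 0.0
```

When the two modalities' classes are already well separated the loss is zero, while overlapping classes produce a positive loss that pushes negatives outside the margin.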




Updated: 2021-07-13