Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
Cognitive Computation (IF 5.4), Pub Date: 2021-03-04, DOI: 10.1007/s12559-021-09847-4
Jun Yu, Xiao-Jun Wu, Donglin Zhang

The explosive growth of multimedia data on the Internet has magnified the challenge of information retrieval. Multimedia data usually appear in different modalities, such as images, text, video, and audio. Unsupervised cross-modal hashing techniques, which support searching across multi-modal data, have gained importance in large-scale retrieval tasks because of their low storage cost and high query efficiency. Current methods learn hash functions that transform high-dimensional data into discrete hash codes. However, the original manifold structure and semantic correlations are not well preserved in the compact hash codes. We propose a novel unsupervised cross-modal hashing method that addresses this problem from two perspectives. On the one hand, the semantic correlations in the textual space and the local geometric structure in the visual space are reconstructed by unified hashing features seamlessly and simultaneously. On the other hand, \(\ell_{2,1}\)-norm penalties are imposed separately on the projection matrices to learn relevant and discriminative hash codes. The experimental results indicate that the proposed method improves on the best comparison method by 1%, 6%, 9%, and 2% on four publicly available datasets (Wiki, PASCAL-VOC, UCI Handwritten Digits, and NUS-WIDE), respectively. In conclusion, the proposed framework, which combines hash function learning with multimodal graph embedding, is effective in learning hash codes and achieves superior retrieval performance compared to state-of-the-art methods.
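To make the two ingredients named above concrete, the following minimal NumPy sketch shows hash codes obtained by binarizing linear projections of each modality into a shared Hamming space, together with the \(\ell_{2,1}\)-norm row-sparsity penalty on a projection matrix. All shapes, matrices, and variable names (X_img, W_img, etc.) are illustrative assumptions, not the paper's actual model or training procedure.

```python
import numpy as np

np.random.seed(0)

# Toy features for two modalities (shapes are assumptions, not the paper's datasets).
X_img = np.random.randn(100, 512)   # image features
X_txt = np.random.randn(100, 300)   # text features

k = 32  # hash code length

# Hypothetical projection matrices mapping each modality into a shared Hamming space.
W_img = np.random.randn(512, k)
W_txt = np.random.randn(300, k)

def hash_codes(X, W):
    """Binarize a linear projection into {-1, +1} hash codes."""
    return np.sign(X @ W)

def l21_norm(W):
    """l2,1-norm: the sum of the l2 norms of the rows of W."""
    return np.linalg.norm(W, axis=1).sum()

B_img = hash_codes(X_img, W_img)
B_txt = hash_codes(X_txt, W_txt)

# Cross-modal retrieval: Hamming distance between the first image's code and
# every text code. For codes in {-1, +1}^k, d_H = (k - <b1, b2>) / 2.
hamming = (k - B_txt @ B_img[0]) / 2
print("nearest text index:", int(hamming.argmin()))
print("l2,1 penalty on the image projection:", l21_norm(W_img))
```

Minimizing an \(\ell_{2,1}\) penalty during training drives entire rows of a projection matrix toward zero, which is what allows a method of this kind to discard irrelevant feature dimensions while keeping the discriminative ones.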
