Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval.
IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2020-09-10, DOI: 10.1109/tip.2020.3020383
Cheng Deng, Xinxun Xu, Hao Wang, Muli Yang, Dacheng Tao

Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal retrieval task that involves searching natural images with free-hand sketches under the zero-shot scenario. Most previous methods project sketch and image features into a low-dimensional common space for efficient retrieval, and meanwhile align the projected features to their semantic features (e.g., category-level word vectors) in order to transfer knowledge from seen to unseen classes. However, the projection and the alignment are always coupled, so the alignment is never enforced explicitly, which leads to unsatisfactory zero-shot retrieval performance. To address this issue, we propose a novel progressive cross-modal semantic network. More specifically, it first explicitly aligns the sketch and image features to semantic features, and then projects the aligned features into a common space for subsequent retrieval. We further employ a cross-reconstruction loss to encourage the aligned features to capture complete knowledge of the two modalities, along with a multi-modal Euclidean loss that guarantees the similarity between the retrieval features of a sketch-image pair. Extensive experiments on two popular large-scale datasets demonstrate that our proposed approach outperforms state-of-the-art competitors by a large margin: more than 3% on the Sketchy dataset and about 6% on the TU-Berlin dataset in terms of retrieval accuracy.
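To make the described training objective concrete, the snippet below is a minimal PyTorch sketch of the progressive structure (features are first aligned to the semantic word-vector space, then projected to the common retrieval space) combined with the cross-reconstruction and multi-modal Euclidean losses. All module names, dimensions, and loss weights (ProgressiveBranch, dec_sketch, lambda_rec, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the progressive alignment idea from the abstract.
# Dimensions, layers, and loss weights are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveBranch(nn.Module):
    """One modality branch: backbone feature -> semantic space -> common space."""
    def __init__(self, feat_dim=2048, sem_dim=300, common_dim=64):
        super().__init__()
        self.to_semantic = nn.Linear(feat_dim, sem_dim)  # explicit alignment stage
        self.to_common = nn.Linear(sem_dim, common_dim)  # projection for retrieval

    def forward(self, x):
        sem = self.to_semantic(x)
        return sem, self.to_common(sem)

def total_loss(sk_feat, im_feat, word_vec,
               sketch_branch, image_branch, dec_sketch, dec_image,
               lambda_rec=1.0, lambda_euc=1.0):
    """Semantic alignment + cross-reconstruction + multi-modal Euclidean losses."""
    sk_sem, sk_ret = sketch_branch(sk_feat)
    im_sem, im_ret = image_branch(im_feat)
    # align both modalities' semantic features to the category word vector
    align = F.mse_loss(sk_sem, word_vec) + F.mse_loss(im_sem, word_vec)
    # cross-reconstruction: rebuild each modality's input feature from the
    # other modality's aligned semantic feature
    rec = F.mse_loss(dec_sketch(im_sem), sk_feat) + F.mse_loss(dec_image(sk_sem), im_feat)
    # multi-modal Euclidean loss on the retrieval features of a sketch-image pair
    euc = F.mse_loss(sk_ret, im_ret)
    return align + lambda_rec * rec + lambda_euc * euc

if __name__ == "__main__":
    # random tensors stand in for backbone features and word vectors
    sketch_branch, image_branch = ProgressiveBranch(), ProgressiveBranch()
    dec_sketch, dec_image = nn.Linear(300, 2048), nn.Linear(300, 2048)
    sk, im, wv = torch.randn(8, 2048), torch.randn(8, 2048), torch.randn(8, 300)
    print(total_loss(sk, im, wv, sketch_branch, image_branch, dec_sketch, dec_image))
```

In this reading, the "progressive" aspect is simply that the projection to the retrieval space takes the already semantically aligned features as input, rather than coupling alignment and projection in a single mapping.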

Updated: 2020-09-22