Progressive Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval.
IEEE Transactions on Image Processing (IF 10.8) Pub Date: 2020-09-10, DOI: 10.1109/tip.2020.3020383
Cheng Deng, Xinxun Xu, Hao Wang, Muli Yang, Dacheng Tao

Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal retrieval task that involves searching natural images with free-hand sketches under the zero-shot scenario. Most previous methods project the sketch and image features into a low-dimensional common space for efficient retrieval, while aligning the projected features to their semantic features (e.g., category-level word vectors) in order to transfer knowledge from seen to unseen classes. However, the projection and alignment are always coupled; the resulting lack of explicit alignment leads to unsatisfactory zero-shot retrieval performance. To address this issue, we propose a novel progressive cross-modal semantic network. More specifically, it first explicitly aligns the sketch and image features to semantic features, then projects the aligned features to a common space for subsequent retrieval. We further employ a cross-reconstruction loss to encourage the aligned features to capture complete knowledge about the two modalities, along with a multi-modal Euclidean loss that guarantees similarity between the retrieval features of a sketch-image pair. Extensive experiments conducted on two popular large-scale datasets demonstrate that our proposed approach outperforms state-of-the-art competitors by a remarkable margin: more than 3% on the Sketchy dataset and about 6% on the TU-Berlin dataset in terms of retrieval accuracy.
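To make the "align first, then project" idea and the two auxiliary losses concrete, the following is a minimal PyTorch sketch, not the authors' implementation. All module names, feature dimensions (512-d backbone features, 300-d word vectors, 64-d common space), and the use of simple linear layers and MSE terms are illustrative assumptions; the paper's actual architecture and loss weights may differ.

```python
# Illustrative sketch (not the authors' code) of progressive alignment-then-projection
# with cross-reconstruction and multi-modal Euclidean losses. Dimensions are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveCrossModalNet(nn.Module):
    def __init__(self, feat_dim=512, sem_dim=300, common_dim=64):
        super().__init__()
        # Stage 1: explicitly align each modality's features to the semantic space
        # (e.g., category-level word vectors).
        self.sketch_align = nn.Linear(feat_dim, sem_dim)
        self.image_align = nn.Linear(feat_dim, sem_dim)
        # Cross-reconstruction decoders: each aligned feature should be able to
        # reconstruct the *other* modality's input feature.
        self.dec_sketch_to_image = nn.Linear(sem_dim, feat_dim)
        self.dec_image_to_sketch = nn.Linear(sem_dim, feat_dim)
        # Stage 2: project the aligned features into a low-dimensional common
        # space used for retrieval; the projection is shared across modalities.
        self.project = nn.Linear(sem_dim, common_dim)

    def forward(self, sketch_feat, image_feat, word_vec):
        s_sem = self.sketch_align(sketch_feat)   # sketch -> semantic space
        i_sem = self.image_align(image_feat)     # image  -> semantic space

        # Explicit semantic alignment: aligned features should match word vectors.
        loss_align = F.mse_loss(s_sem, word_vec) + F.mse_loss(i_sem, word_vec)

        # Cross-reconstruction loss: encourages aligned features to retain
        # complete knowledge about both modalities.
        loss_rec = (F.mse_loss(self.dec_sketch_to_image(s_sem), image_feat) +
                    F.mse_loss(self.dec_image_to_sketch(i_sem), sketch_feat))

        # Retrieval features in the common space.
        s_ret = self.project(s_sem)
        i_ret = self.project(i_sem)

        # Multi-modal Euclidean loss: retrieval features of a matching
        # sketch-image pair should be close to each other.
        loss_euc = F.mse_loss(s_ret, i_ret)

        return s_ret, i_ret, loss_align + loss_rec + loss_euc


# Toy usage with random backbone features for a batch of 8 sketch-image pairs.
net = ProgressiveCrossModalNet()
sketch = torch.randn(8, 512)
image = torch.randn(8, 512)
words = torch.randn(8, 300)
_, _, loss = net(sketch, image, words)
loss.backward()
```

At test time, only the aligned-then-projected retrieval features would be used: unseen-class sketches are embedded into the common space and matched against gallery images by Euclidean distance.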

Updated: 2020-09-22