CrossATNet - a novel cross-attention based framework for sketch-based image retrieval
Image and Vision Computing (IF 4.7), Pub Date: 2020-08-25, DOI: 10.1016/j.imavis.2020.104003
Ushasi Chaudhuri, Biplab Banerjee, Avik Bhattacharya, Mihai Datcu

We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR). Conventionally, the SBIR schema mainly considers simultaneous mappings between the two image views and the semantic side information. It is therefore desirable to consider fine-grained classes, mainly in the sketch domain, using a highly discriminative and semantically rich feature space. However, existing deep generative modeling based SBIR approaches focus primarily on bridging the gap between the seen and unseen classes by generating pseudo-unseen-class samples. Besides, violating the ZSL protocol by not utilizing any unseen-class information during training, such techniques do not pay explicit attention to modeling the discriminative nature of the shared space. We also note that learning a unified feature space for both views of the visual data is a challenging task, considering the significant domain difference between sketches and color images. As a remedy, we introduce a novel framework for zero-shot SBIR. While we define a cross-modal triplet loss to ensure the discriminative nature of the shared space, an innovative cross-modal attention learning strategy is also proposed to guide feature extraction in the image domain by exploiting information from the corresponding sketch counterpart. To preserve the semantic consistency of the shared space, we employ a graph CNN based module that propagates the semantic class topology to the shared space. To ensure an improved response time during inference, we further explore the possibility of representing the shared space in terms of hash codes. Experimental results on the benchmark TU-Berlin and Sketchy datasets confirm the superiority of CrossATNet in yielding state-of-the-art results.
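The two training ingredients named in the abstract, a shared embedding space and a cross-modal triplet loss whose anchor is a sketch and whose positive/negative are images, can be illustrated with a short, self-contained sketch. The two-layer encoders, the 64-dimensional embedding, and the margin value below are illustrative assumptions for exposition only, not the configuration reported in the paper.

# Minimal sketch of a cross-modal triplet objective: a sketch anchor is pulled
# toward an image of the same class and pushed away from an image of a
# different class in a common embedding space. Encoders, sizes, and the margin
# are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedSpaceEncoder(nn.Module):
    """Projects a modality-specific feature vector into the shared embedding space."""

    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalise so distances are comparable across modalities.
        return F.normalize(self.net(x), dim=-1)


def cross_modal_triplet_loss(sketch_anchor, image_positive, image_negative, margin: float = 0.2):
    """Hinge-style triplet loss with a sketch anchor and image positive/negative."""
    d_pos = (sketch_anchor - image_positive).pow(2).sum(dim=-1)
    d_neg = (sketch_anchor - image_negative).pow(2).sum(dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()


if __name__ == "__main__":
    sketch_enc = SharedSpaceEncoder(in_dim=512)   # e.g. features from a sketch CNN
    image_enc = SharedSpaceEncoder(in_dim=512)    # e.g. features from an image CNN

    sketch_feat = torch.randn(8, 512)             # dummy batch of sketch features
    pos_img_feat = torch.randn(8, 512)            # images of the same class
    neg_img_feat = torch.randn(8, 512)            # images of a different class

    loss = cross_modal_triplet_loss(sketch_enc(sketch_feat),
                                    image_enc(pos_img_feat),
                                    image_enc(neg_img_feat))
    print(loss.item())

For the hash-code representation of the shared space mentioned in the abstract, one common realisation (assumed here, not necessarily the paper's scheme) is to binarise the learned embedding, e.g. torch.sign(embedding), and rank gallery images by Hamming distance to the query sketch's code.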
