Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-Based Image Retrieval
International Journal of Computer Vision (IF 11.6). Pub Date: 2020-07-29, DOI: 10.1007/s11263-020-01350-x
Anjan Dutta, Zeynep Akata

Low-shot sketch-based image retrieval is an emerging task in computer vision that retrieves natural images relevant to hand-drawn sketch queries whose categories are rarely seen during training. Prior work either requires aligned sketch-image pairs, which are costly to obtain, or relies on an inefficient memory fusion layer to map visual information to a semantic space. In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR), and introduce the few-shot setting for SBIR. To solve these tasks, we propose a semantically aligned paired cycle-consistent generative adversarial network (SEM-PCYC) for any-shot SBIR, in which each branch of the generative adversarial network maps visual information from sketches and images to a common semantic space via adversarial training. Each branch maintains cycle consistency, which requires supervision only at the category level and avoids the need for aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is class-specific. Furthermore, we propose to combine textual and hierarchical side information via an auto-encoder that selects discriminative side information within the same end-to-end model. Our results demonstrate a significant boost in any-shot SBIR performance over the state of the art on the extended versions of the challenging Sketchy, TU-Berlin and QuickDraw datasets.
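The cycle-consistency constraint described above can be illustrated with a minimal numeric sketch. This is not the paper's implementation (SEM-PCYC trains deep generators adversarially); here the forward and backward mappings are stand-in linear maps, chosen only to show the loss being minimized: map a visual feature to the semantic space, map it back, and penalize the L1 reconstruction error. Note that this objective needs features from a single modality only, which is why no aligned sketch-image pairs are required.

```python
import numpy as np

rng = np.random.default_rng(0)

d_vis, d_sem = 8, 4  # toy visual / semantic dimensionalities (hypothetical sizes)

# Stand-ins for the learned generators: F maps visual -> semantic,
# G maps semantic -> visual. In the paper these are adversarially
# trained networks; here they are fixed linear maps for illustration.
F = rng.normal(size=(d_sem, d_vis))
G = np.linalg.pinv(F)  # pseudo-inverse as a crude backward map

def cycle_consistency_loss(x, fwd, bwd):
    """L1 error between a feature and its reconstruction after a
    round trip through the semantic space: ||bwd(fwd(x)) - x||_1 / d."""
    x_rec = bwd @ (fwd @ x)
    return np.abs(x - x_rec).mean()

x = rng.normal(size=d_vis)
loss = cycle_consistency_loss(x, F, G)
```

Because the semantic space here is lower-dimensional than the visual one, the round trip loses information and the loss stays positive; a square, invertible pair of maps would drive it to (numerically) zero, which is the behavior the training objective pushes the generators toward.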

Updated: 2020-07-29