Zero-Shot Object Detection: Joint Recognition and Localization of Novel Concepts
International Journal of Computer Vision (IF 11.6) Pub Date: 2020-07-24, DOI: 10.1007/s11263-020-01355-6
Shafin Rahman, Salman H. Khan, Fatih Porikli

Zero-shot learning (ZSL) identifies unseen objects for which no training images are available. Conventional ZSL approaches are restricted to a recognition setting where each test image is categorized into one of several unseen object classes. We posit that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complete scene, warranting both ‘recognition’ and ‘localization’ of the unseen category. To address this limitation, we introduce a new ‘Zero-Shot Detection’ (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories, without any training samples. We introduce an integrated solution to the ZSD problem that jointly models the complex interplay between visual and semantic domain information. Ours is an end-to-end trainable deep network for ZSD that effectively overcomes the noise in the unsupervised semantic descriptions. To this end, we utilize the concept of meta-classes to design an original loss function that achieves synergy between max-margin class separation and semantic domain clustering. In order to set a benchmark for ZSD, we propose an experimental protocol for the large-scale ILSVRC dataset that adheres to practical challenges, e.g., rare classes are more likely to be the unseen ones. Furthermore, we present a baseline approach extended from conventional recognition to the ZSD setting. Our extensive experiments show a significant boost in performance (in terms of mAP and Recall) on the imperative yet difficult ZSD problem on ImageNet detection, MSCOCO and FashionZSD datasets.
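
The abstract only names the ingredients of the proposed loss. As a rough, hypothetical sketch of how max-margin class separation and meta-class clustering in the semantic space could be combined, the PyTorch-style module below scores projected region features against fixed class embeddings; the name MetaClassZSDLoss, the cosine scoring, the margin value, and the centroid-based clustering term are illustrative assumptions, not the formulation published in the paper.

```python
# Hypothetical sketch: hinge-style max-margin separation over cosine scores
# against fixed (noisy) semantic class embeddings, plus a term that pulls each
# projected region feature toward the centroid of its meta-class.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MetaClassZSDLoss(nn.Module):
    def __init__(self, class_embeddings, meta_class_ids, margin=0.2, cluster_weight=0.1):
        super().__init__()
        # class_embeddings: (num_classes, d) fixed word vectors for the seen classes
        # meta_class_ids:   (num_classes,) index of the meta-class each class belongs to
        self.register_buffer("W", F.normalize(class_embeddings, dim=1))
        self.register_buffer("meta", meta_class_ids)
        self.margin = margin
        self.cluster_weight = cluster_weight

    def forward(self, region_feats, labels):
        # region_feats: (B, d) visual features already projected into the semantic
        # space by the detection backbone; labels: (B,) seen-class indices
        feats = F.normalize(region_feats, dim=1)
        scores = feats @ self.W.t()                      # (B, num_classes) cosine scores

        # max-margin class separation: every wrong class must trail the true class
        true_scores = scores.gather(1, labels[:, None])  # (B, 1)
        margins = self.margin + scores - true_scores
        margins.scatter_(1, labels[:, None], 0.0)        # no penalty on the true class
        margin_loss = F.relu(margins).sum(dim=1).mean()

        # semantic-domain clustering: pull features toward their meta-class centroid
        num_meta = int(self.meta.max()) + 1
        centroids = torch.stack(
            [self.W[self.meta == m].mean(dim=0) for m in range(num_meta)]
        )                                                # (num_meta, d)
        target = F.normalize(centroids[self.meta[labels]], dim=1)
        cluster_loss = (1.0 - (feats * target).sum(dim=1)).mean()

        return margin_loss + self.cluster_weight * cluster_loss
```

For example, with 300-d word vectors for 200 seen classes grouped into 20 meta-classes, loss_fn = MetaClassZSDLoss(word_vectors, meta_ids) would be applied to each batch of projected region features and their labels during detector training; the actual network architecture and loss in the paper may differ.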

Updated: 2020-07-24