Zero Shot Detection, IEEE Transactions on Circuits and Systems for Video Technology


Zero Shot Detection
IEEE Transactions on Circuits and Systems for Video Technology (IF 8.3), Pub Date: 2020-04-01, DOI: 10.1109/tcsvt.2019.2899569
Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama

As we move toward large-scale object detection, it is unrealistic to expect annotated training data, in the form of bounding box annotations around objects, for all object classes at sufficient scale; methods capable of detecting unseen objects are therefore required. We propose a novel zero-shot method based on training an end-to-end model that fuses semantic attribute prediction with visual features to propose object bounding boxes for seen and unseen classes. While we utilize semantic features during training, our method is agnostic to semantic information for unseen classes at test time. Our method retains the efficiency and effectiveness of YOLOv2 for objects seen during training, while improving its performance for novel and unseen objects. The ability of state-of-the-art detection methods to learn discriminative object features to reject background proposals also limits their performance for unseen objects. We posit that, to detect unseen objects, we must incorporate semantic information into the visual domain so that the learned visual features reflect this information and lead to improved recall rates for unseen objects. We test our method on the PASCAL VOC and MS COCO datasets and observe significant improvements in the average precision of unseen classes.
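The fusion idea in the abstract can be illustrated with a toy sketch. This is not the authors' architecture or code: it only assumes that each candidate box comes with a visual feature vector, that a learned linear layer predicts semantic attributes from that vector, and that the box confidence is computed from both. Every name here (`predict_attributes`, `fused_confidence`, `propose`) and every weight value is hypothetical. No class embeddings are consulted when scoring, which mirrors the claim that the method needs no semantic information about unseen classes at test time.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def predict_attributes(visual, attr_weights):
    """Map a visual feature vector to attribute scores in (0, 1).

    attr_weights has one row per attribute (hypothetical learned weights).
    """
    return [sigmoid(dot(row, visual)) for row in attr_weights]


def fused_confidence(visual, attr_weights, w_visual, w_semantic, bias=0.0):
    """Box confidence from visual features plus predicted attributes."""
    attrs = predict_attributes(visual, attr_weights)
    logit = dot(w_visual, visual) + dot(w_semantic, attrs) + bias
    return sigmoid(logit)


def propose(candidates, attr_weights, w_visual, w_semantic, threshold=0.5):
    """Keep candidate boxes whose fused confidence reaches the threshold.

    candidates is a list of (box, visual_feature) pairs.
    """
    kept = []
    for box, visual in candidates:
        score = fused_confidence(visual, attr_weights, w_visual, w_semantic)
        if score >= threshold:
            kept.append((box, score))
    return kept


# Toy inputs: two candidate boxes with 2-D visual features (made-up values).
attr_weights = [[1.0, 0.0], [0.0, 1.0]]
w_visual = [0.0, 0.0]
w_semantic = [2.0, -2.0]
candidates = [
    ((0, 0, 10, 10), [3.0, -3.0]),  # strongly expresses the first attribute
    ((5, 5, 8, 8), [-3.0, 3.0]),    # strongly expresses the second attribute
]
proposals = propose(candidates, attr_weights, w_visual, w_semantic)
```

With these made-up weights only the first box is kept, because its predicted attributes push the fused score above the threshold. In a trained detector the weights would be learned end to end rather than set by hand.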


Updated: 2020-04-01
