Visual and Semantic Knowledge Transfer for Large Scale Semi-Supervised Object Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8). Pub Date: 2017-11-09, DOI: 10.1109/tpami.2017.2771779
Yuxing Tang, Josiah Wang, Xiaofang Wang, Boyang Gao, Emmanuel Dellandrea, Robert Gaizauskas, Liming Chen

Deep CNN-based object detection systems have achieved remarkable success on several large-scale object detection benchmarks. However, training such detectors requires a large number of labeled bounding boxes, which are more difficult to obtain than image-level annotations. Previous work addresses this issue by transforming image-level classifiers into object detectors. This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations. We improve on this previous work by incorporating knowledge about object similarities from the visual and semantic domains during the transfer process. The intuition behind our proposed method is that visually and semantically similar categories should share more transferable properties than dissimilar ones; e.g., a better cat detector results from transferring the differences between a dog classifier and a dog detector to the cat class than from transferring those of the violin class. Experimental results on the challenging ILSVRC2013 detection dataset demonstrate that each of our proposed object-similarity-based knowledge transfer methods outperforms the baseline methods. We found strong evidence that visual similarity and semantic relatedness are complementary for the task, and that combining them notably improves detection, achieving state-of-the-art detection performance in a semi-supervised setting.
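The transfer step the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: in the LSDA-style setup this work builds on, each "strong" category (one with bounding-box annotations) contributes the difference between its detector and classifier weights, and a "weak" category's detector is approximated as its own classifier weights plus a similarity-weighted combination of those differences. The function name, the toy dimensions, and the similarity scores below are all hypothetical.

```python
import numpy as np

def transfer_detector(classifier_w, strong_classifier_w, strong_detector_w, sim):
    """Sketch of similarity-weighted classifier-to-detector transfer.

    classifier_w:        (d,) weights of the weak category's image-level classifier
    strong_classifier_w: (k, d) classifier weights for k strong categories
    strong_detector_w:   (k, d) detector weights for the same k categories
    sim:                 (k,) visual/semantic similarity of the weak category
                         to each strong category (hypothetical scores)
    Returns an estimated (d,) detector weight vector for the weak category.
    """
    # Per-category classifier-to-detector difference, learned on strong classes.
    deltas = strong_detector_w - strong_classifier_w          # (k, d)

    # Normalize similarities so the transferred difference is a convex
    # combination dominated by the most similar categories.
    w = sim / sim.sum()

    # Weak detector = weak classifier + similarity-weighted mean difference.
    return classifier_w + w @ deltas


# Toy usage: transfer from "dog" and "violin" to "cat"; the dog difference
# dominates because the similarity score for dog is far higher.
rng = np.random.default_rng(0)
d = 8
cat_clf = rng.normal(size=d)
strong_clf = rng.normal(size=(2, d))                # dog, violin classifiers
strong_det = strong_clf + rng.normal(size=(2, d))   # their detectors
sim = np.array([0.9, 0.1])                          # sim(cat, dog), sim(cat, violin)
cat_det = transfer_detector(cat_clf, strong_clf, strong_det, sim)
```

Under this sketch, uniform similarity weights would reduce to a plain averaged transfer (the baseline the abstract compares against), while concentrating the weights on visually and semantically close categories is what the paper's method contributes.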

Updated: 2017-11-09