Multi-Scale Structural Kernel Representation for Object Detection
Pattern Recognition (IF 7.5), Pub Date: 2021-02-01, DOI: 10.1016/j.patcog.2020.107593
Hao Wang, Qilong Wang, Peihua Li, Wangmeng Zuo

Abstract Existing high-performance object detection methods benefit greatly from the powerful representation ability of deep convolutional neural networks (CNNs). Recent research shows that integrating high-order statistics remarkably improves the representation ability of deep CNNs. However, applying high-order statistics to object detection faces two challenges. First, previous methods insert high-order statistics into deep CNNs as global representations, which lose the spatial information of the inputs and are therefore not applicable to object detection. Second, high-order statistics have special structures that should be taken into account if they are to be used properly. To overcome these challenges, this paper proposes a Multi-scale Structural Kernel Representation (MSKR) for improving the performance of object detection. MSKR is built on polynomial kernel approximation, which not only incorporates high-order statistics but also preserves the spatial information of the input. To account for the geometric structure of high-order representations, a feature power normalization method is applied before the kernel representation is computed. Compared with the first-order statistics most commonly used in existing CNN-based detectors, MSKR generates more discriminative representations and can be flexibly integrated into deep CNNs to improve detection performance. Applied to existing object detection methods (i.e., Faster R-CNN, FPN, Mask R-CNN and RetinaNet), MSKR achieves clear improvements on three widely used benchmarks while remaining very competitive with state-of-the-art methods.
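
The abstract names two ingredients: power normalization of the features, followed by a polynomial kernel approximation that keeps spatial information. The NumPy sketch below is one way such a location-wise representation could look; it is an illustration under stated assumptions, not the authors' implementation. The function names, the signed-power form of the normalization, the exponent `alpha = 0.5`, the random projection matrices, and the choice to approximate the order-r term as an element-wise product of r linear projections (equivalent to 1x1 convolutions) are all assumptions made for this example. The multi-scale part of MSKR is not covered.

```python
import numpy as np

def power_normalize(x, alpha=0.5):
    # Signed element-wise power, sign(x) * |x|**alpha (assumed form).
    return np.sign(x) * np.abs(x) ** alpha

def project(x, w):
    # Channel-wise linear projection at every location (a 1x1 convolution):
    # x has shape (C, H, W), w has shape (D, C), the result has shape (D, H, W).
    return np.einsum("dc,chw->dhw", w, x)

def kernel_representation(x, weights, alpha=0.5):
    # weights[r - 1] is a list of r projection matrices, each of shape (D, C).
    # The order-r term is the element-wise product of its r projections,
    # so every output location depends only on the same input location.
    x = power_normalize(x, alpha)
    terms = []
    for mats in weights:
        z = np.ones((mats[0].shape[0],) + x.shape[1:])
        for w in mats:
            z = z * project(x, w)
        terms.append(z)
    # Concatenate the terms of all orders along the channel axis.
    return np.concatenate(terms, axis=0)

rng = np.random.default_rng(0)
C, H, W, D = 8, 5, 7, 4
x = rng.standard_normal((C, H, W))
# Hypothetical random projections for orders 1, 2 and 3.
weights = [[rng.standard_normal((D, C)) for _ in range(r)] for r in (1, 2, 3)]
rep = kernel_representation(x, weights)
print(rep.shape)  # (12, 5, 7)
```

Because every operation acts independently at each spatial position, the output keeps the H x W layout of the input, which is the property the abstract contrasts with global high-order pooling.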

Updated: 2021-02-01