Bi-directional skip connection feature pyramid network and sub-pixel convolution for high-quality object detection

Neurocomputing (IF 5.5), Pub Date: 2021-01-13, DOI: 10.1016/j.neucom.2021.01.021
Shuqi Xiong, Xiaohong Wu, Honggang Chen, Linbo Qing, Tong Chen, Xiaohai He

In existing state-of-the-art object detectors, feature pyramid networks (FPN) and multiscale feature fusion are still typically used. The traditional FPN fusion strategy is based on the top-down fusion of high-level semantic information. The top-down fusion method generally uses upsampling based on interpolation, which often results in jagged edges, mosaic distortion, and edge blurring. Moreover, in order to improve accuracy, the FPN-based fusion strategy must add multiple top-down components for fusion, which increases computational costs and leads to a poor balance between precision and speed. In this paper, we propose a novel fusion strategy based on a backbone network. We aim to design simple and efficient components for high-quality object detection. Our proposed strategy, bi-directional skip connection FPN (BiSCFPN), consists of three components: a bi-directional skip connection (BiSC), a selective dilated convolution module (SDCM), and sub-pixel convolution (SP). The BiSC aims to enhance semantic information between different feature layers in the backbone network and simultaneously uses the SDCM to improve the receptive fields of differently sized targets in the fusion stage. Finally, SP learns the relationship between the features of upsampling and downsampling images to effectively mitigate the problems caused by the traditional interpolation method. BiSCFPN achieves an average precision of 38.2% in tests with the Microsoft Common Objects in Context (MS COCO) test-dev dataset at a real-time speed of approximately 50 FPS ($608 \times 608$) using an Nvidia GeForce RTX 2080 Ti graphics card and significantly improves the balance between precision and speed.
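The abstract does not spell out how the SP component is built, so the following is only a minimal sketch of the standard channel-to-space rearrangement ("pixel shuffle") that sub-pixel convolution generally relies on, not the authors' implementation. The function name `pixel_shuffle`, the nested-list tensor layout, and the upscale factor `r` are assumptions made for illustration. In a real network, a learned convolution would first produce the `C * r * r` channels, which is what lets the upsampling be learned instead of interpolated.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Hypothetical helper: each group of r*r input channels supplies the
    r x r block of output pixels for one output channel.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for k in range(w):
                        out[c][y * r + i][k * r + j] = src[y][k]
    return out


# Four 1x2 channels become one 2x4 channel for r = 2.
features = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
upsampled = pixel_shuffle(features, 2)
```

Because every output pixel is copied from a distinct input channel, no pixel values are averaged during the rearrangement itself, which is the usual argument for why this approach avoids the blurring associated with interpolation.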

Updated: 2021-03-04
