An approach to improve SSD through mask prediction of multi-scale feature maps
Pattern Analysis and Applications (IF 3.9), Pub Date: 2021-05-15, DOI: 10.1007/s10044-021-00993-x
Peng Sun, Yaqin Zhao, Songhao Zhu

We propose a novel single-shot object detection network with a mask prediction branch. Our motivation is to enhance object detection features with semantic information extracted from deeper layers. The proposed mask prediction branch enriches important features in shallower layers with a pixel-wise probability distribution of semantic information. In addition, an improved receptive field block is adopted to enlarge the receptive field of the backbone network without much extra computational cost. Our network significantly outperforms SSD and FSSD (Feature Fusion Single Shot Multi-box Detector) with only a small drop in speed. We also discuss the relationship between effective receptive fields and theoretical receptive fields on the VGG16 backbone network. Comprehensive experimental results on PASCAL VOC 2007 demonstrate the effectiveness of the proposed method. We achieve an mAP of 79.8 with 300 × 300 input images (81.2 mAP with 512 × 512 inputs) at a speed of 58.4 FPS on a single Nvidia 1080Ti GPU. These results show that the proposed network achieves performance comparable to the state of the art.
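The theoretical receptive field mentioned in the abstract can be illustrated with the standard layer-by-layer recurrence. The sketch below is not taken from the paper: the function name, the layer tuple layout, and the layer list (the usual VGG16 configuration up to conv4_3, the first detection layer in SSD) are assumptions made for illustration. It covers only the theoretical receptive field and says nothing about the effective receptive field that the authors analyze.

```python
from typing import List, Tuple

# Hypothetical layer description: (kernel_size, stride, dilation).
Layer = Tuple[int, int, int]


def theoretical_receptive_field(layers: List[Layer]) -> Tuple[int, int]:
    """Return (receptive_field, cumulative_stride) after the given layers.

    Standard recurrence: each layer widens the receptive field by
    (effective_kernel - 1) * jump, where jump is the product of the
    strides of all preceding layers.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf, jump


CONV3 = (3, 1, 1)  # 3x3 convolution, stride 1, no dilation
POOL2 = (2, 2, 1)  # 2x2 max pooling, stride 2

# Assumed standard VGG16 layout up to conv4_3:
# conv1_1-1_2, pool1, conv2_1-2_2, pool2, conv3_1-3_3, pool3, conv4_1-4_3
VGG16_TO_CONV4_3 = (
    [CONV3] * 2 + [POOL2]
    + [CONV3] * 2 + [POOL2]
    + [CONV3] * 3 + [POOL2]
    + [CONV3] * 3
)

rf, jump = theoretical_receptive_field(VGG16_TO_CONV4_3)
print(f"theoretical receptive field: {rf} px, cumulative stride: {jump}")
```

Under these assumptions, each conv4_3 activation sees a 92 × 92 pixel window of the input with a cumulative stride of 8. The dilation entry is included so that dilated convolutions, such as those commonly used in receptive field blocks, can be described with the same tuple.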




Updated: 2021-05-15