RSANet: Towards Real-Time Object Detection with Residual Semantic-Guided Attention Feature Pyramid Network
Mobile Networks and Applications (IF 3.8), Pub Date: 2021-01-04, DOI: 10.1007/s11036-020-01723-z
Quan Zhou , Jie Wang , Jia Liu , Shenghua Li , Weihua Ou , Xin Jin

The huge computational overhead of convolutional neural networks limits their inference on mobile devices for object detection, a task that plays a critical role in many real-world scenarios such as face identification, autonomous driving, and video surveillance. To address this problem, this paper introduces a lightweight convolutional neural network called RSANet (Towards Real-Time Object Detection with Residual Semantic-Guided Attention Feature Pyramid Network). RSANet consists of two parts: (a) a Lightweight Convolutional Network (LCNet) as the backbone, and (b) a Residual Semantic-guided Attention Feature Pyramid Network (RSAFPN) as the detection head. In LCNet, unlike recent lightweight networks that rely on pointwise convolutions to change the number of feature maps, we design a Constant Channel Module (CCM) to reduce the Memory Access Cost (MAC) and a Down Sampling Module (DSM) to reduce the computational cost. In RSAFPN, we employ a Residual Semantic-guided Attention Mechanism (RSAM) to fuse the multi-scale features from LCNet, improving detection performance efficiently. Experimental results show that on the PASCAL VOC 2007 dataset, RSANet has a model size of only 3.24 M and requires only 3.54B FLOPs for a 416×416 input image. Compared with YOLO Nano, our method improves accuracy by 6.7% while requiring less computation. On the MS COCO dataset, RSANet has a model size of only 4.35 M and requires only 2.34B FLOPs for a 320×320 input image, improving accuracy by 1.3% over Pelee. These comprehensive results demonstrate that our model achieves a favorable trade-off between speed and accuracy.
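The abstract states that the CCM reduces memory access cost by keeping the channel count constant instead of using pointwise convolutions to change it, but it does not give the cost model. A common argument, from the ShuffleNet V2 analysis, is that for a fixed FLOP budget a 1×1 convolution's MAC is smallest when its input and output channel counts are equal. The sketch below assumes that model. The function names and the 52×52 feature-map size are illustrative choices, not taken from RSANet.

```python
"""Memory access cost (MAC) of a 1x1 (pointwise) convolution.

Assumption: this uses the cost model from the ShuffleNet V2 analysis, not a
formula stated in the RSANet abstract. For an h x w feature map with c_in
input and c_out output channels (no bias):
    FLOPs (multiply-adds) B = h * w * c_in * c_out
    MAC = h * w * (c_in + c_out) + c_in * c_out
i.e. read the input map, write the output map, and read the weights once.
"""
import math


def pointwise_flops(h: int, w: int, c_in: int, c_out: int) -> int:
    """Multiply-add count of a bias-free 1x1 convolution."""
    return h * w * c_in * c_out


def pointwise_mac(h: int, w: int, c_in: int, c_out: int) -> int:
    """Input activations + output activations + weights (in elements)."""
    return h * w * (c_in + c_out) + c_in * c_out


def mac_lower_bound(h: int, w: int, flops: int) -> float:
    """AM-GM bound MAC >= 2*sqrt(hw*B) + B/(hw); tight when c_in == c_out."""
    hw = h * w
    return 2 * math.sqrt(hw * flops) + flops / hw


if __name__ == "__main__":
    h = w = 52  # illustrative: a 416x416 input downsampled by a stride of 8
    # Every pair has c_in * c_out = 4096, so all four layers cost the same FLOPs.
    for c_in, c_out in [(16, 256), (32, 128), (64, 64), (128, 32)]:
        flops = pointwise_flops(h, w, c_in, c_out)
        mac = pointwise_mac(h, w, c_in, c_out)
        bound = mac_lower_bound(h, w, flops)
        print(f"c_in={c_in:3d} c_out={c_out:3d} FLOPs={flops:,} MAC={mac:,} bound={bound:,.0f}")
```

All four configurations have identical FLOPs, yet under this model the 64/64 layer has the lowest MAC and meets the bound exactly. This is the intuition behind a constant-channel design, although RSANet's actual CCM structure is not described in the abstract.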




Updated: 2021-01-04