Stacked Pyramid Attention Network for Object Detection
Neural Processing Letters (IF 3.1) Pub Date: 2021-04-07, DOI: 10.1007/s11063-021-10505-x
Shijie Hao , Zhonghao Wang , Fuming Sun

Scale variation is one of the primary challenges in object detection. Recently, various strategies have been introduced to address this challenge and have achieved promising performance. However, these detectors still have limitations. On the one hand, the features of the large-scale deep layers have relatively weak localizing power. On the other hand, the features of the small-scale shallow layers have relatively weak categorizing ability. In fact, these two limitations are complementary, as each kind of feature can compensate for the weakness of the other. We therefore propose the Stacked Pyramid Attention Network (SPANet) to bridge the gap between different scales. In SPANet, two lightweight modules are designed: the top-down feature map attention module (TDFAM) and the bottom-up feature map attention module (BUFAM). By learning channel attention and spatial attention, each module effectively builds connections between features from adjacent scales. By progressively integrating BUFAM and TDFAM into two encoder–decoder structures, two novel feature-aggregating branches are built. In this way, the branches fully combine the localizing power of small-scale features with the categorizing power of large-scale features, enhancing detection accuracy while keeping the network lightweight. Extensive experiments on two challenging benchmarks (the PASCAL VOC and MS COCO datasets) demonstrate the effectiveness of SPANet, showing that our model achieves a competitive trade-off between accuracy and speed.
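To make the adjacent-scale attention idea concrete, below is a minimal PyTorch sketch of what one bottom-up module of this kind might look like. It is an illustration under stated assumptions, not the authors' published implementation: the class names (ChannelAttention, SpatialAttention, BottomUpFAM), the SE-style channel attention, the CBAM-style spatial attention, the addition-based fusion, and the shared 256-channel pyramid width are all our own choices; a top-down counterpart would upsample the deep map instead of downsampling the shallow one.

```python
# Minimal sketch of an adjacent-scale feature map attention module,
# in the spirit of BUFAM/TDFAM. All design details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed design)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = F.adaptive_avg_pool2d(x, 1).flatten(1)   # (N, C) global descriptor
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)   # (N, C, 1, 1) weights
        return x * w

class SpatialAttention(nn.Module):
    """Single-map spatial attention from pooled channel statistics."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)            # (N, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)           # (N, 1, H, W)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class BottomUpFAM(nn.Module):
    """Fuses a shallow (high-resolution) map into a deeper (low-resolution)
    one, then refines the fusion with channel and spatial attention.
    Assumes both pyramid levels share the same channel width."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, shallow, deep):
        # Resize the shallow map to the deep map's resolution before fusing.
        shallow = F.interpolate(shallow, size=deep.shape[-2:],
                                mode="bilinear", align_corners=False)
        fused = shallow + deep
        return self.sa(self.ca(fused))

if __name__ == "__main__":
    fam = BottomUpFAM(channels=256)
    p3 = torch.randn(1, 256, 64, 64)   # shallow level, higher resolution
    p4 = torch.randn(1, 256, 32, 32)   # deep level, lower resolution
    print(fam(p3, p4).shape)           # torch.Size([1, 256, 32, 32])
```

Stacking such modules level by level inside two encoder–decoder branches, as the abstract describes, would let localization cues from shallow features and semantic cues from deep features flow in both directions across the pyramid.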

Updated: 2021-04-08