Novel Up-scale Feature Aggregation for Object Detection in Aerial Images
Neurocomputing (IF 5.5), Pub Date: 2020-10-01, DOI: 10.1016/j.neucom.2020.06.011
Hu Lin , Jingkai Zhou , Yanfen Gan , Chi-Man Vong , Qiong Liu

Abstract
Object detection is a pivotal task for many unmanned aerial vehicle (UAV) applications. Compared to general scenes, the objects in aerial images are typically much smaller. For this reason, most general object detectors face two critical challenges when dealing with aerial images: 1) The widely used Feature Pyramid Network (FPN) works by progressively integrating high-level features into lower levels. However, this scheme does not transfer equivalent information from each level of the backbone network to the generated features, so the shared detection head receives information from unbalanced sources, which degrades detection accuracy. 2) Up-sampling is commonly used to expand feature resolution for feature fusion or feature aggregation. However, existing up-sampling methods are ineffective at reconstructing high-resolution feature maps. To address these two challenges, two contributions are proposed: 1) an up-scale feature aggregation framework that fully utilizes complementary multi-scale information, and 2) a novel up-sampling method that further improves detection accuracy. These two proposals are integrated into an end-to-end single-stage object detector named HawkNet. Extensive experiments are conducted on the VisDrone-DET2018, UAVDT and DIOR datasets. Compared to the RetinaNet baseline, our HawkNet achieves absolute gains of 6.0%, 1.2% and 5.9% in average precision (AP) on VisDrone-DET2018, UAVDT and DIOR, respectively. For an 800 × 1333 input on the UAVDT dataset, HawkNet with a ResNet-50 backbone surpasses existing methods for single-scale inference and achieves the best performance (37.4 AP), while running at 10.6 frames per second on a single NVIDIA GTX 1080Ti GPU.
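To make the first two challenges concrete, the sketch below implements the standard FPN top-down pathway that the abstract critiques (the kind of pyramid used by detectors such as RetinaNet), not HawkNet's up-scale feature aggregation or its new up-sampling method, neither of which the abstract specifies. It assumes plain NumPy, three backbone levels (C3 to C5, each half the resolution of the previous), 1x1 lateral projections written as matrix products, nearest-neighbour 2x up-sampling, and no 3x3 smoothing convolutions; all function names are hypothetical.

```python
import numpy as np

def upsample_nearest_2x(x):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map (copies each pixel into a 2x2 block)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, weight):
    """1x1 convolution as a channel-mixing matmul; weight has shape (C_out, C_in)."""
    c, h, w = x.shape
    return (weight @ x.reshape(c, h * w)).reshape(weight.shape[0], h, w)

def fpn_top_down(features, lateral_weights):
    """Simplified FPN top-down pathway (hypothetical helper, smoothing convs omitted).

    features: backbone maps ordered fine to coarse, each (C_i, H_i, W_i) with H_{i+1} = H_i / 2.
    lateral_weights: (D, C_i) matrices projecting every level to D channels.
    Returns pyramid outputs ordered fine to coarse.
    """
    laterals = [conv1x1(f, w) for f, w in zip(features, lateral_weights)]
    outputs = [laterals[-1]]  # the coarsest level sees only its own backbone stage
    for lat in reversed(laterals[:-1]):
        # Each finer level adds the up-sampled coarser output, so the finest
        # output accumulates every stage while the coarsest sees just one:
        # the unbalanced information flow described in challenge 1.
        outputs.insert(0, lat + upsample_nearest_2x(outputs[0]))
    return outputs

rng = np.random.default_rng(0)
channels, sizes = [512, 1024, 2048], [32, 16, 8]
feats = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
weights = [rng.standard_normal((256, c)) * 0.01 for c in channels]
pyramid = fpn_top_down(feats, weights)
print([p.shape for p in pyramid])  # [(256, 32, 32), (256, 16, 16), (256, 8, 8)]
```

Nearest-neighbour up-sampling only replicates coarse values and adds no new spatial detail, which illustrates why challenge 2 treats conventional up-sampling as a weak way to reconstruct high-resolution feature maps.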
