Two-stream small-scale pedestrian detection network with feature aggregation for drone-view videos
Multidimensional Systems and Signal Processing (IF 1.7) Pub Date: 2021-02-08, DOI: 10.1007/s11045-021-00764-1
Han Xie , Hyunchul Shin

Detecting small-scale pedestrians in aerial images is a challenging task that can be difficult even for humans. Single-image methods cannot achieve robust performance because small instances provide only poor visual cues. Since multiple frames may provide more information than a single frame for detecting such difficult cases, we design a novel video-based pedestrian detection method with a two-stream network pipeline that fully utilizes the temporal and contextual information of a video. An aggregated feature map is proposed to absorb spatial and temporal information with the help of spatial and temporal sub-networks. To better capture motion information, a more refined flow network (SPyNet) is adopted instead of a simple FlowNet. In the spatial-stream sub-network, we modify the backbone structure by increasing the feature map resolution with a relatively larger receptive field to make it suitable for small-scale detection. Experimental results on drone video datasets demonstrate that our approach improves detection accuracy on small-scale instances and reduces false positive detections. By exploiting temporal information and aggregating the feature maps, our two-stream method improves detection performance by 8.48% in mean Average Precision (mAP) over the basic single-stream R-FCN method, and it outperforms the state-of-the-art method by 3.09% on the Okutama Human-Action dataset.
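The sketch below (PyTorch) illustrates the general two-stream feature-aggregation idea described in the abstract: a spatial stream extracts a higher-resolution feature map per frame, a flow network estimates motion between consecutive frames, and the previous frame's features are warped and fused with the current frame's features. All layer sizes, the toy flow estimator standing in for SPyNet, and the 1x1-conv fusion rule are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of two-stream feature aggregation for video detection.
# The flow estimator is a tiny stand-in for SPyNet; the detection head is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialStream(nn.Module):
    """Per-frame backbone; dilation keeps a relatively high-resolution feature
    map while enlarging the receptive field, as suggested for small objects."""
    def __init__(self, out_ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_ch, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class ToyFlowNet(nn.Module):
    """Stand-in for SPyNet: predicts a coarse flow field from a frame pair."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, cur, prev):
        return self.body(torch.cat([cur, prev], dim=1))


def warp(feat, flow):
    """Bilinearly warp previous-frame features to the current frame using the
    estimated flow (given in feature-map pixel units)."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid_x = (xs + flow[:, 0]) / max(w - 1, 1) * 2 - 1
    grid_y = (ys + flow[:, 1]) / max(h - 1, 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1)
    return F.grid_sample(feat, grid, align_corners=True)


class TwoStreamAggregation(nn.Module):
    """Fuses current-frame spatial features with temporally warped features
    from the previous frame to form the aggregated feature map."""
    def __init__(self, ch=64):
        super().__init__()
        self.spatial = SpatialStream(ch)
        self.flow = ToyFlowNet()
        self.fuse = nn.Conv2d(2 * ch, ch, 1)  # simple learned fusion (assumption)

    def forward(self, cur_frame, prev_frame):
        f_cur = self.spatial(cur_frame)
        f_prev = self.spatial(prev_frame)
        flow = self.flow(cur_frame, prev_frame)
        flow = F.interpolate(flow, size=f_cur.shape[-2:], mode="bilinear",
                             align_corners=True)
        f_warp = warp(f_prev, flow)
        return self.fuse(torch.cat([f_cur, f_warp], dim=1))  # aggregated map


if __name__ == "__main__":
    model = TwoStreamAggregation()
    cur = torch.randn(1, 3, 128, 128)
    prev = torch.randn(1, 3, 128, 128)
    print(model(cur, prev).shape)  # torch.Size([1, 64, 64, 64])
```

In a full pipeline, the aggregated map would feed a region-based detection head (the paper builds on R-FCN); here only the feature-level fusion is sketched.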




Updated: 2021-02-09