Graph Regularized Flow Attention Network for Video Animal Counting From Drones
IEEE Transactions on Image Processing (IF 10.8), Pub Date: 2021-05-28, DOI: 10.1109/tip.2021.3082297
Pengfei Zhu , Tao Peng , Dawei Du , Hongtao Yu , Libo Zhang , Qinghua Hu

In this paper, we propose a large-scale video-based animal counting dataset collected by drones (AnimalDrone) for agriculture and wildlife protection. The dataset consists of two subsets, i.e., PartA, captured on site by drones, and PartB, collected from the Internet, with rich annotations of more than 4 million objects in 53,644 frames and corresponding attributes in terms of density, altitude, and view. Moreover, we develop a new graph regularized flow attention network (GFAN) to perform density map estimation on dense crowds in video clips with arbitrary crowd density, perspective, and flight altitude. Specifically, our GFAN method leverages optical flow to warp the multi-scale feature maps across sequential frames to exploit temporal relations, and then combines the enhanced features to predict the density maps. Moreover, we introduce a multi-granularity loss function, including a pixel-wise density loss and a region-wise count loss, to encourage the network to concentrate on discriminative features for objects at different scales. Meanwhile, a graph regularizer is imposed on the density maps of multiple consecutive frames to maintain temporal coherency. Extensive experiments demonstrate the effectiveness of the proposed method compared with several state-of-the-art counting algorithms. The AnimalDrone dataset is available at https://github.com/VisDrone/AnimalDrone.
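The multi-granularity loss and the temporal graph regularizer described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the region size, and the weighting factor `lam` are assumptions for the example; the paper's actual loss may differ in its block partitioning and graph construction.

```python
import numpy as np

def multi_granularity_loss(pred, gt, region=4, lam=1.0):
    """Pixel-wise density loss plus region-wise count loss.

    pred, gt: (H, W) predicted and ground-truth density maps,
    with H and W divisible by `region` (an assumed block size).
    """
    # Pixel-wise density loss: mean squared error over the maps.
    pixel_loss = np.mean((pred - gt) ** 2)
    # Region-wise count loss: L1 difference of object counts
    # summed within each region x region block.
    H, W = gt.shape
    p = pred.reshape(H // region, region, W // region, region).sum(axis=(1, 3))
    g = gt.reshape(H // region, region, W // region, region).sum(axis=(1, 3))
    count_loss = np.mean(np.abs(p - g))
    return pixel_loss + lam * count_loss

def temporal_graph_regularizer(density_maps):
    """Penalize differences between consecutive frames' density maps,
    a simple stand-in for the graph-based temporal coherency term."""
    maps = np.stack(density_maps)    # (T, H, W)
    diffs = maps[1:] - maps[:-1]     # frame-to-frame differences
    return np.mean(diffs ** 2)
```

A perfect prediction yields zero for both terms; in training one would minimize the sum of the multi-granularity loss over frames plus the temporal regularizer over each clip.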

Updated: 2021-05-28