Multiscale Multitask Deep NetVLAD for Crowd Counting, IEEE Transactions on Industrial Informatics


Multiscale Multitask Deep NetVLAD for Crowd Counting
IEEE Transactions on Industrial Informatics (IF 12.3), Pub Date: 2018-11-01, DOI: 10.1109/tii.2018.2852481
Zenglin Shi, Le Zhang, Yibo Sun, Yangdong Ye

Deep convolutional neural networks (CNNs) have become the de facto method for computer vision tasks, owing to their success in visual recognition on still images. However, their adaptations to crowd counting have not clearly established superiority over shallow models. Existing CNNs for crowd counting turn out to be self-limiting in challenging scenarios such as changes in camera illumination, partial occlusion, diverse crowd distributions, and perspective distortion, because of their shallow structure. In this paper, we introduce a dynamic augmentation technique to train a much deeper CNN for crowd counting. To reduce the overfitting caused by the limited number of training samples, multitask learning is further employed to learn generalizable representations across similar domains. We also propose to aggregate multiscale convolutional features extracted from the entire image into a single compact vector representation, amenable to efficient and accurate counting, by way of the “Vector of Locally Aggregated Descriptors” (VLAD). A “deeply supervised” strategy is employed to provide an additional supervision signal to the bottom layers for further performance improvement. Experimental results on three benchmark crowd datasets show that our method achieves better performance than existing methods. Our implementation will be released at https://github.com/shizenglin/Multitask-Multiscale-Deep-NetVLAD.
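The aggregation step named in the abstract, pooling local convolutional descriptors from several scales into one fixed-length vector, can be illustrated with a minimal NumPy sketch of NetVLAD-style soft-assignment pooling. This is not the authors' implementation: the function names `to_descriptors` and `netvlad_aggregate`, the `alpha` value, the random cluster centers, and the assumption that both feature maps share the same channel count are all hypothetical choices made for illustration.

```python
import numpy as np


def to_descriptors(fmap):
    """Turn a (C, H, W) feature map into (H*W, C) local descriptors."""
    channels = fmap.shape[0]
    return fmap.reshape(channels, -1).T


def netvlad_aggregate(features, centers, alpha=1.0):
    """Pool N local D-dim descriptors into a single K*D vector.

    features: (N, D) local descriptors.
    centers:  (K, D) cluster centers (learned in a real NetVLAD layer;
              fixed here, which is an assumption of this sketch).
    alpha:    softness of the assignment (hypothetical value).
    """
    # Residual of every descriptor to every center: (N, K, D)
    residuals = features[:, None, :] - centers[None, :, :]
    # Squared distances: (N, K)
    d2 = (residuals ** 2).sum(axis=2)
    # Soft assignment: softmax over centers of -alpha * squared distance
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    # Assignment-weighted sum of residuals per center: (K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)
    # Per-center (intra) normalization, then global L2 normalization
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)


# Two feature maps at different spatial scales, same channel count (assumed).
rng = np.random.default_rng(0)
fmap_a = rng.normal(size=(8, 6, 6))
fmap_b = rng.normal(size=(8, 3, 3))
descriptors = np.concatenate([to_descriptors(fmap_a), to_descriptors(fmap_b)])
centers = rng.normal(size=(4, 8))

vec = netvlad_aggregate(descriptors, centers)
print(vec.shape)  # (32,), i.e. K * D regardless of the number of locations
```

The output length depends only on the number of centers and the descriptor dimension, not on the image size, which is what makes such a representation convenient as input to a count regressor.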


Updated: 2018-11-01
