Pixel-Wise Crowd Understanding via Synthetic Data
International Journal of Computer Vision ( IF 19.5 ) Pub Date : 2020-08-30 , DOI: 10.1007/s11263-020-01365-4
Qi Wang , Junyu Gao , Wei Lin , Yuan Yuan

Crowd analysis via computer vision techniques is an important topic in video surveillance, with wide-ranging applications including crowd monitoring, public safety, and space design. Pixel-wise crowd understanding is the most fundamental task in crowd analysis because it yields finer-grained results for video sequences or still images than other analysis tasks. Unfortunately, pixel-level understanding requires a large amount of labeled training data, and annotation is expensive, so current crowd datasets are small. As a result, most algorithms suffer from over-fitting to varying degrees. In this paper, taking crowd counting and segmentation as examples of pixel-wise crowd understanding, we attempt to remedy these problems from two aspects, namely data and methodology. First, we develop a free data collector and labeler that generates synthetic, labeled crowd scenes in a computer game, Grand Theft Auto V. We then use it to construct a large-scale, diverse synthetic crowd dataset, named the "GCC Dataset". Second, we propose two simple methods that exploit the synthetic data to improve crowd understanding. Specifically: (1) supervised crowd understanding: pre-train a crowd analysis model on the synthetic data, then fine-tune it on real data and labels, which improves the model's performance in the real world; (2) crowd understanding via domain adaptation: translate the synthetic data into photo-realistic images, then train the model on the translated data and labels, so that the trained model works well in real crowd scenes. Extensive experiments verify that the supervised algorithm outperforms the state of the art on four real datasets: UCF_CC_50, UCF-QNRF, and Shanghai Tech Part A/B. These results demonstrate the effectiveness and value of the synthetic GCC Dataset for pixel-wise crowd understanding.
The tools for collecting/labeling data, the proposed synthetic dataset, and the source code for the counting models are available at https://gjy3035.github.io/GCC-CL/.
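Pixel-wise crowd counting models of the kind described above are typically supervised with per-pixel density maps derived from head-point annotations, where each annotated head contributes a normalized Gaussian so that the map sums to the crowd count. The following is a minimal pure-Python sketch of that standard density-map construction (it is an illustration of the general technique, not the authors' released GCC-CL code; the function name and fixed-bandwidth kernel are assumptions for clarity):

```python
import math

def density_map(points, h, w, sigma=2.0):
    """Build an h x w density map from head-point annotations.

    Each (x, y) head point contributes a 2-D Gaussian that is
    normalized over its (truncated, in-bounds) support to sum to 1,
    so the whole map sums to the number of annotated heads.
    """
    dm = [[0.0] * w for _ in range(h)]
    r = int(3 * sigma)  # truncate the kernel at 3 sigma
    for (px, py) in points:
        patch, total = {}, 0.0
        for y in range(max(0, py - r), min(h, py + r + 1)):
            for x in range(max(0, px - r), min(w, px + r + 1)):
                g = math.exp(-((x - px) ** 2 + (y - py) ** 2)
                             / (2 * sigma ** 2))
                patch[(y, x)] = g
                total += g
        # normalize the kernel so this head adds exactly 1 to the map
        for (y, x), g in patch.items():
            dm[y][x] += g / total
    return dm

# Example: three annotated heads in a 64 x 64 scene.
heads = [(10, 12), (30, 8), (50, 50)]
dm = density_map(heads, 64, 64)
count = sum(sum(row) for row in dm)  # ≈ 3.0, the crowd count
```

A counting network is then trained to regress such maps from images; summing the predicted map recovers the estimated count, which is what makes synthetic scenes with exact head annotations (as in GCC) directly usable as training supervision.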
