Pixel-Wise Crowd Understanding via Synthetic Data

Wang, Qi; Gao, Junyu; Lin, Wei; Yuan, Yuan

doi:10.1007/s11263-020-01365-4

Published: 30 August 2020

Volume 129, pages 225–245, (2021)
International Journal of Computer Vision

Qi Wang (ORCID: orcid.org/0000-0002-7028-4956)¹, Junyu Gao¹, Wei Lin¹ & Yuan Yuan¹

2061 Accesses
95 Citations

Abstract

Crowd analysis via computer vision is an important topic in video surveillance, with widespread applications including crowd monitoring, public safety, and space design. Pixel-wise crowd understanding is the most fundamental task in crowd analysis because it yields finer-grained results for video sequences or still images than other analysis tasks. Unfortunately, pixel-level understanding needs a large amount of labeled training data. Annotating such data is expensive, so current crowd datasets are small, and as a result most algorithms suffer from over-fitting to varying degrees. In this paper, taking crowd counting and segmentation as examples of pixel-wise crowd understanding, we attempt to remedy these problems from two aspects, namely data and methodology. Firstly, we develop a free data collector and labeler to generate synthetic, labeled crowd scenes in a computer game, Grand Theft Auto V. We then use it to construct a large-scale, diverse synthetic crowd dataset, named the “GCC Dataset”. Secondly, we propose two simple methods that improve the performance of crowd understanding by exploiting the synthetic data: (1) supervised crowd understanding: pre-train a crowd analysis model on the synthetic data, then fine-tune it using the real data and labels, which makes the model perform better in the real world; (2) crowd understanding via domain adaptation: translate the synthetic data to photo-realistic images, then train the model on the translated data and labels, so that the trained model works well in real crowd scenes. Extensive experiments verify that the supervised algorithm outperforms the state of the art on four real datasets: UCF_CC_50, UCF-QNRF, and Shanghai Tech Part A and Part B. These results show the effectiveness and value of the synthetic GCC dataset for pixel-wise crowd understanding.
The tools for collecting and labeling data, the proposed synthetic dataset, and the source code for the counting models are available at https://gjy3035.github.io/GCC-CL/.
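
The pre-train-then-fine-tune strategy described in the abstract can be illustrated with a deliberately small sketch. This is not the authors' implementation: a one-feature linear regressor stands in for a crowd counting network, and the "synthetic" and "real" samples are made-up toy data whose mappings differ slightly (all names, slopes, and hyperparameters here are assumptions chosen for illustration). It only shows the two-stage schedule: fit on plentiful synthetic labels first, then continue training the same weights on a handful of real labels.

```python
import random


def make_samples(rng, n, slope, intercept):
    """Toy data: x is a scalar image feature, y is a crowd count (hypothetical)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, slope * x + intercept) for x in xs]


def sgd_fit(weights, samples, lr, epochs):
    """Train y ~ w * x + b with plain SGD, starting from the given weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def mean_abs_error(weights, samples):
    """Mean absolute counting error of the model on the given samples."""
    w, b = weights
    return sum(abs(w * x + b - y) for x, y in samples) / len(samples)


rng = random.Random(0)
# Synthetic domain: many labeled samples, slightly different from the real domain.
synthetic = make_samples(rng, 500, slope=2.0, intercept=1.0)
# Real domain: only a few labeled training samples, plus a held-out test set.
real_train = make_samples(rng, 5, slope=2.2, intercept=0.8)
real_test = make_samples(rng, 200, slope=2.2, intercept=0.8)

init = (0.0, 0.0)
# Stage 1: pre-train on synthetic data and labels.
pretrained = sgd_fit(init, synthetic, lr=0.05, epochs=20)
# Stage 2: fine-tune the pre-trained weights on the small real set.
finetuned = sgd_fit(pretrained, real_train, lr=0.05, epochs=5)
# Baseline: the same small training budget on real data, from scratch.
scratch = sgd_fit(init, real_train, lr=0.05, epochs=5)

mae_finetuned = mean_abs_error(finetuned, real_test)
mae_scratch = mean_abs_error(scratch, real_test)
print(f"fine-tuned MAE: {mae_finetuned:.3f}  from-scratch MAE: {mae_scratch:.3f}")
```

In this toy setting the fine-tuned model starts close to the real-domain solution, so the few real samples only need to correct a small domain gap, whereas the from-scratch baseline must learn the whole mapping from the same few samples.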

Notes

https://www.flickr.com/.
https://unity3d.com/.
https://www.unrealengine.com/.
https://www.rockstargames.com/.
https://support.rockstargames.com/articles/115009494848/PC-Single-Player-Mods.
https://support.rockstargames.com/articles/200153756/Policy-on-posting-copyrighted-Rockstar-Games-material.
http://www.dev-c.com/gtav/scripthookv/.
https://wiki.gtanet.work/index.php?title=Peds.
This trick effectively improves the counting performance (Gao et al. 2019).
https://github.com/gjy3035/C-3-Framework/tree/python3.x/results_reports.
http://share.crowdbenchmark.com:2443/home/Translation_Results.

References

Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B., & Vijayanarasimhan, S. (2016). Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675.
Sam, D. B., Sajjan, N. N., Babu, R. V., & Srinivasan, M. (2018). Divide and grow: Capturing huge diversity in crowd images with incrementally growing CNN. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3618–3626).
Bak, S., Carr, P., & Lalonde, J. F. (2018). Domain adaptation through synthesis for unsupervised person re-identification. arXiv preprint arXiv:1804.10094.
Cao, X., Wang, Z., Zhao, Y., & Su, F. (2018). Scale aggregation network for accurate and efficient crowd counting. In Proceedings of the European conference on computer vision (pp. 734–750).
Chan, A. B., & Vasconcelos, N. (2009). Bayesian poisson regression for crowd counting. In 2009 IEEE 12th international conference on computer vision (pp. 545–551). IEEE.
Chan, A. B., Liang, Z. S. J., & Vasconcelos, N. (2008). Privacy preserving crowd monitoring: Counting people without people models or tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–7).
Chan, A. B., Morrow, M., Vasconcelos, N., et al. (2009). Analysis of crowded scenes using holistic properties. In Performance evaluation of tracking and surveillance workshop at CVPR (pp. 101–108).
Chen, K., Loy, C. C., Gong, S., & Xiang, T. (2012). Feature mining for localised crowd counting. In Proceedings of the British machine vision conference (vol. 1, p. 3).
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3213–3223).
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 248–255).
Dong, L., Parameswaran, V., Ramesh, V., & Zoghlami, I. (2007). Fast crowd segmentation using shape indexing. In 2007 IEEE 11th international conference on computer vision (pp. 1–8). IEEE.
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., & Koltun, V. (2017). CARLA: An open urban driving simulator. In Proceedings of the 1st annual conference on robot learning (pp. 1–16).
Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International journal of computer vision, 111(1), 98–136.
Fu, M., Xu, P., Li, X., Liu, Q., Ye, M., & Zhu, C. (2015). Fast crowd density estimation with convolutional neural networks. Engineering Applications of Artificial Intelligence, 43, 81–88.
Gao, J., Lin, W., Zhao, B., Wang, D., Gao, C., & Wen, J. (2019). C\(^3\) framework: An open-source pytorch code for crowd counting. arXiv preprint arXiv:1907.02724.
Gao, J., Wang, Q., & Li, X. (2019). Pcc net: Perspective crowd counting via spatial convolutional network. IEEE Transactions on Circuits and Systems for Video Technology. https://doi.org/10.1109/TCSVT.2019.2919139.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Proceedings of the advances in neural information processing systems (pp. 2672–2680).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700–4708).
Idrees, H., Saleemi, I., Seibert, C., & Shah, M. (2013). Multi-source multi-scale counting in extremely dense crowd images. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2547–2554).
Idrees, H., Tayyab, M., Athrey, K., Zhang, D., Al-Maadeed, S., Rajpoot, N., & Shah, M. (2018). Composition loss for counting, density map estimation and localization in dense crowds. arXiv preprint arXiv:1808.01050.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on multimedia (pp. 675–678). ACM.
Jiang, X., Xiao, Z., Zhang, B., Zhen, X., Cao, X., Doermann, D., & Shao, L. (2019). Crowd counting and density estimation by trellis encoder-decoder networks. In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 6133–6142).
Johnson-Roberson, M., Barto, C., Mehta, R., Sridhar, S. N., Rosaen, K., & Vasudevan, R. (2017). Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks? In Proceedings of the IEEE international conference on robotics and automation (pp. 1–8).
Kang, K., & Wang, X. (2014). Fully convolutional neural networks for crowd segmentation. arXiv preprint arXiv:1411.4464.
Kempka, M., Wydmuch, M., Runc, G., Toczek, J., & Jaśkowski, W. (2016). Vizdoom: A doom-based ai research platform for visual reinforcement learning. In 2016 IEEE conference on computational intelligence and games (CIG) (pp. 1–8).
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Li, C., Lin, L., Zuo, W., Tang, J., & Yang, M. H. (2018a). Visual tracking via dynamic graph learning. IEEE transactions on pattern analysis and machine intelligence, 41(11), 2770–2782.
Li, T., Chang, H., Wang, M., Ni, B., Hong, R., & Yan, S. (2014). Crowded scene analysis: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 25(3), 367–386.
Li, W., Mahadevan, V., & Vasconcelos, N. (2013). Anomaly detection and localization in crowded scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1), 18–32.
Li, X., Chen, M., Nie, F., & Wang, Q. (2017). A multiview-based parameter free framework for group detection. In 31st AAAI conference on artificial intelligence.
Li, Y., Zhang, X., & Chen, D. (2018b). Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1091–1100).
Lian, D., Li, J., Zheng, J., Luo, W., & Gao, S. (2019). Density map regression guided detection network for rgb-d crowd counting and localization. In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1821–1830).
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740–755). Springer.
Liu, J., Gao, C., Meng, D., & Hauptmann, A. G. (2018a). Decidenet: Counting varying density crowds through attention guided detection and density estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5197–5206).
Liu, L., Wang, H., Li, G., Ouyang, W., & Lin, L. (2018b). Crowd counting using deep recurrent spatial-aware network. arXiv preprint arXiv:1807.00601.
Liu, W., Salzmann, M., & Fua, P. (2019). Context-aware crowd counting. In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5099–5108).
Liu, X., van de Weijer, J., & Bagdanov, A. D. (2018c). Leveraging unlabeled data for crowd counting by learning to rank. arXiv preprint arXiv:1803.03095.
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431–3440).
Mahadevan, V., Li, W., Bhalodia, V., & Vasconcelos, N. (2010). Anomaly detection in crowded scenes. In IEEE computer society conference on computer vision and pattern recognition (pp. 1975–1981). IEEE.
Marsden, M., McGuinness, K., Little, S., & O’Connor, N. E. (2016). Fully convolutional crowd counting on highly congested scenes. arXiv preprint arXiv:1612.00220.
Mehran, R., Oyama, A., & Shah, M. (2009). Abnormal crowd behavior detection using social force model. In 2009 IEEE conference on computer vision and pattern recognition (pp. 935–942). IEEE.
Onororubio, D., & Lopezsastre, R. J. (2016). Towards perspective-free object counting with deep learning (pp. 615–629).
Pan, X., Shi, J., Luo, P., Wang, X., & Tang, X. (2017). Spatial as deep: Spatial cnn for traffic scene understanding. arXiv preprint arXiv:1712.06080.
Paszke, A., Gross, S., Chintala, S., & Chanan, G. (2017). Pytorch: Tensors and dynamic neural networks in python with strong gpu acceleration.
Popoola, O. P., & Wang, K. (2012). Video-based abnormal human behavior recognition-a review. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6), 865–878.
Qiu, W., Zhong, F., Zhang, Y., Qiao, S., Xiao, Z., Kim, T. S., et al. (2017). Unrealcv: Virtual worlds for computer vision. ACM Multimedia Open Source Software Competition.
Ranjan, V., Le, H., & Hoai, M. (2018). Iterative crowd counting. arXiv preprint arXiv:1807.09959.
Richter, S. R., Vineet, V., Roth, S., & Koltun, V. (2016). Playing for data: Ground truth from computer games. In Proceedings of the European conference on computer vision (pp. 102–118).
Richter, S. R., Hayder, Z., & Koltun, V. (2017). Playing for benchmarks. In Proceedings of the international conference on computer vision (Vol. 2).
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., & Lopez, A. M. (2016). The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3234–3243).
Sam, D. B., Surya, S., & Babu, R. V. (2017). Switching convolutional neural network for crowd counting. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 1, p. 6).
Sam, D. B., Sajjan, N. N., & Babu, R. V. (2018). Divide and grow: Capturing huge diversity in crowd images with incrementally growing cnn. arXiv preprint arXiv:1807.09993.
Sam, D. B., Sajjan, N. N., Maurya, H., & Babu, R. V. (2019). Almost unsupervised learning for dense crowd counting. In Proceedings of the 33rd AAAI conference on artificial intelligence, Honolulu, HI, USA (Vol. 27).
Shah, S., Dey, D., Lovett, C., & Kapoor, A. (2018). Airsim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and service robotics (pp. 621–635). Springer.
Shen, Z., Xu, Y., Ni, B., Wang, M., Hu, J., & Yang, X. (2018). Crowd counting via adversarial cross-scale consistency pursuit. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5245–5254).
Shi, Z., Zhang, L., Liu, Y., Cao, X., Ye, Y., Cheng, M. M., & Zheng, G. (2018). Crowd counting with deep negative correlation learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5382–5390).
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Sindagi, V. A., & Patel, V. M. (2017a). Cnn-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In Proceedings of the IEEE international conference on advanced video and signal based surveillance (pp. 1–6).
Sindagi, V. A., & Patel, V. M. (2017b). Generating high-quality crowd density maps using contextual pyramid cnns. In Proceedings of the IEEE international conference on computer vision (pp. 1879–1888).
Sindagi, V. A., Yasarla, R., & Patel, V. M. (2019). Pushing the frontiers of unconstrained crowd counting: New dataset and benchmark method. In Proceedings of the IEEE international conference on computer vision (pp. 1221–1231).
Sindagi, V. A., Yasarla, R., & Patel, V. M. (2020). Jhu-crowd++: Large-scale crowd counting dataset and a benchmark method. Technical Report.
Walach, E., & Wolf, L. (2016). Learning to count with cnn boosting (pp. 660–676).
Wan, J., Luo, W., Wu, B., Chan, A. B., & Liu, W. (2019). Residual regression with semantic prior for crowd counting. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4036–4045).
Wang, C., Zhang, H., Yang, L., Liu, S., & Cao, X. (2015). Deep people counting in extremely dense crowds. In Proceedings of the 23rd ACM international conference on multimedia (pp. 1299–1302). ACM.
Wang, Q., Chen, M., Nie, F., & Li, X. (2018a). Detecting coherent groups in crowd scenes by multiview clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2018.2875002.
Wang, Q., Wan, J., & Yuan, Y. (2018b). Deep metric learning for crowdedness regression. IEEE Transactions on Circuits and Systems for Video Technology, 28(10), 2633–2643.
Wang, Q., Gao, J., Lin, W., & Yuan, Y. (2019). Learning from synthetic data for crowd counting in the wild. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR) (pp. 8198–8207).
Wang, Q., Gao, J., Lin, W., & Li, X. (2020). Nwpu-crowd: A large-scale benchmark for crowd counting. arXiv preprint arXiv:2001.03360.
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
Xiong, F., Shi, X., & Yeung, D. Y. (2017). Spatiotemporal modeling for crowd counting in videos. arXiv preprint arXiv:1707.07890.
Yan, Z., Yuan, Y., Zuo, W., Tan, X., Wang, Y., Wen, S., & Ding, E. (2019). Perspective-guided convolution networks for crowd counting. In Proceedings of the IEEE international conference on computer vision (pp. 952–961).
Yuan, Y., Fang, J., & Wang, Q. (2014). Online anomaly detection in crowd scenes via structure analysis. IEEE Transactions on Cybernetics, 45(3), 548–561.
Zhang, C., Kang, K., Li, H., Wang, X., Xie, R., & Yang, X. (2016a). Data-driven crowd understanding: A baseline for a large-scale crowd dataset. IEEE Transactions on Multimedia, 18(6), 1048–1061.
Zhang, Y., Zhou, D., Chen, S., Gao, S., & Ma, Y. (2016b). Single-image crowd counting via multi-column convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 589–597).
Zhao, M., Zhang, J., Zhang, C., & Zhang, W. (2019). Leveraging heterogeneous auxiliary tasks to assist crowd counting. In The IEEE conference on computer vision and pattern recognition (CVPR) (pp. 12736–12745).
Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593.
Zuo, W., Wu, X., Lin, L., Zhang, L., & Yang, M. H. (2018). Learning support correlation filters for visual tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(5), 1158–1172.

Author information

Authors and Affiliations

School of Computer Science and Center for Optical Imagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi’an, 710072, Shaanxi, China
Qi Wang, Junyu Gao, Wei Lin & Yuan Yuan

Corresponding author

Correspondence to Yuan Yuan.

Additional information

Communicated by Jifeng Dai.

This work was supported by the National Key R&D Program of China under Grant 2017YFB1002202, National Natural Science Foundation of China under Grant U1864204, 61773316, 61632018, and 61825603.

Electronic supplementary material

Supplementary material 1 (pdf 1828 KB)

About this article

Cite this article

Wang, Q., Gao, J., Lin, W. et al. Pixel-Wise Crowd Understanding via Synthetic Data. Int J Comput Vis 129, 225–245 (2021). https://doi.org/10.1007/s11263-020-01365-4

Received: 17 January 2020
Accepted: 30 July 2020
Published: 30 August 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s11263-020-01365-4
