
Depth and edge auxiliary learning for still image crowd density estimation

  • Short paper
  • Published in: Pattern Analysis and Applications

Abstract

Crowd counting plays a significant role in crowd monitoring and management, yet it suffers from various challenges, especially crowd-scale variations and background interference. We therefore propose a method, depth and edge auxiliary learning (DEAL), for still image crowd density estimation that copes with crowd-scale variations and background interference simultaneously. The proposed multi-task framework contains three sub-tasks: crowd head edge regression, crowd density map regression and relative depth map regression. The crowd head edge regression task outputs distinctive crowd head edge features to distinguish the crowd from complex backgrounds. The relative depth map regression task perceives crowd-scale variations and outputs multi-scale crowd features. Moreover, we design an efficient fusion strategy that fuses this information so that the crowd density map regression generates high-quality density maps. Experiments on four mainstream datasets verify the effectiveness and portability of our method. The results indicate that our method achieves competitive performance compared with other state-of-the-art approaches. In addition, it improves the counting accuracy of the baseline network by \(15.6\%\).
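Density-map regression of the kind described above is conventionally trained on maps built by placing a normalized Gaussian kernel at each annotated head position, so that the map's integral equals the crowd count (MCNN [6] additionally adapts the kernel width to local geometry). A minimal NumPy sketch of this standard construction, with hypothetical annotations and a fixed kernel width, not the authors' exact pipeline:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """A size x size Gaussian stamp normalized to integrate to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def density_map(points, shape, size=15, sigma=4.0):
    """Stamp one unit-mass Gaussian per annotated head (interior points only),
    so the map sums to the head count."""
    dmap = np.zeros(shape)
    k, r = gaussian_kernel(size, sigma), size // 2
    for x, y in points:                       # (col, row) head annotations
        dmap[y - r:y + r + 1, x - r:x + r + 1] += k
    return dmap

heads = [(30, 40), (100, 60), (200, 120)]     # hypothetical annotations
dm = density_map(heads, (240, 320))
print(round(dm.sum()))                        # 3: integral equals head count
```

Because each stamp is normalized, summing the predicted map at test time directly yields the estimated count.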



References

  1. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

  2. Liu J, Gao C, Meng D, et al. (2018) Decidenet: counting varying density crowds through attention guided detection and density estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 5197–5206

  3. Idrees H, Tayyab M, Athrey K, et al. (2018) Composition loss for counting, density map estimation and localization in dense crowds. In: European Conference on Computer Vision, pp 532–546

  4. Zhang A, Shen J, Xiao Z, et al. (2019) Relational attention network for crowd counting. In: IEEE International Conference on Computer Vision, pp 6788–6797

  5. Liu W, Salzmann M, Fua P. (2019) Context-aware crowd counting. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 5099–5108

  6. Zhang Y, Zhou D, Chen S, et al. (2016) Single-image crowd counting via multi-column convolutional neural network. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 589–597

  7. Li Y, Zhang X, Chen D. (2018) Csrnet: dilated convolutional neural networks for understanding the highly congested scenes. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 1091–1100

  8. Wang L, Yin B, Tang X, et al. (2019) Removing background interference for crowd counting via de-background detail convolutional network. Neurocomputing 332:360–371

  9. Idrees H, Saleemi I, Seibert C, et al. (2013) Multi-source multi-scale counting in extremely dense crowd images. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 2547–2554

  10. Zhao M, Zhang J, Zhang C, et al. (2019) Leveraging heterogeneous auxiliary tasks to assist crowd counting. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 12736–12745

  11. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698


  12. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations

  13. Shi M, Yang Z, Xu C, et al. (2019) Revisiting perspective information for efficient crowd counting. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 1–10

  14. Dalal N, Triggs B. (2005) Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 886–893

  15. Leibe B, Seemann E, Schiele B (2005) Pedestrian detection in crowded scenes. IEEE Conf Comput Vision Pattern Recognit 1:878–885


  16. Tuzel O, Porikli F, Meer P (2008) Pedestrian detection via classification on riemannian manifolds. IEEE Trans Pattern Anal Mach Intell 30(10):1713–1727


  17. Viola P, Jones M (2004) Robust real-time face detection. Int J Comput Vision 57(2):137–154


  18. Wu B, Nevatia R (2005) Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors. In: International Conference on Computer Vision, pp 90–97

  19. Sabzmeydani P, Mori G (2007) Detecting pedestrians by learning shapelet features. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 1–8

  20. Davies AC, Yin JH, Velastin SA (1995) Crowd monitoring using image processing. Electron Commun Eng J 7(1):37–47

  21. Lempitsky V, Zisserman A (2010) Learning to count objects in images. In: Neural Information Processing Systems, pp 1324–1332

  22. Rabaud V, Belongie S. (2006) Counting crowded moving objects. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 705–711

  23. Brostow G J, Cipolla R. (2006) Unsupervised Bayesian detection of independent motion in crowds. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 594–601

  24. Zhang L, Shi M, Chen Q, et al. (2018) Crowd counting via scale-adaptive convolutional neural network. In: IEEE Winter Conference on Applications of Computer Vision, pp 1113–1121

  25. Cao X, Wang Z, Zhao Y, et al. (2018) Scale aggregation network for accurate and efficient crowd counting. In: European Conference on Computer Vision, pp 757–773

  26. Zhang Q, Chan A B. (2019) Wide-area crowd counting via ground-plane density maps and multi view fusion CNNs. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 8297–8306

  27. Xu C, Qiu K, Fu J, et al. (2019) Learn to scale: generating multipolar normalized density maps for crowd counting. In: International Conference on Computer Vision, pp 8382–8390

  28. Liu N, Long Y, Zou C, et al. (2019) ADCrowdNet: an attention-injective deformable convolutional network for crowd understanding. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 3225–3234

  29. Gao J, Wang Q, Yuan Y (2019) SCAR: spatial-/channel-wise attention regression networks for crowd counting. Neurocomputing 363:1–8

  30. Jiang X, Zhang L, Zhang T, et al. (2020) Density-aware multi-task learning for crowd counting. IEEE Transactions on Multimedia

  31. Sandwell DT (1987) Biharmonic spline interpolation of Geos-3 and Seasat altimeter data. Geophys Res Lett 14(2):139–142

  32. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

  33. Sindagi V A, Patel V M. (2017) CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In: Advanced Video and Signal Based Surveillance, pp 1–6

  34. Shi Z, Zhang L, Liu Y, et al. (2018) Crowd counting with deep negative correlation learning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 5382–5390

  35. Sam D B, Sajjan N N, Babu R V, et al. (2018) Divide and grow: capturing huge diversity in crowd images with incrementally growing CNN. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 3618–3626

  36. Wang Q, Gao J, Lin W, et al. (2019) Learning from synthetic data for crowd counting in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 8198–8207

  37. Shi Z, Mettes P, Snoek C G, et al. (2019) Counting with focus for free. In: International Conference on Computer Vision, pp 4200–4209

  38. Oh M, Olsen P A, Ramamurthy K N, et al. (2020) Crowd counting with decomposed uncertainty. In: National Conference on Artificial Intelligence

  39. Sam DB, Peri SV, Sundararaman MN, et al. (2020) Locate, size and count: accurately resolving people in dense crowds via detection. IEEE Transactions on Pattern Analysis and Machine Intelligence

  40. Yan Z, Yuan Y, Zuo W, et al. (2019) Perspective-guided convolution networks for crowd counting. In: International Conference on Computer Vision, pp 952–961

  41. Chen X, Bin Y, Sang N, et al. (2019) Scale pyramid network for crowd counting. In: IEEE Winter Conference on Applications of Computer Vision, pp 1941–1950

  42. Sindagi VA, Yasarla R, Babu DS, et al. (2020) Learning to count in the crowd from limited labeled data. In: European Conference on Computer Vision

  43. Liu Y, Liu L, Wang P, et al. (2020) Semi-supervised crowd counting via self-training on surrogate tasks. In: European Conference on Computer Vision

  44. Sam D B, Surya S, Babu R V, et al. (2017) Switching convolutional neural network for crowd counting. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 4031–4039

  45. Krizhevsky A, Sutskever I, Hinton G E. (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  46. Khan N, Ullah A, Haq I U, et al. (2020) SD-Net: understanding overcrowded scenes in real-time via an efficient dilated convolutional neural network. J Real-Time Image Process, pp 1–15


Acknowledgements

This work is supported by the Equipment Pre-Research Foundation of China under Grant No. 61403120201.

Author information

Corresponding author

Correspondence to Sifan Peng.


Supplementary Material

1.1 Network architecture

Fig. 15 The detailed network structure of the network model FENet, built on our multi-task framework DEAL. The number of convolution kernels in each convolutional layer is labeled directly in the figure

To validate the framework beyond transfer-learned models, we design a new CNN model from scratch. The new model, named FENet and shown in Fig. 15, contains multiple FEM modules to extract multi-scale crowd features. Each FEM module consists of multiple columns of ordinary convolutions and dilated convolutions with different dilation rates. We adopt a pyramid fusion method to integrate the features output by each FEM module separately. Next, the proposed relative depth map regression task and the crowd head edge regression task supervise the network to learn multi-scale crowd features and crowd head edge features. Finally, we concatenate the crowd head edge features with the multi-scale crowd features to generate a high-quality crowd density map.
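The multi-column dilated convolutions inside an FEM module trade dilation rate for receptive-field size: spacing the kernel taps apart widens the region each output sees without adding parameters, which is what lets parallel columns capture different crowd scales. As a mechanism-only illustration (a 1-D sketch, not the authors' FEM code):

```python
import numpy as np

def dilated_conv1d(signal, weights, dilation):
    """'Same'-padded 1-D convolution whose taps sit `dilation` apart,
    enlarging the receptive field without adding parameters."""
    k = len(weights)
    span = dilation * (k - 1) + 1            # effective receptive field
    pad = span // 2
    padded = np.pad(signal, pad)
    out = np.zeros_like(signal, dtype=float)
    for i in range(len(signal)):
        taps = padded[i:i + span:dilation]   # sample every dilation-th input
        out[i] = np.dot(taps, weights)
    return out

x = np.zeros(11)
x[5] = 1.0                                   # unit impulse probes the kernel
w = np.array([1.0, 1.0, 1.0])                # 3 taps in both cases
print(np.nonzero(dilated_conv1d(x, w, 1))[0])  # [4 5 6]: span of 3 inputs
print(np.nonzero(dilated_conv1d(x, w, 3))[0])  # [2 5 8]: span of 7 inputs
```

With the same three weights, dilation 3 reaches inputs seven positions apart, which is why combining columns with different dilation rates yields multi-scale features at constant parameter cost.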

1.2 Experiment results

We conducted various experiments to evaluate the performance of the designed FENet on four public datasets: ShanghaiTech Part_A (Part_A), ShanghaiTech Part_B (Part_B), UCF_CC_50 and UCF-QNRF. In Table 9, “FENet(DEAL)” denotes FENet with our multi-task method applied and “FENet(W/O)” denotes FENet with the multi-task method removed. Compared with “FENet(W/O)”, the MAE of “FENet(DEAL)” is reduced by 8.3, 1.3, 23.32 and 16.9 on Part_A, Part_B, UCF_CC_50 and UCF-QNRF, respectively. We also report the performance of the VGG-based models, including “Ours(DEAL)” as shown in Fig. 4 and “Ours(W/O)”, which removes our multi-task method. As shown in Table 9, the MAE of “Ours(DEAL)” decreases relative to “Ours(W/O)” on all four datasets. Moreover, comparing the computational complexity of “FENet(DEAL)” and “Ours(DEAL)” shows that “FENet(DEAL)” needs fewer FLOPs.
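The MAE figures above follow the standard crowd-counting protocol: the predicted count per image is the integral of the regressed density map, and MAE/MSE aggregate the per-image count errors. A minimal sketch of the two metrics, using hypothetical counts rather than values from Table 9:

```python
import numpy as np

def counting_metrics(pred_counts, gt_counts):
    """Standard crowd-counting metrics: mean absolute error (MAE) and the
    root of the mean squared error (conventionally reported as MSE)."""
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    mae = np.abs(pred - gt).mean()
    mse = np.sqrt(((pred - gt) ** 2).mean())
    return mae, mse

# Hypothetical per-image counts, not results from the paper.
mae, mse = counting_metrics([102.0, 48.0, 310.0], [100.0, 50.0, 300.0])
print(round(mae, 2), round(mse, 2))          # 4.67 6.0
```

MAE measures average counting accuracy, while the squared term makes MSE more sensitive to occasional large miscounts, which is why both are reported together.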

In summary, the proposed multi-task method is effective on both FENet and the VGG-based model [12]. FENet has lower computational complexity than the VGG-based model [12], but its performance is not as good. To achieve accurate crowd counting results, we use the higher-accuracy model in this paper, as shown in Fig. 4.

Table 9 Performance comparisons of different methods on four datasets


About this article


Cite this article

Peng, S., Yin, B., Hao, X. et al. Depth and edge auxiliary learning for still image crowd density estimation. Pattern Anal Applic 24, 1777–1792 (2021). https://doi.org/10.1007/s10044-021-01017-4

