Technical Section
An augmented crowd simulation system using automatic determination of navigable areas
Introduction
Crowd simulations investigate the interaction of individuals within and among groups of people, in terms of behavior, appearance, personality, and emotions. Models used in such simulations aim for realistic interaction with the environment; hence, the appearance and behavior of the virtual agents that represent individuals should fit the context of the scene for better immersion. Quantitative methods assess the realism of such simulations by comparing the simulated crowd with real-world data.
Augmenting virtual crowds into real-life videos has applications in entertainment, security, and education. Virtual crowds can cost-effectively fill environments in movies, appearing together with real actors; virtual tutors can move inside live environments to create immersion in training applications. In such augmented crowd simulations, virtual agents should be indistinguishable from the real people and should interact with the real crowd and the environment realistically. This requires careful inspection of the environment and the individuals in the video.
Augmented crowd simulations benefit from data-driven approaches for pedestrian and scene inference. Using a model of the environment and pedestrian trajectories, a virtual crowd can be augmented into the input video so that virtual agents move plausibly in the scene without colliding with each other or with the real pedestrians. However, in such a workflow, many steps require labor-intensive manual processing, including the construction of an environment model for the virtual crowd.
We introduce our open-source augmented crowd simulation system that utilizes an automated approach for the determination and reconstruction of navigable regions in real-life surveillance-like videos. We make use of existing methods of semantic segmentation and pedestrian tracking to determine image-level navigable regions. Then we reconstruct the aerial view of these regions as a flat mesh and position it in our 3D crowd simulation environment. From the perspective of the automatically calibrated scene camera, the virtual agents move inside the navigable regions of the video, avoiding scene obstacles, real pedestrians, and other virtual agents. We evaluate the accuracy of the generated navigable regions in comparison to the ground truth, using real-life surveillance videos.
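To make the camera's role concrete, the following minimal sketch shows how a pinhole model maps an agent's ground-plane position into the image; the intrinsic and extrinsic values below are placeholders rather than the ones estimated by our automatic calibration.

```python
import numpy as np

# Placeholder calibration values; in the actual system the intrinsics and
# extrinsics come from the automatic camera calibration step.
K = np.array([[1000.0,    0.0, 640.0],   # focal lengths and principal point (pixels)
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)                            # world-to-camera rotation
t = np.array([0.0, 2.5, 10.0])           # world-to-camera translation

def project_ground_point(p_world):
    """Project a 3D point on the ground plane (y = 0) into pixel coordinates."""
    p_cam = R @ p_world + t               # world frame -> camera frame
    uvw = K @ p_cam                       # camera frame -> homogeneous image point
    return uvw[:2] / uvw[2]               # perspective division

# A virtual agent standing on the ground plane at world coordinates (1, 0, 4).
print(project_ground_point(np.array([1.0, 0.0, 4.0])))
```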
We list our contributions as:
- Automatic determination and reconstruction of image-level navigable areas in surveillance-like videos for seamless integration of virtual agents.
- Evaluation of the resulting image-level navigable areas using different combinations of segmentation networks and training sets.
Section snippets
Data-driven crowd simulations
Many applications of crowd simulations utilize real-life data for realistic agent behavior. Musse et al. [1], Lerner et al. [2], and Kim et al. [3] extract pedestrian trajectories from real-life video sequences to simulate movements of virtual agents in various crowd scenarios. Jablonski et al. [4] evaluate the accuracy and realism of crowd simulations in comparison to real-life footage, using pedestrian flow. Amirian et al. [5], [6] generate crowd trajectories that mimic the behavior of real pedestrians using generative adversarial networks.
Framework
Our open-source framework, outlined in Fig. 1, provides an augmented interactive crowd simulation in Unity [29]. We simulate virtual agents walking in navigable regions of the input video while avoiding collision with real pedestrians. To reconstruct the navigable scene, we preprocess the input video using computer vision techniques included in the OpenCV library [30]. The crowd simulation runs in real-time, and the preprocessing is performed off-line.
The input of our system is the video of an environment captured by a stationary, surveillance-like camera.
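As an example of such off-line preprocessing, a stationary camera allows a pedestrian-free background to be approximated by a per-pixel median over sparsely sampled frames; the sketch below uses a hypothetical file name and illustrative sampling parameters and is not the complete preprocessing pipeline.

```python
import cv2
import numpy as np

def estimate_background(video_path, sample_every=30, max_samples=50):
    """Approximate a pedestrian-free background as a per-pixel median over
    sparsely sampled frames of a stationary-camera video."""
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while len(frames) < max_samples:
        ok, frame = capture.read()
        if not ok:
            break
        if index % sample_every == 0:   # sparse sampling lets moving pedestrians average out
            frames.append(frame)
        index += 1
    capture.release()
    return np.median(np.stack(frames), axis=0).astype(np.uint8)

# background = estimate_background("surveillance_clip.mp4")  # hypothetical file name
```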
Navigable area reconstruction
We infer the navigable regions of the scene and generate a 2D navigation mesh based on the union of these regions in an aerial view. We position the 2D mesh into our simulation environment with the camera configuration of the video, so that the virtual agents that walk on the navigation mesh appear as if they are walking on the navigable regions of the video. This reconstruction process involves the following steps:
1. We analyze the video frames to determine the image-level navigable regions using semantic segmentation and pedestrian tracking (see the sketch after this list).
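A minimal sketch of the subsequent aerial-view conversion, assuming the image-to-ground homography is already available from camera calibration and using illustrative morphology and simplification parameters:

```python
import cv2
import numpy as np

def navigable_mask_to_aerial_polygons(mask, homography, aerial_size):
    """Warp a binary (0/255) image-level navigable mask into an aerial view
    using an image-to-ground homography, then extract simplified polygon
    outlines that can be triangulated into a flat navigation mesh."""
    aerial = cv2.warpPerspective(mask, homography, aerial_size, flags=cv2.INTER_NEAREST)
    # Close small holes left by segmentation noise before extracting outlines.
    aerial = cv2.morphologyEx(aerial, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(aerial, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Simplify each outline so the resulting navigation mesh stays lightweight.
    return [cv2.approxPolyDP(contour, 2.0, True) for contour in contours]
```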
Evaluation
We test our framework on various stationary surveillance-like videos including PETS09-S2L1 [44], Town Centre [62], MOT16-04 [31], and a custom video. For each test video, Fig. 8 shows the horizon, the extracted navigable areas, their placement into the 3D scene with dummy agents, and the final output with virtual agents.
Table 1 shows a quantitative comparison of different pedestrian trackers for PETS09-S2L1 and our custom video. Recall is the percentage of identified pedestrians overall in the ground truth annotations.
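These measures can be computed as in the following sketch; the counts are illustrative rather than values from Table 1, and intersection-over-union is shown only as one standard way to compare a predicted navigable mask against the ground truth.

```python
import numpy as np

def recall(true_positives, ground_truth_total):
    """Fraction of ground-truth pedestrians that the tracker identified."""
    return true_positives / ground_truth_total

def navigable_area_iou(predicted_mask, ground_truth_mask):
    """Intersection-over-union between two binary navigable-area masks."""
    predicted = predicted_mask.astype(bool)
    ground_truth = ground_truth_mask.astype(bool)
    intersection = np.logical_and(predicted, ground_truth).sum()
    union = np.logical_or(predicted, ground_truth).sum()
    return intersection / union if union else 0.0

# Illustrative counts only, not results from Table 1.
print(recall(true_positives=180, ground_truth_total=200))  # 0.9
```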
Conclusion
We introduce an open-source augmented crowd simulation system that utilizes automatic determination of navigable regions in surveillance-like videos. The GitHub project, including the repositories that contain the source code of the proposed system, is available at https://github.com/users/YalimD/projects/2. We combine existing techniques of semantic segmentation and pedestrian detection for accurate determination of the navigable regions. We compare our results with the ground truth and manually determined navigable areas.
Declaration of Competing Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors are grateful to Lori Russell-Dağ and İpek Sözen for proofreading the manuscript.
References (65)
- et al. Walking with virtual people: Evaluation of locomotion interfaces in dynamic environments. IEEE Trans Vis Comput Graph (2018)
- et al. Object detection and tracking-based camera calibration for normalized human height estimation. J Sens (2016)
- et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM (1981)
- et al. Using computer vision to simulate the motion of virtual agents. Comput Anim Virtual Worlds (2007)
- et al. Crowds by example. Comput Graph Forum (2007)
- et al. Interactive and adaptive data-driven crowd simulation. Proceedings of IEEE Virtual Reality, VR '16 (2016)
- et al. Evaluation framework for crowd behaviour simulation and analysis based on real videos and scene reconstruction. Proceedings of the 6th Latin-American Conference on Networked and Electronic Media, LACNEM '15 (2015)
- et al. Data-driven crowd simulation with generative adversarial networks. Proceedings of the 32nd International Conference on Computer Animation and Social Agents, CASA '19 (2019)
- et al. Social ways: learning multi-modal distributions of pedestrian trajectories with GANs. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW '19 (2019)
- et al. Online parameter learning for data-driven crowd simulation and content generation. Comput Graph (2016)
- ARCrowd: a tangible interface for interactive crowd simulation. Proceedings of the 16th International Conference on Intelligent User Interfaces, IUI '11
- Generation of augmented video sequences combining behavioral animation and multi-object tracking. Comput Anim Virtual Worlds
- Coupling camera-tracked humans with a simulated virtual crowd. Proceedings of the 9th International Conference on Computer Graphics Theory and Applications, GRAPP '14
- Online inserting virtual characters into dynamic video scenes. Comput Anim Virtual Worlds
- Augmentation of virtual agents in real crowd videos. Signal Image Video Process
- Vanishing point detection using cascaded 1D Hough Transform from single images. Pattern Recognit Lett
- Detecting vanishing points using global image context in a non-Manhattan world. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR '16
- Using the scene to calibrate the camera. Proceedings of the 29th SIBGRAPI Conference on Graphics, Patterns and Images, SIBGRAPI '16
- Surveillance camera autocalibration based on pedestrian height distributions. British Machine Vision Conference
- Automatic calibration of stationary surveillance cameras in the wild. European Conference on Computer Vision, ECCV '16
- Metric rectification for perspective images of planes. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR '98
- Creating architectural models from images. Comput Graph Forum
- Ground plane rectification by tracking moving objects. Proceedings of the Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance
- Auto-rectification of user photos. IEEE International Conference on Image Processing, ICIP '14
- Populating virtual cities using social media. Comput Anim Virtual Worlds
- Efficiently modeling 3D scenes from a single image. IEEE Comput Graph Appl
- As-consistent-as-possible compositing of virtual objects and video sequences. Comput Anim Virtual Worlds
- Automatic photo pop-up. ACM Trans Graph
- Make3D: learning 3D scene structure from a single still image. IEEE Trans Pattern Anal Mach Intell
- MOT16: a benchmark for multi-object tracking. CoRR
This paper was recommended for publication by Stefanie Zollmann.