Dissecting GeoSparkSim: a scalable microscopic road network traffic simulator in Apache Spark

Distributed and Parallel Databases

Abstract

Researchers and practitioners have widely studied road network traffic data in areas such as urban planning, traffic prediction, and spatial-temporal databases. For instance, researchers use such data to evaluate the impact of road network changes. Unfortunately, collecting large-scale, high-quality urban traffic data requires tremendous effort because participating vehicles must install global positioning system (GPS) receivers and administrators must continuously monitor these devices. Several urban traffic simulators attempt to generate such data with different features. However, they suffer from two critical issues: (1) Scalability: most of them offer only a single-machine solution, which is not adequate for producing large-scale data. Some simulators can generate traffic in parallel but do not balance the load well among the machines in a cluster. (2) Granularity: many simulators do not consider microscopic traffic situations, including traffic lights, lane changing, and car following. This paper proposes GeoSparkSim, a scalable traffic simulator that extends Apache Spark to generate large-scale road network traffic datasets with microscopic traffic simulation. The proposed system seamlessly integrates with a Spark-based spatial data management system, GeoSpark, to deliver a holistic approach that allows data scientists to simulate, analyze, and visualize large-scale urban traffic data. To implement microscopic traffic models, GeoSparkSim employs a simulation-aware vehicle partitioning method to partition vehicles among different machines such that each machine has a balanced workload. The experimental analysis shows that GeoSparkSim can simulate the movements of 300 thousand vehicles over a very large road network (250 thousand road junctions and 300 thousand road segments) and outperforms existing competitors.
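To make the load-balancing goal concrete, the following is a minimal sketch of one generic way to split vehicles into groups of near-equal size: recursive median cuts over vehicle positions. It is not GeoSparkSim's simulation-aware partitioning method, which the abstract does not specify; the function name `balanced_partitions`, the `(x, y)` position tuples, and the sample coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a vehicle is just its (x, y) position.
Point = Tuple[float, float]


def balanced_partitions(vehicles: List[Point], depth: int, axis: int = 0) -> List[List[Point]]:
    """Split vehicles into up to 2**depth groups of near-equal size.

    Each level sorts the vehicles along one axis and cuts at the median,
    alternating between x and y, so dense areas get smaller regions.
    """
    if depth == 0 or len(vehicles) <= 1:
        return [vehicles]
    ordered = sorted(vehicles, key=lambda p: p[axis])
    mid = len(ordered) // 2
    next_axis = 1 - axis
    return (balanced_partitions(ordered[:mid], depth - 1, next_axis)
            + balanced_partitions(ordered[mid:], depth - 1, next_axis))


# Illustrative data: eight vehicles clustered together, two far away.
vehicles = [
    (0.1, 0.1), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2),
    (0.15, 0.25), (0.25, 0.05), (0.05, 0.15), (9.0, 9.0), (8.0, 1.0),
]
parts = balanced_partitions(vehicles, depth=2)
sizes = [len(p) for p in parts]
print(sizes)  # [2, 3, 2, 3]
```

A uniform grid over the same area would place eight of the ten vehicles in a single cell; cutting at the median keeps group sizes within one vehicle of each other, which is the property a balanced workload requires.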

Notes

  1. Source code: https://github.com/zishanfu/GeoSparkSim.

  2. Demo video: https://jiayuasu.github.io/files/video/geosparksim-demo.mp4.

Acknowledgements

This work is supported by the National Science Foundation (NSF) under Grant 1845789.

Author information

Corresponding author

Correspondence to Jia Yu.

Cite this article

Yu, J., Fu, Z. & Sarwat, M. Dissecting GeoSparkSim: a scalable microscopic road network traffic simulator in Apache Spark. Distrib Parallel Databases 38, 963–994 (2020). https://doi.org/10.1007/s10619-020-07306-x
