Multi-models machine learning methods for traffic flow estimation from Floating Car Data

https://doi.org/10.1016/j.trc.2021.103389

Highlights

  • Application of Machine Learning based model to rebuild the traffic flow relationship.

  • Construction and training of multi-models to reduce estimated traffic flow error.

  • Demonstration of Gaussian Process Regressor to achieve the lowest error.

Abstract

Traffic flow measurement is essential for traffic management systems. However, traditional measurement approaches are time-consuming and expensive, both for continuously gathering the required data and for maintaining the corresponding equipment, such as loop detectors and video cameras. Meanwhile, many web services estimate automobile travel time under current traffic conditions from crowd-sourced data (Floating Car Data, FCD). This work proposes to reconstruct traffic flows from estimated travel time using machine learning. In particular, we evaluate the capacity of the Gaussian Process Regressor (GPR) to address this issue. After estimated travel times are obtained on a given route, a clustering process shows that the daily travel duration profiles can be associated with different “types of day”. Then, different regressors are trained to estimate traffic flows from travel duration. In the “multi-model” variant, one regressor is trained for each type of day. Conversely, in the “single model” variant, only one regressor is trained and the type of day is not taken into account. This is an innovative approach to estimating and reconstructing traffic flow in transportation networks with machine learning from aggregated FCD. A series of experiments compares the traffic flows estimated by the proposed single model and multi-model with the real flows measured by physical sensors. The results show that both the single model and the multi-model capture the tendency of real traffic flows. Furthermore, performance can be improved by tuning parameters of the GPR model, such as the half width of the sample window and the sample size (a whole week or only weekdays), and the multi-model substantially improves performance compared with the single model. Therefore, the proposed method, based on GPR and FCD, can replace traditional loop detectors for the measurement of traffic flow.

Introduction

With the rapid growth of urban centers over the last decades, the development of efficient urban transportation services has become central to reducing the time wasted during the daily commute. The resulting growth in demand for transportation must contend with the difficulty of adapting existing transportation networks or creating new ones.

In this context, simulating daily transportation behaviors allows operators to experiment with and visualize decisions about infrastructure and regulation policies. Efficient simulation relies above all on the ability to produce models of how transportation flows evolve over time, depending on the traffic demand and on the events that affect the transportation network. The estimation of traffic flow is one of the core requirements of such simulations. One low-cost solution would be to reconstruct the traffic flow from aggregated information (travel duration estimations) available from web services.

Previous work shows that a machine learning based approach is a promising way to reconstruct “sensor-like traffic flow data” from aggregated information such as that provided by Google services (Li et al., 2019a, Li et al., 2019b). Applying such an approach would make it possible, for instance, to infer a realistic traffic demand at each entrance of a city (i.e., the flow of incoming vehicles).

The machine learning approach over aggregated information relies on accessible databases that provide information about transportation conditions (travel duration) at a given location and time. In 2007, Google extended Google Maps with Google Live Traffic, which visualizes traffic information in real time (van den Haak et al., 2018, Jeske, 2013). Here, real time means the current state and qualifies the FCD service provided. In more detail, Google exploits the position data of Android smartphone users in order to obtain a fast and accurate mapping of the traffic. This data is called Floating Car Data (FCD); it can also be collected by any localization system embedded in a car and sent to the service provider via a mobile connection. Generally, raw FCD are aggregated to provide more intelligible and relevant information about traffic conditions. For example, in Google Maps, FCD are used to display real-time traffic information as colored road sections. The vehicle position is determined by the navigation system or, as in the case of Google Live Traffic, by the smartphone, and is sent to the service provider via a mobile phone connection. This allows the generation of real-time traffic information, visualized by colors on Google Maps: red road points indicate a traffic jam or stop-and-go traffic, orange indicates heavy traffic, and green points correspond to clear roads. However, those platforms generally provide only aggregated data, such as average travel duration, rather than the initial raw data.

Such an approach would allow operators to provide efficient global information while limiting the effort spent on continuously measuring road traffic flows with physical sensors (radars, induction loops, etc.). The number of sensors can grow very quickly, even in mid-sized cities. For example, measuring the input and output flows of a simple roundabout with four access points requires at least 8 physical sensors (one for the inflow and one for the outflow at each access point). Such an expensive traffic flow measurement method therefore puts the simulation process described above out of reach.

Preliminary results were presented at the 15th World Conference on Transport Research 2019 (Li et al., 2019a) and at the 6th International Conference on Control Decision and Information Technologies 2019 (Li et al., 2019b), where traffic flows are estimated from Floating Car Data (FCD) from Google Maps using only regressors trained with machine learning techniques, instead of stationary physical equipment (such as loop detectors (Cheung et al., 2005) or video cameras (Coifman et al., 1998)). In this paper, we extend these previous works with an enlarged experimental setup that makes it possible to automate the model definition. Firstly, we show experimentally that, among 19 types of regression methods (including Linear Regression Models, Regression Trees, Support Vector Machines and Ensembles of Trees), Gaussian Process Regression (GPR) is the machine learning method that achieves the best fitting criterion on our dataset. Secondly, the adequate regressor is selected from a set of regressors (multi-model) to estimate traffic flows from FCD, based on the different types of travel duration profiles. This multi-model approach can greatly reduce the estimation error by precisely clustering days that present different types of travel duration profiles. Experiments compare the estimated flows with real flows measured by induction loop sensors. The results are promising enough to suggest that accurate transportation flow models could be obtained with very limited use of real traffic sensors.
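As an illustration of the day-clustering step behind the multi-model approach, the following minimal sketch groups daily travel duration profiles with plain k-means. It is a toy under stated assumptions, not the authors' implementation: the conclusion mentions PCA coupled with k-means, whereas this sketch skips PCA, and the function name `kmeans`, the deterministic initialisation, and all numbers in `daily_profiles` are hypothetical.

```python
import math


def kmeans(profiles, k, iters=50):
    """Plain Lloyd's k-means on equal-length lists of floats.

    Centroids start from the first k profiles so the run is deterministic
    (an illustrative choice, not the authors' initialisation).
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid in Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up travel durations (seconds) at four times of day, one row per day.
daily_profiles = [
    [60.0, 95.0, 70.0, 100.0],   # weekday-like: two peaks
    [62.0, 98.0, 72.0, 97.0],    # weekday-like
    [59.0, 92.0, 69.0, 103.0],   # weekday-like
    [61.0, 63.0, 62.0, 60.0],    # weekend-like: flat
    [60.0, 62.0, 64.0, 61.0],    # weekend-like
]
labels, centroids = kmeans(daily_profiles, k=2)
print(labels)  # peaked days and flat days end up in different clusters
```

In a multi-model setting, one regressor would then be trained on the days of each cluster, and a new day would be routed to the regressor of its cluster.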

This paper is organized as follows. The next section describes related work and the different uses of aggregated FCD in the context of transportation network modeling. The third section focuses on the problem we address, namely building sensor-like flow measurements from aggregated data; it also details our problem formulation and the standpoint we took for solving it. The fourth section presents the single-GPR and multi-GPR machine learning methods for the estimation of traffic volume. The fifth section presents the experimental site, the results obtained, and the comparison between estimated traffic flow and real observed data, followed by a discussion. The last section concludes the paper and presents perspectives and further work.

Section snippets

Related work

The successful wide scale deployment of the Advanced Traveler Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS) highly relies on the capability to perform accurate estimation of the real traffic states on road networks. Therefore, the use of real-time Floating Car Data (FCD), based on traces of Global Positioning System (GPS) positions of vehicles, is emerging as a reliable and cost-effective way to collect accurate traffic data for a wide area road network. Unlike other

Problem description and proposed mathematical model

This section first describes the problem addressed in this work. It then presents the proposed system structure, in which two types of regressor-based models are introduced: the single model and the multi-model. Next, the feature extraction method is shown. Finally, the criteria used to evaluate the performance of the proposed system are presented.
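The snippet above does not name the evaluation criteria. Since the reference list includes a discussion of root mean square error (RMSE) and mean absolute error (MAE), the sketch below shows how these two error measures could be computed between observed and estimated flows. Treating them as the paper's criteria is an assumption, and the flow values are made up for illustration.

```python
import math


def rmse(observed, estimated):
    """Root mean square error between two equal-length sequences."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, estimated):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(o - e) for o, e in zip(observed, estimated)]
    return sum(absolute) / len(absolute)


# Made-up flows in vehicles per hour (sensor values vs. model estimates).
observed_flow = [120.0, 340.0, 410.0, 280.0]
estimated_flow = [130.0, 330.0, 400.0, 300.0]

print(mae(observed_flow, estimated_flow))   # 12.5
print(rmse(observed_flow, estimated_flow))  # sqrt(175), about 13.23
```

RMSE penalises large deviations more strongly than MAE, which is why the two are often reported together.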

Theory of machine learning methods applied

This section first presents the overall flowchart of the proposed traffic flow estimation system. It then reviews the theory of the machine learning methods applied in this work, as shown in Fig. 3 and Fig. 4.
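To make the role of the Gaussian Process Regressor concrete, here is a minimal standard-library sketch of the GPR posterior mean with a squared-exponential (RBF) kernel. The kernel choice, the scalar input (a single travel duration rather than a windowed feature vector), the hyperparameter values, and the data are all assumptions made for illustration; the snippet does not state the configuration actually used.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gpr_fit(x_train, y_train, noise=1e-6, length_scale=1.0):
    """Return alpha = (K + noise * I)^-1 (y - mean) and the constant mean."""
    mean = sum(y_train) / len(y_train)
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    alpha = solve(K, [y - mean for y in y_train])
    return alpha, mean


def gpr_predict(x_new, x_train, alpha, mean, length_scale=1.0):
    """Posterior mean at x_new: mean + k(x_new, X) . alpha."""
    return mean + sum(a * rbf(x_new, xt, length_scale)
                      for a, xt in zip(alpha, x_train))


# Made-up training pairs: travel duration (s) -> traffic flow (veh/h).
durations = [60.0, 70.0, 80.0, 90.0, 100.0]
flows = [200.0, 350.0, 450.0, 500.0, 520.0]

alpha, mean = gpr_fit(durations, flows, length_scale=10.0)
estimate = gpr_predict(75.0, durations, alpha, mean, length_scale=10.0)
print(round(estimate, 1))
```

With a very small noise term, the posterior mean nearly reproduces the training targets; a larger noise value would smooth the fit instead. A full GPR also returns a predictive variance, which this sketch omits.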

Experiment and validation from the real data for single model

In this section, we conduct a series of experiments on two road segments to evaluate the proposed algorithm and compare the results with real data. Firstly, the estimated traffic flow of the single model is compared with real data from a first road segment. Then the results of the single model and the multi-model are compared on another road segment.

Conclusion and future works

This work illustrates that machine learning techniques based on aggregated data make it possible to estimate traffic flows from Floating Car Data (FCD) using only trained regressors. Principal Component Analysis (PCA) coupled with the k-means technique makes it possible to differentiate clusters of daily FCD profiles. Models are built for each cluster tuned by selecting the appropriate multi-model Gaussian Processes Regressors (GPR) using the Support Vector Machine (SVM) classifier generates

Acknowledgments

This work was supported by the project ORIO (Observing the peRformances of urban Infrastructures and mobility/preventing collisions with vulnerable people using Opportunistic radar). The project ORIO is done within the framework of ELSAT2020, which is co-financed by the European Union with the European Regional Development Fund, the French state and the Hauts de France Region Council. The real traffic flow data from induction loops on the road “86 Boulevard de la République, Douai, France” was

References (42)

  • Chai, T. et al., 2014. Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature. Geosci. Model Dev.
  • Chen, L., Shi, J., Cheng, M., Zhu, H., Sun, L., 2020. Characteristics of urban road non-recurrent traffic congestion...
  • Cheung, S. et al., 2005. Traffic measurement and vehicle classification with single magnetic sensor. Transp. Res. Rec.: J. Transp. Res. Board
  • Dai, X., Ferman, M.A., Roesser, R.P., 2003. A simulation evaluation of a real-time traffic information system using...
  • de Fabritiis, C., Ragona, R., Valenti, G., 2008. Traffic estimation and prediction based on real time floating car...
  • Furey, T.S. et al., 2000. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics
  • Google, T.S., 2018. Google map API.
  • Groth, D. et al. Principal components analysis.
  • Hong, J., Zhang, X., Wei, Z., Li, L., Ren, Y., 2007. Spatial and temporal analysis of probe vehicle-based sampling for...
  • Hu, J., Li, X., Ou, Y., 2014. Online gaussian process regression for time-varying manufacturing systems. In: 13th IEEE...