Multi-models machine learning methods for traffic flow estimation from Floating Car Data
Introduction
With the rapid growth of urban centers over recent decades, developing efficient urban transportation services has become central to reducing the time wasted during daily commutes. The resulting growth in transportation demand collides with the difficulty of adapting existing transportation networks or creating new ones.
In this context, simulating daily transportation behavior allows operators to experiment with, and visualize the effects of, decisions about infrastructure and regulation policies. Efficient simulation relies on models that represent how transportation flows evolve over time, depending on traffic demand and on events that affect the transportation network. Estimating traffic flow is therefore a core requirement of such simulations. A low-cost solution would be to reconstruct traffic flow from aggregated information (travel duration estimates) available through web services.
Previous work indicates that a machine learning based approach is a promising way to reconstruct "sensor-like traffic flow data" from aggregated information such as that offered by Google services (Li et al., 2019a, Li et al., 2019b). Applying such an approach would make it possible, for instance, to infer a realistic traffic demand at each entrance of a city (i.e. the flow of incoming vehicles).
Machine learning over aggregated information relies on accessible databases that provide information about transportation conditions (travel duration) at a given location and time. In 2007, Google extended Google Maps with Google Live Traffic, a real-time visualization of traffic information (van den Haak et al., 2018, Jeske, 2013). Here, "real time" refers to the current state of traffic and qualifies the FCD service provided. In more detail, Google exploits position data from users' Android smartphones to obtain a fast and accurate map of traffic. This data is called Floating Car Data (FCD); it can also be collected by any localization system embedded in a car and sent to the service provider via a mobile connection. Generally, raw Floating Car Data are aggregated to provide more intelligible and relevant information about traffic conditions. For example, Google Maps uses FCD to display real-time traffic information as colored road sections; the underlying position is determined by the navigation system or, as in the case of Google Live Traffic, by the smartphone, and is sent to the service provider via a mobile phone connection. This allows the generation of real-time traffic information, which is visualized by colors on Google Maps: red road points indicate a traffic jam or stop-and-go traffic, orange indicates heavy traffic, and green points correspond to clear roads. However, those platforms generally provide only aggregated data, such as average travel duration, rather than the initial raw data.
Such an approach would allow operators to provide efficient global information while limiting the effort of continuously measuring road traffic flows with physical sensors (radars, induction loops, etc.). The number of sensors required grows quickly, even in mid-sized cities. For example, measuring the input and output flows of a simple roundabout with 4 access points requires at least 8 physical sensors, one per direction at each access point. Such an expensive measurement method puts the simulation process described above out of reach.
Preliminary results were published at the 15th World Conference on Transport Research 2019 (Li et al., 2019a) and at the 6th International Conference on Control Decision and Information Technologies 2019 (Li et al., 2019b), where traffic flows were estimated from Google Maps Floating Car Data (FCD) solely by regressors trained with machine learning techniques, instead of using stationary physical equipment (such as loop detectors (Cheung et al., 2005) or video cameras (Coifman et al., 1998)). In this paper, we extend that work with a larger experimental setup that makes it possible to automate model definition. Firstly, we show experimentally that, among 19 types of regression methods (including Linear Regression Models, Regression Trees, Support Vector Machines and Ensembles of Trees), Gaussian Process Regression (GPR) gives the best fit on our dataset. Secondly, the adequate regressor is selected from a set of regressors (multi-model) to estimate traffic flows from FCD, based on the type of travel duration profile. This multi-model approach can greatly reduce the estimation error by clustering days that present different types of travel duration profiles. Experiments compare the estimated flows with real ones provided by induction loop sensors. The results suggest that correct transportation flow models could be obtained with only very light use of real traffic sensors.
This paper is organized as follows. The next section describes related work and the different uses of aggregated FCD in transportation network modeling. The third section focuses on the problem we address, namely building sensor-like flow measurements from aggregated data, and details our problem formulation and the standpoint we took to solve it. The fourth section presents the single and multi-GPR machine learning methods for estimating traffic volume. The fifth section covers the experimental site, the results obtained, and the comparison between estimated traffic flow and observed data, followed by a discussion. The last section concludes the paper and presents perspectives and further work.
Section snippets
Related work
The successful wide-scale deployment of Advanced Traveler Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS) relies heavily on the capability to accurately estimate real traffic states on road networks. Therefore, the use of real-time Floating Car Data (FCD), based on traces of Global Positioning System (GPS) positions of vehicles, is emerging as a reliable and cost-effective way to collect accurate traffic data for a wide-area road network. Unlike other
Problem description and proposed mathematical model
This section first describes the problem addressed in this work. It then presents the proposed system structure, which introduces two types of regressor-based models: single model and multi-model. Next, the feature extraction method is shown. Finally, the criteria used to evaluate the proposed system's performance are presented.
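This excerpt does not name the evaluation criteria. As an illustration only, the sketch below assumes two common error measures, root mean square error (RMSE) and mean absolute error (MAE), computed between flows observed by loop sensors and flows estimated by a regressor. The function names and the sample values are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(estimated) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, estimated):
    """Mean absolute error between two equal-length sequences."""
    if len(observed) != len(estimated) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


# Hypothetical flows in vehicles per hour (not taken from the paper).
loop_counts = [320, 410, 560, 480]
estimates = [300, 430, 540, 500]

print(f"RMSE = {rmse(loop_counts, estimates):.1f} veh/h")
print(f"MAE  = {mae(loop_counts, estimates):.1f} veh/h")
```

RMSE penalizes large deviations more strongly than MAE, so reporting both gives a fuller picture of how an estimator behaves during peaks.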
Theory of machine learning methods applied
This section first presents the overall flowchart of the proposed traffic flow estimation system. It then reviews the theory behind the machine learning methods applied in this work, as shown in Fig. 3, Fig. 4.
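As a rough illustration of the regression step, the sketch below computes the standard Gaussian Process posterior mean and variance with a squared-exponential kernel in NumPy. The kernel choice, the hyperparameter values, and the synthetic duration and flow data are assumptions made for illustration; this excerpt does not specify the configuration actually used.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def gpr_predict(x_train, y_train, x_test,
                length_scale=1.0, signal_var=1.0, noise_var=1e-2):
    """Posterior mean and latent variance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, length_scale, signal_var)
    k_tt = k_tt + noise_var * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, length_scale, signal_var)
    # Cholesky factorization gives a stable solve of (K + noise * I) alpha = y.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = signal_var - np.sum(v ** 2, axis=0)
    return mean, var


# Synthetic data: travel duration (minutes) mapped to flow (veh/h) through
# an arbitrary smooth relation plus noise. Not real measurements.
rng = np.random.default_rng(0)
duration = np.linspace(2.0, 6.0, 25)
flow = 400.0 + 150.0 * np.sin(duration) + rng.normal(0.0, 5.0, duration.size)

# The GP prior is zero-mean, so the targets are centered before fitting.
flow_mean = flow.mean()
mean, var = gpr_predict(duration, flow - flow_mean, duration,
                        length_scale=1.0, signal_var=150.0 ** 2, noise_var=25.0)
estimate = mean + flow_mean
fit_rmse = float(np.sqrt(np.mean((estimate - flow) ** 2)))
print(f"RMSE on training points: {fit_rmse:.2f} veh/h")
```

The posterior variance is what distinguishes GPR from most of the other regressors listed: each estimate comes with an uncertainty that can flag inputs far from the training data.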
Experiment and validation from the real data for single model
In this section, we conduct a series of experiments over two road segments to evaluate the proposed algorithm and compare the results with real data. Firstly, for the single model, the estimated traffic flow is compared with real data on a first road segment. Then the results of the single model and the multi-model are compared on another road segment.
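The multi-model idea can be sketched with scikit-learn as follows: daily travel-duration profiles are reduced with PCA and grouped with k-means, an SVM classifier learns to assign a day to its cluster, and one GPR is trained per cluster. This is a minimal sketch on synthetic data; the number of clusters, the kernels, the features, and every numeric value are assumptions, not the configuration used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
slots = np.linspace(0.0, 24.0, 24)  # hypothetical time slots over one day


def make_days(n_days, peak_height):
    """Synthetic daily travel-duration profiles (minutes) with a morning peak."""
    base = 3.0 + peak_height * np.exp(-0.5 * ((slots - 8.0) / 1.5) ** 2)
    return base + rng.normal(0.0, 0.05, (n_days, slots.size))


# Two hypothetical day types: pronounced morning peak vs. nearly flat profile.
durations = np.vstack([make_days(8, 2.0), make_days(8, 0.3)])
# Synthetic "loop sensor" flows (veh/h): an arbitrary function of duration.
flows = 150.0 * durations + rng.normal(0.0, 5.0, durations.shape)

# 1) Reduce each daily profile with PCA, then cluster the days with k-means.
pca = PCA(n_components=2).fit(durations)
scores = pca.transform(durations)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)

# 2) Train an SVM that assigns a PCA-reduced profile to its cluster.
selector = SVC(kernel="rbf").fit(scores, labels)

# 3) Train one GPR per cluster, mapping travel duration to flow.
models = {}
for cluster_id in np.unique(labels):
    mask = labels == cluster_id
    x = durations[mask].reshape(-1, 1)
    y = flows[mask].ravel()
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                                   random_state=0)
    models[int(cluster_id)] = gpr.fit(x, y)

# 4) For a new day, select the cluster's regressor and estimate the flow.
new_day = make_days(1, 2.0)
chosen = int(selector.predict(pca.transform(new_day))[0])
estimate = models[chosen].predict(new_day.reshape(-1, 1))
print(f"selected cluster {chosen}, peak flow estimate {estimate.max():.0f} veh/h")
```

Routing each day to a regressor trained only on similar days keeps each model's input range narrow, which is one plausible reason a multi-model setup could reduce estimation error compared with a single regressor.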
Conclusion and future works
This work illustrates that machine learning techniques based on aggregated data make it possible to estimate traffic flows from Floating Car Data (FCD) solely on the basis of trained regressors. Principal Component Analysis (PCA) coupled with the k-means technique makes it possible to differentiate clusters of daily FCD profiles. Models are built for each cluster tuned by selecting the appropriate multi-model Gaussian Processes Regressors (GPR) using the Support Vector Machine (SVM) classifier generates
Acknowledgments
This work was supported by the project ORIO (Observing the peRformances of urban Infrastructures and mobility/preventing collisions with vulnerable people using Opportunistic radar). The project ORIO is done within the framework of ELSAT2020, which is co-financed by the European Union with the European Regional Development Fund, the French state and the Hauts de France Region Council. The real traffic flow data from induction loops on the road "86 Boulevard de la République, Douai, France" was
References (42)
- et al., 2015. Incident detection methods using probe vehicles with on-board GPS equipment. Transp. Res. Procedia.
- et al., 2011. Speed-flow models for freeways. Procedia-Soc. Behav. Sci.
- et al., 2021. Spatial-temporal traffic congestion identification and correlation extraction using floating car data. J. Intell. Transp. Syst.
- et al., 1998. A real-time computer vision system for vehicle tracking and traffic surveillance. Transp. Res. C.
- et al., 2021. Estimating congestion zones and travel time indexes based on the floating car data. Comput. Environ. Urban Syst.
- et al., 2015. Short-term wind speed prediction using empirical wavelet transform and Gaussian process regression. Energy.
- et al., 1992. Calibration of the combined trip distribution and assignment model for multiple user classes. Transp. Res. B.
- et al., 2016. Traffic state estimation using floating car data. Procedia Comput. Sci.
- et al., 2020. A comparison of deep learning methods for urban traffic forecasting using floating car data. Transp. Res. Procedia.
- Alsayat, A., El-Sayed, H., 2016. Social media analysis using optimized K-Means clustering. In: IEEE 14th International...
- Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci. Model Dev.
- Traffic measurement and vehicle classification with single magnetic sensor. Transp. Res. Rec.: J. Transp. Res. Board.
- Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics.
- Google map API.
- Principal components analysis.