Multi-models machine learning methods for traffic flow estimation from Floating Car Data

doi:10.1016/j.trc.2021.103389

Transportation Research Part C: Emerging Technologies

Volume 132, November 2021, 103389

https://doi.org/10.1016/j.trc.2021.103389

Highlights

• Application of Machine Learning based model to rebuild the traffic flow relationship.
• Construction and training of multi-models to reduce estimated traffic flow error.
• Demonstration of Gaussian Process Regressor to achieve the lowest error.

Abstract

Traffic flow measurement is essential for traffic management systems. However, traditional measurement approaches are time-consuming and expensive, both for continuously gathering the required data and for maintaining the corresponding equipment, such as loop detectors and video cameras. Meanwhile, many web services estimate automobile travel time under current traffic conditions using crowd-sourced data (Floating Car Data, FCD). This work proposes to reconstruct traffic flows from estimated travel times using machine learning. In particular, we evaluate the capacity of the Gaussian Process Regressor (GPR) to address this issue. After estimated travel times are obtained on a given route, a clustering process shows that the daily travel duration profiles can be associated with different “types of day”. Then, different regressors are trained to estimate traffic flows from travel duration. In the “multi-model” variant, one regressor is trained for each type of day. Conversely, in the “single model” variant, only one regressor is trained and the type of day is not taken into account. This work is a novel attempt to estimate and reconstruct traffic flow in transportation networks from aggregated FCD using machine learning. A series of experiments compares the traffic flows estimated by the proposed single model and multi-model with the real flows measured by physical sensors. The results show that both the single model and the multi-model capture the trend of real traffic flows. Furthermore, performance can be improved by tuning parameters of the GPR model, such as the half width of the sample window and the sample set (a whole week or only weekdays), and the multi-model substantially improves performance compared with the single model. Therefore, the proposed method, based on GPR and FCD, can replace traditional loop detectors for the measurement of traffic flow.

Introduction

With the rapid growth of urban centers during the last decades, the development of efficient urban transportation services has become a central issue in reducing the time wasted during daily commutes. The resulting increase in demand for transportation flows has to cope with the difficulty of adapting existing transportation networks or creating new ones.

In this context, simulating daily transportation behaviors allows operators to experiment with and visualize decisions about infrastructure and regulation policies. Efficient simulation relies largely on the ability to produce models representing the way that transportation flows evolve with time, depending on the traffic demands and events that impact the transportation network. The estimation of traffic flow is one of the core requirements of these simulations. One low-cost solution would be to reconstruct the traffic flow from aggregated information (travel duration estimations) available on web services.

Previous work validates that a machine learning based approach is a promising way to reconstruct “sensor-like traffic flow data” from aggregated information such as that provided by Google services (Li et al., 2019a, Li et al., 2019b). Applying such an approach would make it possible, for instance, to infer a realistic traffic demand at each entrance of a city (i.e. the flow of incoming vehicles).

The approach of applying machine learning to aggregated information relies on accessible databases that provide information about the transportation condition (travel duration) at a given location and at a given time. In 2007, Google extended Google Maps by adding Google Live Traffic, the visualization of traffic information in real time (van den Haak et al., 2018, Jeske, 2013). Here, the notion of real time means the current state and qualifies the FCD service provided. In more detail, Google exploits the position data of Android smartphone users in order to obtain a fast and accurate mapping of the traffic. This data is called Floating Car Data (FCD). The position is determined by the navigation system embedded in a car or, as in the case of Google Live Traffic, by the smartphone, and is sent to the service provider via a mobile phone connection. Generally, these raw Floating Car Data are aggregated to provide more intelligible and relevant information about traffic conditions. For example, in Google Maps, FCD are used to give real-time traffic information through colored road sections: red road points indicate a traffic jam or stop-and-go traffic, orange indicates heavy traffic and green points correspond to clear roads. However, those platforms generally provide only aggregated data, such as average travel duration, rather than the initial raw data.

Such an approach would allow operators to provide efficient global information while limiting the effort of continuously measuring road traffic flows with physical sensors (radars, induction loops, etc.). The number of sensors, even in mid-sized cities, can increase very quickly. For example, to measure the input and output flows of a simple four-branch roundabout, at least 8 physical sensors are required. Such an expensive traffic flow measurement method therefore puts the simulation process described above out of reach.

Preliminary results were published at the 15th World Conference on Transport Research 2019 (Li et al., 2019a) and at the 6th International Conference on Control Decision and Information Technologies 2019 (Li et al., 2019b), where traffic flows were estimated from Google Maps Floating Car Data (FCD) using only regressors trained with machine learning techniques, instead of stationary physical equipment such as loop detectors (Cheung et al., 2005) or video cameras (Coifman et al., 1998). In this paper, we extend these previous works with a larger experimental setup that makes it possible to automate the model definition. Firstly, we show experimentally that, among 19 types of regression methods (including Linear Regression Models, Regression Trees, Support Vector Machines and Ensembles of Trees), Gaussian Process Regression (GPR) gives the best fitting criterion on our dataset. Secondly, the adequate regressor is selected from a set of regressors (multi-model) to estimate traffic flows from FCD, based on the different types of travel duration profiles. By clustering days that present different types of travel duration profiles, this multi-model approach can greatly reduce the estimation error. Experiments are conducted by comparing estimated flows with real ones provided by induction loop sensors. The results suggest that correct transportation flow models could be obtained with very light use of real traffic sensors.

This paper is organized as follows: the next section describes related works and the different usages of aggregated FCD in the context of transportation network modeling. The third section focuses on the problem we choose to address, namely building sensor-like flow measurements from aggregated data. In this section, we also provide details regarding our problem formulation and the standpoint we took for solving it. The fourth section presents the single and multi-GPR machine learning methods for the estimation of traffic volume. The fifth section deals with the experimental site, the results we obtained, and the comparison between estimated traffic flow and real observed data, followed by the discussion. The last section concludes this paper and presents some perspectives and further works based on it.

Section snippets

Related work

The successful wide scale deployment of the Advanced Traveler Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS) highly relies on the capability to perform accurate estimation of the real traffic states on road networks. Therefore, the use of real-time Floating Car Data (FCD), based on traces of Global Positioning System (GPS) positions of vehicles, is emerging as a reliable and cost-effective way to collect accurate traffic data for a wide area road network. Unlike other

Problem description and proposed mathematical model

This section first describes the problem addressed in this work. It then presents the proposed system structure, in which two types of regressor-based models are introduced: the single model and the multi-model. Next, the feature extraction method is shown. Finally, the criteria used to evaluate the performance of the proposed system are presented.
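The feature extraction step is not detailed in this snippet, but the abstract mentions a sample window with a tunable half width. A minimal sketch of one plausible reading follows: each time step gets a feature vector made of the travel durations inside a centred window. The function name `window_features` and the edge-padding rule are assumptions made for illustration, not the authors' definition.

```python
from typing import List


def window_features(durations: List[float], half_width: int) -> List[List[float]]:
    """Build one feature vector per time step from a travel-duration series.

    Each vector holds the 2 * half_width + 1 durations centred on the step.
    Near the edges, the first or last value is repeated (an assumed padding
    rule; the paper may handle boundaries differently).
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    n = len(durations)
    features = []
    for t in range(n):
        row = []
        for k in range(t - half_width, t + half_width + 1):
            clamped = min(max(k, 0), n - 1)  # clamp the index into [0, n - 1]
            row.append(durations[clamped])
        features.append(row)
    return features
```

With `half_width = 0` each feature vector reduces to the single travel duration at that step, so the window size becomes a parameter that can be tuned against the evaluation criteria.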

Theory of machine learning methods applied

This section first presents the overall flowchart of the proposed traffic flow estimation system. The theory behind the machine learning methods applied in this work is then presented, as shown in Fig. 3 and Fig. 4.
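As an illustration of the regression step, the sketch below computes the posterior mean of a zero-mean Gaussian process with a squared-exponential kernel, mapping a scalar input (for example a travel duration) to a scalar output (for example a flow). The kernel choice, the hyperparameter values and all function names are assumptions; the snippet does not state which kernel or settings the authors used.

```python
import math
from typing import List, Sequence


def rbf(a: float, b: float, length_scale: float, signal_var: float) -> float:
    """Squared-exponential kernel between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L * L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L * L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_predict_mean(x_train: Sequence[float], y_train: Sequence[float],
                     x_test: Sequence[float], length_scale: float = 1.0,
                     signal_var: float = 1.0, noise_var: float = 1e-6) -> List[float]:
    """Posterior mean k_*^T (K + noise_var * I)^{-1} y of a zero-mean GP."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length_scale, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve_cholesky(cholesky(K), list(y_train))
    return [sum(rbf(xs, x_train[i], length_scale, signal_var) * alpha[i]
                for i in range(n)) for xs in x_test]
```

Because the prior mean is zero, predictions fall back to zero far from the training inputs, so in practice the targets would typically be centred before fitting and the hyperparameters estimated from data rather than fixed by hand.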

Experiment and validation from the real data for single model

In this section, we conduct a series of experiments over two road segments to evaluate the proposed algorithm and compare the results with real data. Firstly, the performance is compared between estimated traffic flow and real data in the single model regarding data of a first road segment. Then the results between single model and multi-models are compared on another road segment.
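Comparing estimated and measured flows requires an error criterion. The criteria actually used are defined in the full paper and not in this snippet; the sketch below assumes the two common choices, root mean square error (RMSE) and mean absolute error (MAE), which the reference list discusses (Chai et al., 2014). Function names are illustrative.

```python
import math
from typing import Sequence


def rmse(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between estimated and observed flows."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mae(estimated: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute error between estimated and observed flows."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(observed)
```

RMSE penalises large deviations more heavily than MAE, so reporting both gives a fuller picture when a model tracks the general trend but misses peak-hour flows.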

Conclusion and future works

This work illustrates that machine learning techniques applied to aggregated data make it possible to estimate traffic flows from Floating Car Data (FCD) using trained regressors alone. Principal Component Analysis (PCA) coupled with the k-means technique differentiates clusters of daily FCD profiles. Models are built for each cluster tuned by selecting the appropriate multi-model Gaussian Processes Regressors (GPR) using the Support Vector Machine (SVM) classifier generates
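The clustering and model-selection idea can be sketched as follows: daily travel-duration profiles are grouped with k-means, and a new day is routed to the cluster whose regressor should be used. This is a simplified illustration under stated assumptions: the PCA step is omitted, a nearest-centroid rule stands in for the SVM classifier, the initial centroids are simply the first k profiles, and all names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two daily profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(profile: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the closest centroid (stand-in for the SVM classifier)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(profile, centroids[c]))


def kmeans(profiles: List[List[float]], k: int,
           iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Plain k-means; initial centroids are the first k profiles (assumption)."""
    centroids = [list(p) for p in profiles[:k]]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in profiles]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    labels = [nearest(p, centroids) for p in profiles]
    return centroids, labels


# Hypothetical daily profiles: travel durations (minutes) at four times of day.
days = [[10, 30, 12, 28], [10, 11, 10, 11], [11, 31, 12, 29],
        [10, 12, 11, 10], [12, 29, 11, 30]]
centroids, day_types = kmeans(days, k=2)
new_day_type = nearest([11, 30, 12, 29], centroids)  # picks which regressor to use
```

In a multi-model setup, `new_day_type` would index into a collection of regressors, one trained per cluster, whereas the single-model variant skips this routing entirely.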

Acknowledgments

This work was supported by the project ORIO (Observing the peRformances of urban Infrastructures and mobility/preventing collisions with vulnerable people using Opportunistic radar). The project ORIO is done within the framework of ELSAT2020, which is co-financed by the European Union with the European Regional Development Fund, the French state and the Hauts de France Region Council. The real traffic flow data from induction loops on the road “86 Boulevard de la République, Douai, France” was

References (42)

Asakura, Y., et al., 2015. Incident detection methods using probe vehicles with on-board GPS equipment. Transp. Res. Procedia.
Brilon, W., et al., 2011. Speed-flow models for freeways. Procedia-Soc. Behav. Sci.
Chen, Y., et al., 2021. Spatial-temporal traffic congestion identification and correlation extraction using floating car data. J. Intell. Transp. Syst.
Coifman, B., et al., 1998. A real-time computer vision system for vehicle tracking and traffic surveillance. Transp. Res. C.
Erdelić, T., et al., 2021. Estimating congestion zones and travel time indexes based on the floating car data. Comput. Environ. Urban Syst.
Hu, J., et al., 2015. Short-term wind speed prediction using empirical wavelet transform and Gaussian process regression. Energy.
Lam, W.H., et al., 1992. Calibration of the combined trip distribution and assignment model for multiple user classes. Transp. Res. B.
Sunderrajan, A., et al., 2016. Traffic state estimation using floating car data. Procedia Comput. Sci.
Vázquez, J.J., et al., 2020. A comparison of deep learning methods for urban traffic forecasting using floating car data. Transp. Res. Procedia.
Alsayat, A., El-Sayed, H., 2016. Social media analysis using optimized K-Means clustering. In: IEEE 14th International...
Chai, T., et al., 2014. Root mean square error (RMSE) or mean absolute error (MAE)?–Arguments against avoiding RMSE in the literature. Geosci. Model Dev.
Chen, L., Shi, J., Cheng, M., Zhu, H., Sun, L., 2020. Characteristics of urban road non-recurrent traffic congestion...
Cheung, S., et al., 2005. Traffic measurement and vehicle classification with single magnetic sensor. Transp. Res. Rec.: J. Transp. Res. Board.
Dai, X., Ferman, M.A., Roesser, R.P., 2003. A simulation evaluation of a real-time traffic information system using...
de Fabritiis, C., Ragona, R., Valenti, G., 2008. Traffic estimation and prediction based on real time floating car...
Furey, T.S., et al., 2000. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics.
Google, T.S., 2018. Google map API.
Groth, D., et al. Principal components analysis.
Hong, J., Zhang, X., Wei, Z., Li, L., Ren, Y., 2007. Spatial and temporal analysis of probe vehicle-based sampling for...
Hu, J., Li, X., Ou, Y., 2014. Online gaussian process regression for time-varying manufacturing systems. In: 13th IEEE...

