Day-to-day dynamic origin–destination flow estimation using connected vehicle trajectories and automatic vehicle identification data

https://doi.org/10.1016/j.trc.2021.103241

Highlights

  • A novel methodology for recovering day-to-day dynamic OD flow.

  • Fusion of CV trajectories and AVI observations.

  • Obtaining prior OD flows by addressing penetration rate variation and sparsity issue.

  • Determining final estimates utilizing day-to-day traffic characteristics.

Abstract

Dynamic vehicular origin–destination (OD) flow is a fundamental component of traffic network modeling, and its estimation has long been studied. Although ideal observing conditions and behavioral assumptions are often indispensable for estimation, day-to-day traffic recurrences and variations are seldom utilized to improve the estimation performance. In this paper, we propose a new method to recover day-to-day dynamic OD flows using both connected vehicle (CV) trajectories and automatic vehicle identification (AVI) observations. The method involves two modules: the first module provides reliable prior OD flows given limited observations, while the second module seeks the optimal estimates based on the prior OD flows. In the first module, linear projection is extended to consider temporal and spatial variation of the CV penetration rate, and non-negative Tucker decomposition (NTD) is adopted to address the data sparsity issue caused by the low CV penetration rate. In the second module, a self-supervised learning model called the latency-constrained autoencoder (LCAE) is established to search for the optimal OD flows according to the priors with given robust latent features. To avoid local minima and ensure consistency between estimates, a novel algorithm called adaptive sub-sample correction (ASC) is proposed and integrated into the optimization process of LCAE; it iteratively corrects the most inconsistent samples based on the day-to-day traffic flow characteristics. The proposed method is examined on an empirical urban arterial network, a calibrated simulation network, and a synthetic large-scale grid network. The results indicate that the proposed method requires very few AVI detectors and CV trajectories to achieve competitive estimation performance against two benchmark models. Furthermore, general robustness to several factors with respect to observing conditions and data quality was investigated, and satisfactory scalability was also demonstrated in terms of both estimation accuracy and computational cost.

Introduction

Dynamic origin–destination (OD) flow reveals the time-dependent travel demand on road networks. It serves as the fundamental input for dynamic traffic assignment (DTA) models as well as for network optimization programs (Arsava et al., 2018, Peeta and Ziliaskopoulos, 2001). Dynamic OD flow fluctuates from day to day due to variations and stochasticity in trip patterns. Thus, active traffic network management also requires accurate estimation of day-to-day dynamic OD flows to handle the uncertainty of traffic demand.

Despite extensive study over decades, obtaining accurate time-dependent OD flows from network observations remains challenging due to the observability issue in traffic networks (Castillo et al., 2008a). The observations from the network are far fewer than the unknown OD flows, and thus models may not produce a unique solution. Under such circumstances, existing models often start with a prior OD estimate and solve an optimization program to identify the solution that is most consistent with the available observations and assumptions. The objective of such a program is to minimize the deviation between estimated and observed (or prior) variables while maintaining the network flow conservation described by the DTA process. However, a reliable prior may not always exist, especially in central business districts or under the rapid urbanization seen in many developing countries. Besides, DTA models are usually built on departure time and route choice assumptions that approximate user equilibrium, which could deviate substantially from realistic conditions (Yildirimoglu and Kahraman, 2017, Zhu and Levinson, 2015). Furthermore, most current studies have focused on within-day situations with deterministic OD flows, such as estimation for the morning peak period; they have not considered the day-to-day recurrence and variation of OD flows. Only a few studies have dealt with day-to-day OD flows, and they have mainly estimated the mean, variance, and covariance under an assumed OD demand distribution, e.g., the multivariate normal distribution (Ma and Qian, 2018b, Shao et al., 2014).
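The observability issue can be illustrated with a minimal sketch. The toy network, the fixed assignment matrix `A`, and the flow values below are hypothetical and chosen only for illustration; they do not come from the paper. Two different OD flow vectors reproduce exactly the same link counts, so the counts alone cannot identify the OD flows.

```python
# Hypothetical toy network: 3 OD pairs observed through 2 counted links.
# A[l][k] = 1 if OD pair k traverses link l (fixed routing is an assumption).
A = [
    [1, 1, 0],
    [0, 1, 1],
]


def link_counts(assignment, od_flows):
    """Map OD flows to link counts under a fixed assignment matrix."""
    return [sum(a * x for a, x in zip(row, od_flows)) for row in assignment]


# Two distinct candidate OD flow vectors (vehicles per interval).
x_a = [30, 20, 10]
x_b = [40, 10, 20]

counts_a = link_counts(A, x_a)
counts_b = link_counts(A, x_b)

# Both candidates yield identical link counts, so the observations
# cannot distinguish between them without a prior or extra assumptions.
print(counts_a, counts_b)
```

With two observations and three unknowns, any additional information (a prior, path observations, or behavioral assumptions) is what selects one solution among many.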

In recent years, connected vehicles, such as DiDi and Uber vehicles equipped with GPS units or vehicles whose drivers use navigation services on their mobile phones, have emerged as a promising mobile data source because they can provide detailed and accurate traffic flow information. Meanwhile, vehicle re-identification systems have also been rapidly deployed in many countries. The main components of these systems are automatic vehicle identification (AVI) detectors that can uniquely identify each vehicle, including radio frequency identification (RFID)-based detectors, Bluetooth detectors, and license plate recognition (LPR) devices. Among them, LPR cameras are the most widely used in China, and the data can be accessed by the supplier as well as the law enforcement department. These data sources can directly provide OD and path-related observations. Owing to the availability of these day-to-day continuous and multi-source heterogeneous observations, external priors and unrealistic assumptions may no longer be required (Ma and Qian, 2018b, Yang et al., 2018). Nevertheless, according to our literature review, few efforts have been devoted to fully utilizing the day-to-day observations of these emerging sensors to remove the dependence on external prior OD flows and on the assumptions of DTA modeling. Therefore, in this paper, we eliminate historical priors and behavioral assumptions and infer OD flows using both CV trajectories and AVI observations via a purely data-driven method.

Early attempts at OD flow estimation mainly focused on modeling frameworks based on fixed link counts, including the entropy minimization model (Van Zuylen and Willumsen, 1980), the maximum likelihood estimator (Spiess, 1987), the Bayesian inference method (Maher, 1983), and generalized least squares (GLS) models (Bell, 1991, Cascetta, 1984). Among them, the GLS-based models, in both bi-level and single-level forms, have been most frequently extended and tested. The bi-level models consider the effects of congestion: the upper level minimizes deviation terms in the form of least square error, and the lower level performs DTA based on inference of the equilibrium states (Yang et al., 1991, Yang et al., 1992). To better describe the DTA process, simulators are often incorporated, and the models are then solved by the simultaneous perturbation stochastic approximation (SPSA) algorithm (Lu et al., 2015). However, this bi-level structure usually leads to non-convexity; therefore, single-level models have been proposed based on relaxation techniques (Lu et al., 2013, Nie and Zhang, 2010). Several recent studies have also shed light on data-driven approaches. Ma and Qian (2018a) used high-granular traffic count and speed data to estimate multi-year 24/7 dynamic OD demands, establishing a GLS model given an estimated assignment ratio; Krishnakumari et al. (2020) also used count and speed data to estimate the OD matrix under the mild assumption of proportional flow on shortest paths. Both studies showed satisfactory results and demonstrated that data-driven approaches are promising and efficient. Generally, when only aggregated traffic counts are available, reliable priors and effective assumptions are indispensable to fill the observability gap. However, obtaining reliable initial OD matrices is generally difficult and labor-intensive, and the aforementioned assumptions may deviate from realistic conditions.
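For orientation, GLS-based OD estimation is commonly written in the following generic form. The symbols are assumptions introduced here for illustration and are not the notation of this paper: $\mathbf{x}$ is the vector of unknown OD flows, $\tilde{\mathbf{x}}$ the prior OD flows, $\mathbf{y}$ the observed link counts, $\mathbf{A}$ the assignment matrix mapping OD flows to link flows, and $\mathbf{W}$ and $\mathbf{V}$ the error covariance matrices of the prior and of the counts.

```latex
\min_{\mathbf{x} \ge 0} \;
(\mathbf{x} - \tilde{\mathbf{x}})^{\top} \mathbf{W}^{-1} (\mathbf{x} - \tilde{\mathbf{x}})
+ (\mathbf{y} - \mathbf{A}\mathbf{x})^{\top} \mathbf{V}^{-1} (\mathbf{y} - \mathbf{A}\mathbf{x})
```

In a bi-level setting, $\mathbf{A}$ itself depends on $\mathbf{x}$ through the lower-level assignment, which is the usual source of the non-convexity mentioned above.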

In terms of AVI observations, several researchers have derived travel times and traffic counts from AVI detectors and have conducted OD flow estimation and prediction by integrating link counts with these observations (Dixon and Rilett, 2002, Zhou and Mahmassani, 2006). In addition to traffic counts, these detectors can reproduce partial paths of vehicles and thus provide further flow and travel time constraints to facilitate path and OD flow estimation. Following this direction, Bayesian methods have been adopted to recover paths (Castillo et al., 2008b, Castillo et al., 2008c, Mo et al., 2020), and state-space models, especially particle filtering (Feng et al., 2015, Rao et al., 2018, Yang and Sun, 2015), have also been introduced to probabilistically reconstruct the path of each individual vehicle. Subsequently, path and OD flows can be obtained by aggregation. Despite the promising results, these works require a high AVI coverage rate (e.g., 40–80%), which is rare in realistic networks, especially large ones. In addition, AVI-based OD flow estimation methods have difficulty reducing the uncertainty in the exact travel origin and destination. Although travel paths can be effectively recovered between AVI detectors, the path from the origin to the first detected location (or from the last detected location to the destination) can hardly be recognized, and a route choice assumption is still necessary to address this issue.

CVs have recently facilitated many tasks in traffic modeling, including OD flow estimation. Compared with fixed detectors, which use indirect variables to estimate OD flow and suffer from the observability issue, CVs can cover nearly the entire network and provide direct OD flow samples. With such high-coverage, time-continuous OD samples, the focus of modeling shifts from reducing the uncertainties of unobservable OD flow to measuring the reliability of sampled OD flow (i.e., CV OD flow). Following this direction, the quantity and quality requirements of probe data for estimating population OD flows have been discussed and examined by some early studies based on several toy network examples (Eisenman and List, 2004, Van Aerde et al., 1993), and penetration rates of 10–30% have been regarded as sufficient. Moreover, several studies have investigated the route choices and trip distributions of probe vehicles and have demonstrated the feasibility of using projected probe OD as prior OD for estimation (Ásmundsdóttir, 2008, Ásmundsdóttir et al., 2010). Based on these insights, bi-level GLS models with exogenous DTA simulators have been employed to further incorporate probe vehicles or floating car trajectories (Cao et al., 2013, Carrese et al., 2017). In another benchmark study, Yang et al. (2017) formulated two single-level GLS models based on both probe vehicle trajectories and link counts. The route choices of probe vehicles were used to compute traffic assignment fractions, and the relationship between OD and link penetration rates was established; thus, few assumptions were made in this model. Generally, these studies projected the CV OD flow as an estimate or prior according to presumed or derived penetration rates. However, few studies have explicitly considered the error of the projected OD flow in the model. Thus, optimization models may be trapped in local minima and the solution may be inaccurate. Furthermore, these existing studies often assumed penetration rates of more than 10%, which is considered rare in the current CV market (Tan et al., 2019, Yao et al., 2019), and the estimation performance deteriorated rapidly as the penetration rate decreased.

To summarize, with the availability of detailed and continuous observations, recent AVI-based and CV-based studies require less historical data and fewer assumptions than link count-based studies. Their superiority in estimation accuracy has also been demonstrated, but only under certain observation conditions that often deviate from the currently prevailing market status. Two research problems therefore still need to be addressed. The first problem concerns obtaining a reliable prior estimate when only limited AVI detectors or CVs are available. Several studies have identified the minimum level of AVI coverage rate or CV penetration rate; however, only a few have explicitly dealt with the accompanying problem of data sparsity. The second research problem concerns ensuring an optimal estimate according to the prior. Most existing methods rely on the DTA process to establish constraints, such as link flow conservation and travel time consistency. In this way, estimation errors are prone to increase because of either unrealistic assumptions regarding user behavior or improper simplification of the DTA process.

In view of the first problem, we recast the limited-observation problem as two sub-problems: the reliability of the CV OD flow projection and the imputation of sparse data. Linear projection is extended based on the fusion of AVI observations and CV trajectories to reduce the bias of the prevailing simple scaling, and a low-rank approximation method built on a multi-dimensional tensor is adopted to deal with the sparsity in the projected OD flow. To deal with the second research problem, we propose to robustly reconstruct prior OD flows via self-supervised learning. Based on day-to-day traffic flow characteristics, we develop an adaptive correction algorithm to dynamically adjust the objective surface during optimization. Thus, local minima can be avoided and the final estimates can be obtained without any theoretical assumptions. The proposed methodology is comprehensively examined on an empirical dataset from an urban arterial, a simulation dataset from a regional network, and a synthetic large-scale grid network. The proposed method exhibits satisfactory estimation accuracy, robustness to several influencing factors, and good scalability.
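The bias of simple scaling can be illustrated with a minimal sketch. This is not the extended linear projection of this paper, whose formulation is not given in this excerpt; it only contrasts a single global penetration rate with interval-specific rates. The function names, the counts, and the idea of reading the rate directly from AVI detections are assumptions made for illustration.

```python
def penetration_rate(cv_count, total_count):
    """Share of connected vehicles among all vehicles identified by AVI."""
    if total_count <= 0:
        return None  # no AVI observation, so the rate is unknown
    return cv_count / total_count


def project(cv_flow, rate):
    """Scale a sampled CV OD flow up to a population OD flow."""
    if rate is None or rate <= 0:
        return None  # cannot project without a positive rate
    return cv_flow / rate


# Hypothetical data for one OD pair over two time intervals.
avi_counts = [(10, 200), (30, 300)]  # (CV count, total count) per interval
cv_od_flow = [4, 9]                  # CV trips observed on the OD pair

# Simple scaling: one global rate shared by all intervals.
global_rate = penetration_rate(
    sum(cv for cv, _ in avi_counts), sum(total for _, total in avi_counts)
)
simple = [project(flow, global_rate) for flow in cv_od_flow]

# Interval-specific scaling: the rate is allowed to vary over time.
varying = [
    project(flow, penetration_rate(cv, total))
    for flow, (cv, total) in zip(cv_od_flow, avi_counts)
]

print(simple, varying)
```

With these made-up numbers, the global rate gives roughly 50 and 112.5 vehicles, while the interval-specific rates give roughly 80 and 90, which shows how ignoring rate variation can distort the projected prior. Intervals with no CV sample at all still project to zero, which is the sparsity that imputation has to address.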

In general, the main contributions of this paper are three-fold:

  • (1) A novel methodology for day-to-day dynamic OD flow estimation that fuses AVI observations and CV trajectories is proposed. Within this methodology, the characteristics of both data sources are effectively utilized.

  • (2) Linear projection is extended to deal with variations in CV penetration rates, and non-negative Tucker decomposition (NTD) is applied to impute the sparse values in projected OD flows. Through these two steps, reliable prior OD flows are provided even under limited observing conditions.

  • (3) A self-supervised learning model called the latency-constrained autoencoder (LCAE) is established to search for the optimal solution based on the estimated prior. Meanwhile, an adaptive sub-sample correction (ASC) algorithm is proposed to incorporate day-to-day traffic flow characteristics into the optimization process.

Section snippets

Background and notations

Table 1 presents all the variables and notations used in this paper. Consider a road network G specified by a link set A and an OD pair set RS, and an analysis period consisting of multiple consecutive days represented by D. For each OD pair rs ∈ RS, a path set K_rs exists; each day d ∈ D is split into a number of equal-length time intervals denoted by I. Here, a special note should be given to the day-to-day context of this paper, as the evolutionary dynamics is out of concern and the focus is

Evaluation metrics

In this study, the estimation performance is indicated by four error metrics: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and mean square percentage error (MSPE). MAE and MAPE are prevalent indicators of estimation performance for various tasks. RMSE and MSPE are also included in this study because they better reveal the estimation performance on larger values (larger OD flows tend to be more relevant). Corresponding formulas are
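The paper's own formulas are not reproduced in this excerpt. As a minimal sketch, the four metrics can be computed from their commonly used definitions as follows; the function name, the example flows, and the handling of zero true flows are assumptions, and the paper's exact definitions (for example, of MSPE) may differ.

```python
import math


def od_error_metrics(true_flows, est_flows):
    """Return MAE, RMSE, MAPE and MSPE for paired true/estimated OD flows."""
    if len(true_flows) != len(est_flows) or not true_flows:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [est - true for true, est in zip(true_flows, est_flows)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Assumption: entries with zero true flow are skipped in the percentage
    # metrics, because the relative error is undefined there.
    rel = [e / true for e, true in zip(errors, true_flows) if true != 0]
    mape = sum(abs(r) for r in rel) / len(rel) if rel else float("nan")
    mspe = sum(r * r for r in rel) / len(rel) if rel else float("nan")
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "MSPE": mspe}


# Hypothetical true and estimated flows for three OD pairs.
metrics = od_error_metrics([100.0, 50.0, 10.0], [110.0, 45.0, 10.0])
print(metrics)
```

Because RMSE and MSPE square the errors before averaging, a few large deviations dominate them, which is consistent with the emphasis on larger OD flows stated above.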

Conclusion and future work

In this paper, we developed a novel methodology for estimating dynamic OD flows in a day-to-day context based on the fusion of CV trajectories and AVI observations. This method requires neither external or historical prior information nor assumptions on route choice behavior and the dynamic network loading process, and thus it can be regarded as a generalizable method. In this methodology, two remaining research problems are solved: obtaining reliable prior OD flows given limited

CRediT authorship contribution statement

Yumin Cao: Conceptualization, Methodology, Validation, Visualization, Writing - original draft. Keshuang Tang: Writing - review & editing, Supervision, Funding acquisition. Jian Sun: Writing - review & editing, Supervision, Funding acquisition. Yangbeibei Ji: Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research is jointly sponsored by the National Key Research and Development Program of China (2018YFB16005), the National Natural Science Foundation of China (61673302, U1764261), and the Shanghai Science and Technology Commission Fund Project (19DZ1208800). The authors would also like to thank the three anonymous reviewers for their constructive comments. Any opinions, findings and conclusions are the responsibility of the authors alone.

References (65)

  • L. Lu et al. An enhanced SPSA algorithm for the calibration of Dynamic Traffic Assignment models. Transport. Res. Part C: Emerg. Technol. (2015)
  • W. Ma et al. Estimating multi-class dynamic origin-destination demand through a forward-backward algorithm on computational graphs. Transport. Res. Part C: Emerg. Technol. (2020)
  • W. Ma et al. Estimating multi-year 24/7 origin-destination demand using high-granular multi-source traffic data. Transport. Res. Part C: Emerg. Technol. (2018)
  • W. Ma et al. Statistical inference of probabilistic origin-destination demand using day-to-day traffic data. Transport. Res. Part C: Emerg. Technol. (2018)
  • M.J. Maher. Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach. Transport. Res. Part B: Methodol. (1983)
  • V. Marzano et al. Limits and perspectives of effective O-D matrix correction using traffic counts. Transport. Res. Part C: Emerg. Technol. (2009)
  • C. Osorio. High-dimensional offline origin-destination (OD) demand calibration for stochastic traffic simulators of large-scale road networks. Transport. Res. Part B: Methodol. (2019)
  • W. Rao et al. Origin-destination pattern estimation based on trajectory reconstruction using automatic license plate recognition data. Transport. Res. Part C: Emerg. Technol. (2018)
  • H. Shao et al. Estimation of mean and covariance of peak hour origin–destination demands from day-to-day traffic counts. Transport. Res. Part B: Methodol. (2014)
  • W. Song et al. Statistical metamodeling of dynamic network loading. Transport. Res. Part B: Methodol. (2018)
  • H. Spiess. A maximum likelihood model for estimating origin-destination matrices. Transport. Res. Part B: Methodol. (1987)
  • H.J. Van Zuylen et al. The most likely trip matrix estimated from traffic counts. Transport. Res. Part B: Methodol. (1980)
  • C. Wu et al. Cellpath: Fusion of cellular and traffic sensor data for route flow estimation via convex optimization. Transport. Res. Part C: Emerg. Technol. (2015)
  • X. Wu et al. Hierarchical travel demand estimation using multiple data sources: A forward and backward propagation algorithmic framework on a layered computational graph. Transport. Res. Part C: Emerg. Technol. (2018)
  • H. Yang et al. An analysis of the reliability of an origin-destination trip matrix estimated from traffic counts. Transport. Res. Part B: Methodol. (1991)
  • H. Yang et al. Estimation of origin-destination matrices from link traffic counts on congested networks. Transport. Res. Part B: Methodol. (1992)
  • J. Yang et al. Vehicle path reconstruction using automatic vehicle identification data: An integrated particle filter and path flow estimator. Transport. Res. Part C: Emerg. Technol. (2015)
  • Y. Yang et al. Stochastic travel demand estimation: Improving network identifiability using multi-day observation sets. Transport. Res. Part B: Methodol. (2018)
  • H. Zhang et al. Missing data detection and imputation for urban ANPR system using an iterative tensor decomposition approach. Transport. Res. Part C: Emerg. Technol. (2019)
  • J. Zheng et al. Estimating traffic volumes for signalized intersections using connected vehicle data. Transport. Res. Part C: Emerg. Technol. (2017)
  • T. Arsava et al. OD-NETBAND: an approach for origin-destination based network progression band optimization. Transport. Res. Rec.: J. Transport. Res. Board (2018)
  • R. Ásmundsdóttir. Dynamic OD Matrix Estimation using Floating Car Data (Msc). (2008)