Adaptive network traffic control with an integrated model-based and data-driven approach and a decentralised solution method☆
Introduction
Effective traffic management plays an important role in sustaining urban development (Chow et al., 2014). In particular, adaptive traffic control systems have been shown to improve the efficiency and robustness of road networks through their capability of dynamically adjusting control plans in response to the prevailing traffic conditions (Chow et al., 2020a). Classical examples in practice and academia include OPAC (Gartner, 1983), PRODYN (Henry et al., 1983), SCOOT (Hunt et al., 1982), SCATS (Luk, 1984), TUC (Diakaki et al., 2002), and the more recent max-pressure controller (Varaiya, 2013, Mercader et al., 2020). A comprehensive review of adaptive urban traffic control can be found in Papageorgiou et al. (2003). An adaptive traffic control system can typically be formulated as a Markov decision process (MDP; Bertsekas, 2019) fed by real-time observations and estimates of traffic flow. However, the complexity of real-world network systems can prevent such an optimal control system from being fully operational in large-scale, real-time applications (Timotheou et al., 2015, Chow et al., 2020b, Chow et al., 2020c). In particular, as the size of the traffic network increases, the complexity of the corresponding optimal control problem grows exponentially with the number of decision variables to determine, traffic state variables to observe, and demand uncertainties to consider. This issue is known as the curses of dimensionality, a major computational challenge in most dynamic optimisation problems (Powell, 2011, Bertsekas, 2019, Bertsekas, 2020).
One approach to addressing the aforementioned curses of dimensionality is to adopt approximate dynamic programming (ADP) or reinforcement learning (RL) techniques (Powell, 2011, Sutton and Barto, 2018, Ying et al., 2020). The core idea of ADP and RL is to reduce the complexity of the original optimisation problem by using approximators to represent the relationship between the traffic control variables and the corresponding state variables and system performance at future times or stages. Possible approximators range from ordinary regression models (Cai et al., 2009) to artificial neural networks (ANNs) (Su et al., 2020). The approximators used under the ADP framework are typically parametric. The optimal controller hence works with the parametric approximator for state estimation and solution seeking, instead of searching directly through the state and solution space of the original problem (Powell, 2011, Bertsekas, 2019). In the literature, Cai et al. (2009) first propose an ADP-based adaptive traffic signal controller for isolated junctions and show that it can deliver effective traffic control policies with reduced computational effort. However, Cai et al. (2009) do not consider the coordination of network-wide traffic or the spatial propagation of traffic queues, which raises questions about the generality and applicability of their findings. Ozan et al. (2015) present an application of reinforcement learning to network-wide traffic signal control, although their work is confined to static, cyclic timing plans. Li et al. (2016) and Wan and Hwang (2018) present reinforcement learning based techniques, although their applications are limited to isolated junctions. Aslani et al. (2017) propose the use of an actor-critic reinforcement learning method for network traffic control under different disruption events.
Liu et al. (2020) develop a switching-based optimal signal controller with a data-driven ADP and test it on small networks. With the cycle time constraint relaxed, the solution process could become intractable even for medium-sized networks. El-Tantawy et al. (2013) and Chu et al. (2020) present multi-agent adaptive traffic control systems using deep reinforcement learning techniques. Most of the control systems reviewed here are purely data-driven (i.e. model-free). The lack of an underlying traffic model nevertheless raises concerns about the tractability and stability of the overall control system. Moreover, data-driven control systems often need to be trained with a vast amount of data or realisations before they can deliver satisfactory performance. Baldi et al. (2019) present a simulation-based traffic control system that combines a cyclic store-and-forward traffic model (Gazis, 2002) with an ADP solution technique. Nevertheless, the optimal control framework of Baldi et al. (2019) is confined to cyclic timing plans, which could lack flexibility and adaptiveness to prevailing traffic conditions (Lo et al., 2001, Lo and Chow, 2002).
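The core ADP idea described above, replacing a search over all future stages with a parametric approximation of the future cost, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any of the cited works: the two-approach junction, the linear approximator, the names `approx_cost_to_go`, `greedy_control`, `step`, `stage_cost` and `featurise`, and all numerical values.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # hypothetical state: queue lengths on two approaches


def approx_cost_to_go(theta: Sequence[float], features: Sequence[float]) -> float:
    """Linear parametric approximator: inner product of weights and features."""
    return sum(t * f for t, f in zip(theta, features))


def greedy_control(
    state: State,
    controls: Sequence[int],
    step: Callable[[State, int], State],
    stage_cost: Callable[[State, int], float],
    featurise: Callable[[State], List[float]],
    theta: Sequence[float],
    gamma: float = 0.95,
) -> Tuple[int, float]:
    """One-step lookahead: stage cost now plus approximated discounted future cost."""
    best_u, best_value = controls[0], float("inf")
    for u in controls:
        next_state = step(state, u)
        value = stage_cost(state, u) + gamma * approx_cost_to_go(theta, featurise(next_state))
        if value < best_value:
            best_u, best_value = u, value
    return best_u, best_value


# Toy junction (all numbers are illustrative assumptions).
ARRIVALS = (2.0, 1.0)  # vehicles arriving per stage on each approach
DISCHARGE = 5.0        # vehicles that can be served per stage on the green approach


def step(state: State, u: int) -> State:
    """Approach u receives green; queues cannot become negative."""
    q = [state[0] + ARRIVALS[0], state[1] + ARRIVALS[1]]
    q[u] = max(q[u] - DISCHARGE, 0.0)
    return (q[0], q[1])


def stage_cost(state: State, u: int) -> float:
    """Delay proxy: total number of queued vehicles in the current stage."""
    return state[0] + state[1]


def featurise(state: State) -> List[float]:
    """Features are simply the queue lengths."""
    return [state[0], state[1]]


theta = [1.0, 1.0]
best_u, best_value = greedy_control((8.0, 3.0), [0, 1], step, stage_cost, featurise, theta)
print(best_u, best_value)
```

With these toy numbers, giving green to the longer queue yields the lower approximated cost. The point of the sketch is only that the controller evaluates each candidate control through the approximator rather than enumerating all future state trajectories.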
Following the review of previous work, this paper presents an adaptive traffic controller for stochastic road networks based on the ADP technique. Different from the purely data-driven approaches reviewed above, we adopt an integrated approach in which the proposed adaptive controller contains a model-based component for overall system stability and tractability. Specifically, the model-based optimisation component operates upon an underlying kinematic wave model within a prediction horizon framework driven by stochastic demand (Sumalee et al., 2011, Mohajerpoor et al., 2019). The data-driven optimisation component operates upon an ADP formulation which approximates the state-control interactions over future stages with a parametric approximator. The underlying parametric approximator is iteratively updated online with realisations of traffic states via a temporal difference (TD) learning process (Sutton and Barto, 2018, Bertsekas, 2019). The approximator reduces the computational complexity of the original problem by parameterising the state and decision space. As will be shown in our experiments, the incorporation of the model-based component facilitates the training of the ADP-based state approximator, and hence improves the overall performance of the control system. Existing studies mostly adopt cyclic timing plans because of the additional computational effort needed for more flexible plans (Lo and Chow, 2002). Exploiting the ADP framework, the proposed controller derives fully adaptive acyclic timing plans through a decentralised solution approach in which individual intersections are allowed to derive their own control policies in an asynchronous manner. In this decentralised setting, the data-driven ADP approximator serves as a central agent coordinating the control policies derived at individual intersections in the network (Boyd et al., 2010). 
This ADP-based coordinator is shown to improve and stabilise the performance of the overall control system even under congested conditions. This represents significant progress in adaptive control system design using decentralised optimisation techniques (Chow and Sha, 2016, Chow et al., 2020c, Chow et al., 2020b). The present study contributes to adaptive network traffic control under uncertainty through the use of advanced optimisation techniques.
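To make the temporal difference update of a parametric approximator concrete, the following is a minimal sketch of a textbook TD(0) step for a linear approximator. It is not the paper's exact formulation: the linear form, the function name `td0_update`, the feature vectors, the step size `alpha` and the discount factor `gamma` are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def td0_update(
    theta: Sequence[float],
    phi: Sequence[float],
    cost: float,
    phi_next: Sequence[float],
    alpha: float = 0.1,
    gamma: float = 0.95,
) -> Tuple[List[float], float]:
    """One TD(0) step for a linear cost-to-go approximator J(x) = theta . phi(x).

    phi and phi_next are the feature vectors of the current and the observed
    next state; cost is the stage cost (e.g. delay) realised in between.
    """
    value = sum(t * f for t, f in zip(theta, phi))
    value_next = sum(t * f for t, f in zip(theta, phi_next))
    # Temporal difference: observed cost plus discounted estimate, minus old estimate.
    td_error = cost + gamma * value_next - value
    # Gradient of a linear approximator with respect to theta is phi itself.
    new_theta = [t + alpha * td_error * f for t, f in zip(theta, phi)]
    return new_theta, td_error


# Hypothetical stream of observed transitions: (features, stage cost, next features).
transitions = [
    ([1.0, 2.0], 3.0, [0.0, 1.0]),
    ([0.0, 1.0], 1.0, [0.0, 0.0]),
]

theta = [0.0, 0.0]
for phi, cost, phi_next in transitions:
    theta, td_error = td0_update(theta, phi, cost, phi_next)
print(theta)
```

Each observed transition nudges the weights towards consistency between the current estimate and the realised cost plus the estimate at the next state, which is how online realisations of traffic states could refine the approximator over time.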
The rest of the paper is organised as follows. Section 2 presents the kinematic wave network model with stochastic demand. Section 3 presents the adaptive signal control framework that integrates model-based optimisation with a data-driven approximate dynamic programming approach. We refer to this proposed integrated approach as the ‘MB + ADP’ controller throughout the paper. The MB + ADP control framework is also associated with a decentralised solution procedure with which local control policies can be derived in an asynchronous manner over different intersections. Section 4 presents the numerical experiments and discusses the results. Finally, Section 5 provides some concluding remarks.
Dynamic network traffic model with stochastic demand
A road network is considered to be a directed graph consisting of a set of nodes and a set of links connecting the nodes in the network. The corresponding cardinalities of these sets are hence the total number of nodes and the total number of links in the network respectively. We further have a subset of the links representing the collection of all source links in the network, through which a stochastic demand is loaded into the network over time intervals (of
Adaptive traffic signal control
At each specific decision stage, we seek a network-wide signal control policy that minimises the current and future total network delay driven by the system dynamics established in the previous section. The control policy is subject to the constraints set for each decision stage. The corresponding optimal control problem can be formulated as the following Markov decision process (Bertsekas, 2019):where is
Numerical experiments
The control algorithms are now applied to derive adaptive acyclic timing plans for traffic flow in different networks with stochastic demand. The derived control policies will be compared against the classical Webster cyclic fixed timing plan and the established distributed max-pressure control by Varaiya (2013) for benchmarking purposes. The Webster plan, which is commonly used in practice, determines the green split in each cycle at each signalised node based upon the prevailing degree of
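For readers unfamiliar with the Webster benchmark, a minimal sketch of the classical textbook Webster calculation is given here. Whether the paper uses exactly this form is an assumption; the function name `webster_plan` and the example flow ratios and lost time are hypothetical.

```python
from typing import List, Sequence, Tuple


def webster_plan(flow_ratios: Sequence[float], lost_time: float) -> Tuple[float, List[float]]:
    """Classical Webster cycle length and green split (textbook form).

    flow_ratios: critical flow-to-saturation-flow ratio of each phase.
    lost_time:   total lost time per cycle, in seconds.
    Returns the cycle length and the effective green time of each phase, in seconds.
    """
    total_ratio = sum(flow_ratios)
    if not 0.0 < total_ratio < 1.0:
        raise ValueError("sum of critical flow ratios must lie strictly between 0 and 1")
    # Webster's delay-minimising cycle length: (1.5 L + 5) / (1 - Y).
    cycle = (1.5 * lost_time + 5.0) / (1.0 - total_ratio)
    # Effective green is shared among phases in proportion to their flow ratios.
    effective_green = cycle - lost_time
    greens = [effective_green * y / total_ratio for y in flow_ratios]
    return cycle, greens


# Illustrative two-phase junction (numbers are assumptions).
cycle, greens = webster_plan([0.3, 0.2], lost_time=10.0)
print(cycle, greens)
```

Because the split is proportional to the critical flow ratios, all phases end up with the same degree of saturation, which is why such a plan is fixed and cyclic rather than responsive to stage-by-stage queue observations.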
Conclusions
This paper presents an integrated model-based and data-driven adaptive network traffic controller by using the technique of approximate dynamic programming (ADP). The proposed controller works with dynamic network traffic driven by stochastic demand, and derives corresponding acyclic timing plans aiming to minimise the network-wide traffic delay and the associated variability. In the proposed framework, the model-based component operates based on short-term state predictions derived from an
Acknowledgements
We would like to thank the anonymous reviewers for their constructive comments. This study is supported by a General Research Fund (11216819) awarded by the Hong Kong Research Grants Council, two Strategic Research Funds (7005091; 7005544) provided by City University of Hong Kong, and a research grant (72071214) awarded by the National Natural Science Foundation of China.
References (55)
- et al. Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events. Transp. Res. Part C (2017)
- et al. Adaptive traffic signal control using approximate dynamic programming. Transp. Res. Part C (2009)
- Optimisation of dynamic motorway traffic via a parsimonious and decentralised approach. Transp. Res. Part C (2015)
- et al. Sensitivity analysis of signal control with physical queuing: delay derivatives and an application. Transp. Res. Part B (2007)
- et al. Multi-objective optimal control formulations for bus service reliability with traffic signals. Transp. Res. Part B (2017)
- et al. Centralised and decentralised signal timing optimisation approaches for network traffic control. Transp. Res. Part C (2020)
- The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory. Transp. Res. Part B (1994)
- The cell-transmission model, part II: network traffic. Transp. Res. Part B (1995)
- et al. Multi-agent model predictive control of signaling split in urban traffic networks. Transp. Res. Part C (2010)
- et al. A multivariable regulator approach to traffic-responsive network-wide signal control. Control Eng. Practice (2002)
- The PRODYN real time traffic algorithm. IFAC Proc. Vol.
- Decentralised signal control for urban road networks. Transp. Res. Part C
- Max-pressure traffic controller based on travel times: An experimental analysis. Transp. Res. Part C
- Analytical derivation of the optimal traffic signal timing: Minimizing delay variability and spillback probability for undersaturated intersections. Transp. Res. Part B
- Analysis of fixed-time plan. Transp. Res. Part B
- A modified reinforcement learning algorithm for solving coordinated signalized networks. Transp. Res. Part C
- Modelling and analysis of traffic queue dynamics at signalized intersections. Transp. Res. Part A
- Stochastic cell transmission model (SCTM): A stochastic dynamic traffic model for traffic state surveillance and assignment. Transp. Res. Part B
- A generic class of first order node models for dynamic macroscopic simulation of traffic flows. Transp. Res. Part B
- Max pressure control of a network of signalized intersections. Transp. Res. Part C
- An actor-critic deep reinforcement learning approach for metro train scheduling with rolling stock circulation under stochastic passenger demand. Transp. Res. Part B
- A simulation-based traffic signal control for congested urban traffic networks. Transport. Sci.
- Reinforcement Learning and Optimal Control
- Rollout, Policy Iteration, and Distributed Reinforcement Learning
- Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn.
- Dynamic system optimal traffic assignment — a state-dependent control theoretic approach. Transportmetrica
- Performance analysis of centralised and distributed systems for urban traffic control. Transp. Res. Rec.
☆ This article belongs to the Virtual Special Issue on IG005584: VSI:ISTTT24.