Adaptive network traffic control with an integrated model-based and data-driven approach and a decentralised solution method

https://doi.org/10.1016/j.trc.2021.103154

Abstract

This paper presents an adaptive traffic controller for stochastic road networks with an integrated model-based and data-driven solution framework. The model-based optimisation component operates upon an underlying kinematic wave model driven by stochastic demand within a prediction horizon. The data-driven optimisation component operates upon an approximate dynamic programming (ADP) formulation which approximates the state-control interactions over future stages with a parametric approximator. The approximator reduces the computational complexity of the adaptive control problem by parameterising the state and decision space, and is iteratively updated with online realisations of traffic states via a temporal difference (TD) learning process. Our results reveal that incorporating the model-based component facilitates the training of the ADP-based state approximator, and hence improves the overall performance of the control system. We further develop a decentralised solution approach in which individual intersections are allowed to derive their own control policies in an asynchronous manner. The data-driven ADP approximator serves as a central agent coordinating the control policies derived at individual intersections in the network. This is shown to improve and stabilise the performance of the overall control system even under congested conditions, which represents significant progress in adaptive control system design with use of decentralised optimisation techniques. The present study contributes to adaptive network traffic control under uncertainty through use of advanced modelling and optimisation methods.

Introduction

Effective traffic management plays an important role in sustaining urban development (Chow et al., 2014). In particular, adaptive traffic control systems have been shown to improve the efficiency and robustness of road networks through their capability of dynamically adjusting control plans with respect to the prevailing traffic conditions (Chow et al., 2020a). Classical examples in practice and academia include OPAC (Gartner, 1983), PRODYN (Henry et al., 1983), SCOOT (Hunt et al., 1982), SCATS (Luk, 1984), TUC (Diakaki et al., 2002), and the more recent max-pressure controller (Varaiya, 2013, Mercader et al., 2020). A comprehensive review of adaptive urban traffic control can be found in Papageorgiou et al. (2003). An adaptive traffic control system can typically be formulated as a Markov decision process (MDP, Bertsekas, 2019) fed by real-time observations and estimates of traffic flow. However, the sophistication of real-world network systems could hinder such an optimal control system from being fully operational in large-scale, real-time applications (Timotheou et al., 2015, Chow et al., 2020b, Chow et al., 2020c). In particular, as the size of the traffic network increases, the complexity of the corresponding optimal control problem grows exponentially with the number of decisions to make, traffic state variables to observe, and demand uncertainties to consider. This issue is known as the curses of dimensionality, a major computational challenge faced in most dynamic optimisation problems (Powell, 2011, Bertsekas, 2019, Bertsekas, 2020).

One approach to address the aforementioned curses of dimensionality is to adopt approximate dynamic programming (ADP) or reinforcement learning (RL) techniques (Powell, 2011, Sutton and Barto, 2018, Ying et al., 2020). The core idea of ADP or RL is to reduce the complexity of the original optimisation problem by approximating the relationship between the traffic control variables and the corresponding state variables and system performance over future times or stages with the use of approximators. Possible approximators range from ordinary regression models (Cai et al., 2009) to artificial neural networks (ANNs) (Su et al., 2020). The approximators used under the ADP framework are typically parametric. The optimal controller hence works with the parametric approximator for state estimation and solution seeking, instead of searching directly through the state and solution space of the original problem (Powell, 2011, Bertsekas, 2019). In the literature, Cai et al. (2009) first propose an ADP-based adaptive traffic signal controller for isolated junctions and show that it can deliver effective traffic control policies with reduced computational effort. However, they consider only isolated junctions, with no coordination of network-wide traffic or spatial propagation of traffic queues, which raises questions on the generality and applicability of their findings. Ozan et al. (2015) present an application of reinforcement learning to network-wide traffic signal control, although their work is confined to static and cyclic timing plans. Li et al. (2016) and Wan and Hwang (2018) present reinforcement learning based techniques, again for isolated junctions. Aslani et al. (2017) propose an actor-critic reinforcement learning method for network traffic control with different disruption events. Liu et al. (2020) develop a switching-based optimal signal controller with a data-driven ADP and test it on small networks; with the cycle time constraint relaxed, the solution process could become intractable even for medium-sized networks. El-Tantawy et al. (2013) and Chu et al. (2020) present multi-agent adaptive traffic control systems using deep reinforcement learning techniques. Most of the control systems reviewed herein are purely data-driven (i.e. model-free). The lack of an underlying traffic model nevertheless raises concerns about the tractability and stability of the overall control system. Moreover, data-driven control systems often need to be trained with a vast amount of data or realisations before they can deliver satisfactory performance. Baldi et al. (2019) present a simulation-based traffic control system using a cyclic store-and-forward traffic model (Gazis, 2002) combined with an ADP solution technique. Nevertheless, their optimal control framework is confined to cyclic timing plans, which could lack flexibility and adaptiveness to prevailing traffic conditions (Lo et al., 2001, Lo and Chow, 2002).
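The TD-learning update at the heart of such parametric approximators can be sketched as follows. This is a minimal TD(0) step for a linear value approximator V(x) ≈ w·φ(x); the feature map, step size, and discount factor are illustrative assumptions, not specifics from any of the papers reviewed.

```python
def td0_update(w, phi, x, cost, x_next, alpha=0.1, gamma=0.95):
    """One TD(0) step: nudge the weights w of the linear approximator
    V(x) = w . phi(x) towards the observed one-stage cost plus the
    discounted approximate cost-to-go of the successor state."""
    f, f_next = phi(x), phi(x_next)
    v = sum(wi * fi for wi, fi in zip(w, f))
    v_next = sum(wi * fi for wi, fi in zip(w, f_next))
    delta = cost + gamma * v_next - v        # temporal-difference error
    return [wi + alpha * delta * fi for wi, fi in zip(w, f)]
```

The approximator never needs the full transition model: each observed (state, cost, next state) realisation shifts the weights towards consistency with the Bellman equation, which is what allows model-free controllers to learn online.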

Following the review of previous work, this paper presents an adaptive traffic controller for stochastic road networks using the ADP technique. Different from the purely data-driven approaches reviewed above, we adopt an integrated approach in which the proposed adaptive controller contains a model-based component for overall system stability and tractability. Specifically, the model-based optimisation component operates upon an underlying kinematic wave model within a prediction horizon framework driven by stochastic demand (Sumalee et al., 2011, Mohajerpoor et al., 2019). The data-driven optimisation component operates upon an ADP formulation which approximates the state-control interactions over future stages with a parametric approximator. The underlying parametric approximator is iteratively updated with online realisations of traffic states via a temporal difference (TD) learning process (Sutton and Barto, 2018, Bertsekas, 2019). The approximator reduces the computational complexity of the original problem by parameterising the state and decision space. As shown in our experiments, incorporating the model-based component facilitates the training of the ADP-based state approximator, and hence improves the overall performance of the control system. Existing studies mostly adopt cyclic timing plans due to the additional computational effort needed for more flexible plans (Lo and Chow, 2002). Exploiting the ADP framework, the proposed controller derives fully adaptive acyclic timing plans through a decentralised solution approach in which individual intersections are allowed to derive their own control policies in an asynchronous manner. In this decentralised setting, the data-driven ADP approximator serves as a central agent coordinating the control policies derived at individual intersections in the network (Boyd et al., 2010). This ADP-based coordinator is shown to improve and stabilise the performance of the overall control system even under congested conditions, which represents significant progress in adaptive control system design with use of decentralised optimisation techniques (Chow and Sha, 2016, Chow et al., 2020c, Chow et al., 2020b). The present study contributes to adaptive network traffic control under uncertainty through use of advanced optimisation techniques.
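The decentralised scheme can be caricatured as follows: each intersection independently optimises its own control against its local delay plus a shared approximate cost-to-go supplied by the central ADP agent. All names and signatures here are our own illustrative stubs, not the paper's implementation.

```python
def decentralised_step(states, controls, local_delay, v_hat, gamma=0.95):
    """Each intersection j greedily minimises its own one-stage delay
    plus the discounted cost-to-go estimated by the shared (central)
    ADP approximator v_hat; no intersection waits for the others, so
    the loop below could equally run asynchronously per intersection."""
    policy = {}
    for j, x in states.items():
        policy[j] = min(controls[j],
                        key=lambda u: local_delay(j, x, u)
                        + gamma * v_hat(j, x, u))
    return policy
```

The coordination comes entirely through the shared approximator: because every intersection evaluates its candidate controls against the same network-wide cost-to-go estimate, locally greedy decisions remain aligned with the network objective.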

The rest of the paper is organised as follows: Section 2 presents the kinematic wave network model with stochastic demand. Section 3 presents the adaptive signal control framework integrating model-based optimisation and the data-driven approximate dynamic programming approach; we refer to this integrated approach as the 'MB + ADP' controller throughout the paper. The MB + ADP control framework is also associated with a decentralised solution procedure with which local control policies can be derived in an asynchronous manner over different intersections. Section 4 presents and discusses the numerical experiments. Finally, Section 5 provides some concluding remarks.

Section snippets

Dynamic network traffic model with stochastic demand

A road network is considered to be a directed graph G = (M, I), where M denotes the set of nodes and I the set of links connecting the nodes in the network. Their corresponding cardinalities, i.e. |M| and |I|, are hence the total number of nodes and the total number of links in the network respectively. We further have a subset I_D of I representing the collection of all source links in the network G through which a stochastic demand Λ_t = (Λ_{i,t}, i ∈ I_D) is loaded into the network over time intervals (of
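As a minimal illustration of this representation, the sketch below encodes the graph G = (M, I) with its source-link subset I_D and samples one demand realisation per interval. The class and attribute names, and the Poisson arrival assumption, are ours, not the paper's.

```python
import math
import random

def sample_poisson(lam, rng=random):
    """Knuth's method: count exponential inter-arrivals in one interval."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

class Network:
    """Directed graph G = (M, I) with a subset I_D of source links."""
    def __init__(self, nodes, links, source_links):
        self.M = set(nodes)           # node set M
        self.I = dict(links)          # link id -> (tail node, head node)
        self.I_D = set(source_links)  # demand (source) links, a subset of I

    def sample_demand(self, mean_rates):
        """One realisation of the stochastic demand Lambda_t: vehicle
        arrivals on each source link during the current interval."""
        return {i: sample_poisson(mean_rates[i]) for i in self.I_D}
```

Sampling fresh demand realisations like this is what drives the stochastic system dynamics used by the controller in the next section.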

Adaptive traffic signal control

At each specific decision stage k_0, we aim to seek a network-wide signal control policy u_{k_0} that minimises the current and future total network delay driven by the system dynamics x_{k+1} ~ P(x_k, u_k | Λ_t) established in the previous section. The control policy u_{k_0} is subject to the constraint set U at each decision stage. The corresponding optimal control problem can be formulated as the following Markov decision process (Bertsekas, 2019):

min_{u_{k_0} ∈ U} Z(x_{k_0}) = E_{x_{k+1} ~ P(x_k, u_k | Λ_t)} [ lim_{K→∞} Σ_{k=k_0}^{k_0+K} γ^k d_k ]

where x_{k_0} is
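Once the infinite-horizon cost-to-go is replaced by an approximator, the stage-wise decision reduces to a one-step lookahead: pick the control minimising immediate delay plus the discounted approximate cost-to-go of the predicted next state. The sketch below uses our own toy interfaces; the transition model and delay function stand in for the paper's kinematic wave dynamics and d_k.

```python
def one_step_lookahead(x, controls, step, delay, v_hat, gamma=0.95):
    """Choose the control u minimising the immediate delay d_k plus
    the discounted approximate cost-to-go v_hat of the predicted
    next state; 'step' is the (model-based) state transition."""
    return min(controls,
               key=lambda u: delay(x, u) + gamma * v_hat(step(x, u)))
```

For instance, with a scalar queue state, controls that serve 0-2 vehicles, and v_hat equal to the residual queue, the lookahead serves as many vehicles as permitted, since both terms decrease with the served flow.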

Numerical experiments

The control algorithms are now applied to derive adaptive acyclic timing plans for traffic flow in different networks with stochastic demand. The derived control policies are compared against the classical Webster cyclic fixed timing plan and the established distributed max-pressure control of Varaiya (2013) for benchmarking purposes. The Webster plan, commonly used in practice, determines the green split in each cycle at each signalised node based upon the prevailing degree of
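For reference, the Webster benchmark sets the cycle time and green splits from the critical flow ratios of the signal stages via Webster's classical formula C_0 = (1.5L + 5)/(1 - Y). The function interface below is ours; the formula itself is the standard one.

```python
def webster_plan(crit_flow_ratios, lost_time):
    """Webster's fixed-time plan: cycle length C0 = (1.5L + 5)/(1 - Y),
    where L is the total lost time per cycle (s) and Y the sum of the
    stages' critical flow ratios y_i; the effective green time is then
    split in proportion to each y_i."""
    Y = sum(crit_flow_ratios)
    if Y >= 1.0:
        raise ValueError("intersection oversaturated: Y >= 1")
    cycle = (1.5 * lost_time + 5.0) / (1.0 - Y)
    effective_green = cycle - lost_time
    return cycle, [effective_green * y / Y for y in crit_flow_ratios]
```

With two stages of critical flow ratio 0.3 each and 10 s of lost time, this gives a 50 s cycle with 20 s of effective green per stage; unlike the adaptive controllers studied here, the plan is fixed regardless of the realised demand.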

Conclusions

This paper presents an integrated model-based and data-driven adaptive network traffic controller by using the technique of approximate dynamic programming (ADP). The proposed controller works with dynamic network traffic driven by stochastic demand, and derives corresponding acyclic timing plans aiming to minimise the network-wide traffic delay and the associated variability. In the proposed framework, the model-based component operates based on short-term state predictions derived from an

Acknowledgements

We would like to thank the anonymous reviewers for their constructive comments. This study is supported by a General Research Fund (11216819) awarded by the Hong Kong Research Grants Council, two Strategic Research Funds (7005091; 7005544) provided by City University of Hong Kong, and a research grant (72071214) awarded by the National Natural Science Foundation of China.

References (55)

  • J. Henry et al., The PRODYN real time traffic algorithm, IFAC Proc. Vol. (1983)
  • T. Le et al., Decentralised signal control for urban road networks, Transp. Res. Part C (2015)
  • P. Mercader et al., Max-pressure traffic controller based on travel times: An experimental analysis, Transp. Res. Part C (2020)
  • R. Mohajerpoor et al., Analytical derivation of the optimal traffic signal timing: Minimizing delay variability and spillback probability for undersaturated intersections, Transp. Res. Part B (2019)
  • A. Muralidharan et al., Analysis of fixed-time plan, Transp. Res. Part B (2015)
  • C. Ozan et al., A modified reinforcement learning algorithm for solving coordinated signalized networks, Transp. Res. Part C (2015)
  • G. Stephanopoulos et al., Modelling and analysis of traffic queue dynamics at signalized intersections, Transp. Res. Part A (1979)
  • A. Sumalee et al., Stochastic cell transmission model (SCTM): A stochastic dynamic traffic model for traffic state surveillance and assignment, Transp. Res. Part B (2011)
  • C.M.J. Tampère et al., A generic class of first order node models for dynamic macroscopic simulation of traffic flows, Transp. Res. Part B (2011)
  • P. Varaiya, Max pressure control of a network of signalized intersections, Transp. Res. Part C (2013)
  • C. Ying et al., An actor-critic deep reinforcement learning approach for metro train scheduling with rolling stock circulation under stochastic passenger demand, Transp. Res. Part B (2020)
  • S. Baldi et al., A simulation-based traffic signal control for congested urban traffic networks, Transport. Sci. (2019)
  • D. Bertsekas, Reinforcement Learning and Optimal Control (2019)
  • D. Bertsekas, Rollout, Policy Iteration, and Distributed Reinforcement Learning (2020)
  • S. Boyd et al., Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. (2010)
  • A.H.F. Chow, Dynamic system optimal traffic assignment — a state-dependent control theoretic approach, Transportmetrica (2009)
  • A.H.F. Chow et al., Performance analysis of centralised and distributed systems for urban traffic control, Transp. Res. Rec. (2016)

    This article belongs to the Virtual Special Issue on IG005584: VSI:ISTTT24.
