
Applied Soft Computing

Volume 96, November 2020, 106694

Real-time deep reinforcement learning based vehicle navigation

https://doi.org/10.1016/j.asoc.2020.106694

Highlights

  • A novel deep reinforcement learning method for real-time vehicle navigation.

  • Smart agents are embedded into the SUMO simulator within a dynamic transportation system.

  • Performance is validated under nine realistic combined road and traffic conditions.

Abstract

Traffic congestion has become one of the most serious contemporary city issues, as it leads to unnecessarily high energy consumption, air pollution and extra traveling time. During the past decade, many optimization algorithms have been designed to achieve the optimal usage of existing roadway capacity in cities to alleviate the problem. However, it remains a challenging task for vehicles to interact with the complex city environment in real time. In this paper, we propose a deep reinforcement learning (DRL) method to build a real-time intelligent vehicle routing and navigation system by formulating the task as a sequence of decisions. In addition, an integrated framework is provided to facilitate intelligent vehicle navigation research by embedding smart agents into the SUMO simulator. Nine realistic traffic scenarios are simulated to test the proposed navigation method. The experimental results demonstrate the efficient convergence of the vehicle navigation agents and their effectiveness in making optimal decisions under volatile traffic conditions. The results also show that the proposed method provides a better navigation solution compared to the benchmark routing optimization algorithms. The performance has been further validated using the Wilcoxon test. The improvement achieved by the proposed method over state-of-the-art navigation methods is found to become more significant on maps with more edges (roads) and more complicated traffic.

Introduction

In recent years, traffic congestion in urban areas has become a serious problem due to rapid urbanization. It has a major impact on urban transportation networks, leading to extra traveling hours, increased fuel consumption and air pollution. According to the study in [1], traffic congestion can be categorized into recurring congestion (RC) and non-recurring congestion (NRC). NRC is defined as congestion caused by unexpected events, such as construction work, inclement weather, accidents, and special events [2]. Unsurprisingly, NRC accounts for a larger proportion of traffic delays in urban areas compared to RC due to its unpredictable nature [3]. Three categories of methods have been proposed to tackle the NRC problem: (1) detecting and predicting traffic congestion by utilizing both historical and real-time sensor data [4], [5]; (2) optimizing traffic signal control and management [6], [7]; and (3) vehicle routing and navigation optimization [8], [9], [10], [11]. Among these, vehicle routing and navigation, as the most promising solution, has been investigated extensively over the past decades.

The classical vehicle routing problem (VRP) is defined as finding the minimum-cost combination of routes for a given number of vehicles m to serve n customers. A typical example is path planning for collecting and delivering packages for a delivery company. The traditional VRP is classified as an NP-hard problem [9]. Many optimization algorithms have been proposed to find sub-optimal solutions under different constraints, e.g., genetic algorithms [12], the firefly algorithm [13], hybrid algorithms [14] or backbone algorithms [15]. However, this type of problem definition assumes relatively stable traffic conditions and targets the traveling salesman problem, so these optimization algorithms search for the optimal solution in a static context. Different from the classical VRP, this study addresses the vehicle routing and navigation problem of controlling a vehicle to plan a path from a given start node to a destination node with the shortest travel time under dynamic traffic conditions [16]. The objective is to find a path, or a set of optimal actions, that minimizes travel time given real-time input describing current traffic conditions. In the past few years, many heuristic and evolutionary optimization algorithms have been proposed to solve this problem [17], [18], [19], [20], [21]. Although recent research on vehicle routing and navigation has achieved reasonable results, the state-of-the-art methods suffer from several drawbacks: firstly, shortest-path methods, e.g., [16], [18], [21], become less effective at providing optimal solutions due to the unpredictable nature of complicated traffic conditions and cannot react instantly to NRC events. Secondly, it is hard to solve the problem in real time when using optimization algorithms, e.g., [20], [22], [23], as the search space grows exponentially when more routing edges are added to the map. Thirdly, optimization-based vehicle routing and navigation algorithms, such as [19], [24], [25], [26], cannot perform self-evolution and self-adaptation. To address these limitations, this paper proposes a deep reinforcement learning (DRL) method to achieve real-time intelligent vehicle navigation and alleviate NRC issues.

By formulating the vehicle routing and navigation task as a sequence of decisions, the DRL framework can be utilized to solve this automated control problem by taking real-time observations and evaluating the outcomes of actions in a complex environment. Inspired by the recent success of deep reinforcement learning methods, an enhanced deep Q-network (DQN) method is adopted to deal with this real-world complexity, given that the navigation task can be modeled as a Markov decision process (MDP). Compared to the DQN in [27], our method enhances it in three aspects: (1) a suitable reward function is designed for the vehicle routing and navigation task; (2) several advanced schemes, including double DQN, dueling DQN and prioritized experience replay, are integrated into the framework to achieve more reliable convergence of the network; and (3) a distance ranking sampling strategy is proposed to accelerate convergence.
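To make the second enhancement concrete, the sketch below (PyTorch) shows a dueling Q-network head and a double-DQN target computation. It is a minimal illustration under assumed layer sizes, state/action dimensions and discount factor; it is not the authors' exact architecture, reward design or replay scheme.

```python
# Minimal sketch (PyTorch) of a dueling Q-network and a double-DQN target.
# Layer sizes, state/action dimensions and gamma are illustrative assumptions.
import torch
import torch.nn as nn


class DuelingQNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v, a = self.value(h), self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)


def double_dqn_target(online, target, reward, next_state, done, gamma=0.99):
    """Double DQN: the online net selects the action, the target net evaluates it."""
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```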

In our work, the traffic simulator SUMO is seamlessly connected to the smart navigation agents to provide an integrated experimental framework. It simulates real-world traffic conditions and feeds agent decisions back into the simulation. Whenever agents are required to make navigation decisions, observations generated from the simulated traffic environment are fed into the agents as representations of the current traffic state. The agents can therefore make automated real-time navigation decisions that minimize the travel time to reach their destinations. The proposed DRL framework provides an appealing and innovative solution to the vehicle routing and navigation problem, as the learning process is fully automated and does not require any labeling or guidance. Furthermore, the agents can automatically adapt their policy networks by analyzing the global environment as their context and eventually decrease travel time as new data are generated.
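As a rough illustration of this simulator-agent coupling, the sketch below drives a single vehicle through SUMO via TraCI. Only the traci calls belong to the standard SUMO Python API; the state construction, the agent object and its act method are hypothetical placeholders, not the paper's implementation.

```python
# Minimal simulator-agent loop over TraCI; agent and state encoding are placeholders.
import traci


def run_episode(agent, sumo_cfg="scenario.sumocfg", veh_id="ego"):
    traci.start(["sumo", "-c", sumo_cfg])               # launch SUMO headless
    try:
        while traci.simulation.getMinExpectedNumber() > 0:
            traci.simulationStep()                       # advance simulation by one step
            if veh_id not in traci.vehicle.getIDList():
                continue
            # Example state: vehicle count on every edge (a simple traffic snapshot).
            state = [traci.edge.getLastStepVehicleNumber(e)
                     for e in traci.edge.getIDList()]
            # The agent picks the edges to follow next; setRoute expects the
            # route to start with the vehicle's current edge.
            current_edge = traci.vehicle.getRoadID(veh_id)
            next_edges = agent.act(state, current_edge)  # hypothetical agent API
            traci.vehicle.setRoute(veh_id, [current_edge] + next_edges)
    finally:
        traci.close()
```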

The main contributions of our research can be highlighted as follows: (1) this work proposes a novel DRL algorithm to achieve an effective real-time vehicle routing and navigation system; (2) the DRL agents are embedded into the traffic simulator SUMO to form an integrated framework that facilitates intelligent vehicle navigation research in the context of a dynamic urban transportation system; and (3) the potential practical usage is demonstrated under nine realistic combined road and traffic conditions, and its efficiency is further validated using the Wilcoxon test.

The remainder of the paper is organized as follows: the background of our research is described in Section 2 to clarify the motivation of our work. Section 3 provides an overview of the proposed framework and introduces the main components of our system. Section 4 and Section 5 explain the details of the two main components, i.e., the traffic simulator and the DRL smart agents. In Section 6, the experimental results are presented to demonstrate the convergence of the DRL agents, how the agents navigate vehicles when making routing choices, and their performance compared to the SUMO built-in route optimization algorithms. Finally, Section 7 presents the concluding remarks and suggests future work.

Section snippets

Background

The original approach to the traffic congestion problem was traffic control and optimization, in which a significant number of research works have been conducted. In particular, many of these works focus on path planning and directing vehicles to their destinations as quickly as possible, considering static conditions such as travel distance and speed limits.

The earliest solution to the vehicle navigation problem was the shortest-path algorithm, which aims to find a path
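For reference, a minimal Dijkstra shortest-path sketch over a static, edge-weighted graph is given below; this is the classical baseline discussed here, not the proposed DRL method, and the adjacency-dict encoding is an illustrative assumption.

```python
# Dijkstra over an adjacency dict: graph[node] = [(neighbor, travel_time), ...]
import heapq


def dijkstra(graph: dict, source: str, target: str) -> float:
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d                      # shortest travel time found
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return float("inf")                   # target unreachable
```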

The framework

Our framework couples SUMO (Simulation of Urban Mobility) with the Traffic Control Interface (TraCI), which allows us to connect the simulator to the DRL agents. In this study, an improved Deep Q-Learning Network (DQN) method [27] is adopted to train intelligent agents to navigate vehicles to their destinations while avoiding congestion. As shown in Fig. 1, the designed framework consists of three parts: the first part is SUMO, the environment simulator for creating realistic traffic
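The sketch below suggests how these parts could interact during training: a SUMO environment wrapped behind a Gym-like interface, a DQN agent and an experience replay buffer. The wrapper interface, method names and hyperparameters are assumptions for illustration, not the paper's code.

```python
# Hedged outer training loop; env, agent and buffer interfaces are assumed.
def train(env, agent, buffer, episodes=500, batch_size=32):
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = agent.select_action(state)           # epsilon-greedy over Q-values
            next_state, reward, done = env.step(action)   # one SUMO decision point
            buffer.add(state, action, reward, next_state, done)
            if len(buffer) >= batch_size:
                agent.learn(buffer.sample(batch_size))    # DQN update from replay
            state = next_state
        agent.sync_target()                                # refresh target network
```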

Traffic simulator

Simulation is considered an efficient approach for investigating scientific problems. Exploiting the increasing processing power of computers, simulation allows complex scientific models to be tested in a reasonable time at minimal cost. Traffic simulators are especially widely used in transportation research, as running experiments with vehicles in the real world is simply not practical [42]. There are several widely used traffic simulators, including Quadstone Paramics [43]

Problem statement

In our work, graph theory is used to represent the traffic network in the SUMO simulator. The traffic network is represented as G = {N, E}, where N denotes the junctions and E the roads in the map. Each intersection between roads is a node in the graph, and an edge is defined if a road segment connects the two corresponding intersections. As shown in Fig. 3, the left sub-figure is a normal urban road traffic network running in the SUMO simulator and the right
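As a hedged illustration, the following sketch extracts such a graph G = {N, E} from a SUMO network file using sumolib (part of the SUMO tools); the file name is illustrative and the dictionary encoding is one possible representation, not necessarily the one used in the paper.

```python
# Read a SUMO .net.xml file and collect junctions (N) and roads (E) with lengths.
import sumolib


def build_graph(net_file="liverpool.net.xml"):
    net = sumolib.net.readNet(net_file)
    nodes = {n.getID() for n in net.getNodes()}        # N: junction IDs
    edges = {}                                          # E: road ID -> (from, to, length)
    for e in net.getEdges():
        edges[e.getID()] = (e.getFromNode().getID(),
                            e.getToNode().getID(),
                            e.getLength())
    return nodes, edges
```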

Experiment preparation

This section presents the preparation of the experiments for implementing the proposed method. It includes building the simulation environment, generating traffic demand and training the smart agents.
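As an example of the demand-generation step, SUMO ships a randomTrips.py tool that can produce random trip demand for a given network; the sketch below invokes it with illustrative file names and an assumed insertion period, which are not the paper's settings.

```python
# Generate random traffic demand with SUMO's bundled randomTrips.py tool.
import os
import subprocess

tools = os.path.join(os.environ["SUMO_HOME"], "tools")
subprocess.run([
    "python", os.path.join(tools, "randomTrips.py"),
    "-n", "liverpool.net.xml",   # network file (illustrative name)
    "-r", "demand.rou.xml",      # generated route file
    "-e", "3600",                # departures over one simulated hour
    "-p", "2.0",                 # one vehicle inserted every 2 seconds on average
], check=True)
```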

Experiment evaluation

There are two subsections in the experimental evaluation. Firstly, two toy maps are generated to test the convergence of the intelligent navigation agents; the toy simulations also provide a tool to gain insight into the decisions made by the intelligent agent during navigation. Secondly, nine traffic conditions based on three regions in Liverpool city center are simulated to demonstrate the efficiency of the DRL agents.

To further demonstrate the performance of the

Discussion and conclusion

Traffic congestion is a major contemporary issue in many densely populated cities. Thus, during the past several decades, many vehicle navigation systems have been designed to help vehicles reach their destinations as quickly as possible when traffic is busy. However, it is not a trivial task to find an optimal solution in a complex city environment. In this paper, a novel DRL-based vehicle routing optimization method is proposed to re-route vehicles to their destinations in complex urban

CRediT authorship contribution statement

Songsang Koh: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing - original draft, Writing - review & editing, Visualization. Bo Zhou: Conceptualization, Methodology, Software, Investigation, Resources, Writing - original draft, Writing - review & editing, Supervision, Project administration, Funding acquisition. Hui Fang: Conceptualization, Methodology, Writing - original draft, Writing - review & editing, Supervision, Project

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (52)

  • F. Sun et al., DxNAT—Deep neural networks for explaining non-recurring traffic congestion.

  • N. Zygouras, N. Panagiotou, N. Zacheilas, I. Boutsis, V. Kalogeraki, I. Katakis, D. Gunopulos, Towards detection of...

  • A. Ghafouri et al., Optimal detection of faulty traffic sensors used in route planning.

  • S.S. Mousavi et al., Traffic light control using deep policy-gradient and value-function-based reinforcement learning, IET Intell. Transp. Syst. (2017).

  • U. Ritzinger et al., A survey on dynamic and stochastic vehicle routing problems, Int. J. Prod. Res. (2016).

  • M.R. Jabbarpour et al., Applications of computational intelligence in vehicle traffic congestion problem: a survey, Soft Comput. (2018).

  • A.A.R. Hosseinabadi et al., An ameliorative hybrid algorithm for solving the capacitated vehicle routing problem, IEEE Access (2019).

  • D. Bertsimas et al., Online vehicle routing: The edge of optimization in large-scale applications, Oper. Res. (2019).

  • C. Gawron, Simulation-Based Traffic Assignment: Computing User Equilibria in Large Street Networks (1998).

  • D.R. Lanning et al., Dijkstra's algorithm and Google Maps.

  • S.A.A. Nahar et al., Modelling and analysis of an efficient traffic network using ant colony optimization algorithm.

  • X. Zong et al., Multi-ant colony system for evacuation routing problem with mixed traffic flow.

  • I. Kaparias et al., A reliability-based dynamic re-routing algorithm for in-vehicle navigation.

  • S.S. Anjum et al., Modeling traffic congestion based on air quality for greener environment: an empirical study, IEEE Access (2019).

  • R. Dean, B. Nagy, A. Stentz, B. Bavar, X. Zhang, A. Panzica, Autonomous vehicle routing using annotated maps, 2019,...

  • P.V. Boesen, Vehicle with interaction between vehicle navigation system and wearable devices, 2017, Google Patents, US...