Deep attention models with dimension-reduction and gate mechanisms for solving practical time-dependent vehicle routing problems

https://doi.org/10.1016/j.tre.2023.103095

Highlights

  • First work addressing practical VRPs with frequently changing travel times.

  • First work developing DRL models to tackle realistic time-dependent VRPs.

  • We develop multiple improvement mechanisms to improve existing DRL models.

  • Our models outperform two representative heuristics and two novel DRL models.

Abstract

Time dependencies of travel speeds in time-dependent vehicle routing problems (TDVRPs) are usually accounted for by discretizing the planning horizon into several time periods. However, travel speeds change frequently in real road networks, so many time periods are needed to evaluate candidate solutions accurately during model and solution construction for practical TDVRPs, which substantially increases the computational complexity of these problems. We develop two deep attention models with dimension-reduction and gate mechanisms to solve practical TDVRPs in real urban road networks. In both models, a multi-head attention-based dimension-reduction mechanism reduces the dimension of model inputs and yields enhanced node representations, whereas a gate mechanism yields better information representation. Using a travel speed dataset from an urban road network, we conduct extensive experiments to validate the effectiveness of the proposed models on practical TDVRPs with and without time windows. Experimental results show that our models can solve TDVRPs with 240 time periods and up to 250 customers effectively and efficiently, and that their overall performance is significantly superior to that of two representative heuristics and two state-of-the-art deep reinforcement learning models. In particular, compared with a recent tabu search method, our models reduce the computation time by a factor of up to 3,540 and improve the solution performance by up to 23%. Moreover, our models generalize well: the model trained for the 30-customer TDVRP with time windows can be used directly to solve problems with up to 250 customers, generating solutions superior to those of the benchmarking methods.

Introduction

The vehicle routing problem (VRP) is a classical combinatorial optimization problem and one of the most widely investigated problems in transportation science and logistics. It aims to determine a set of routes for a fleet of vehicles to serve a given set of customers in a road network so that one or more objectives are optimized without violating the imposed constraints. Travel speeds in real-world road networks are time-dependent and change frequently across time periods. VRPs with time-dependent travel speeds (or travel times) are called time-dependent VRPs (TDVRPs). Depending on whether customer time windows, within which visits must occur, are considered, these problems are classified into TDVRPs with and without time windows.

Techniques for solving VRPs can be classified roughly into exact techniques (Dabia et al., 2013, Lera-Romero et al., 2020, Rostami et al., 2021, Huang et al., 2021), heuristics (Malandraki and Daskin, 1992, Donati et al., 2008, Huart et al., 2016, Gmira et al., 2021, Tang et al., 2021), and deep learning-based techniques (Kool et al., 2019, Nazari et al., 2018, Lu et al., 2020). Exact techniques can, in principle, obtain optimal solutions to VRPs. In recent years, some researchers have developed branch-price-and-cut algorithms to solve VRPs optimally, including a two-echelon capacitated VRP with time windows and up to 100 customers (Mhamedi et al., 2021), a robust capacitated VRP with up to 50 customers (Heßler, 2021), a VRP with stochastic travel times and up to 75 customers (Rostami et al., 2021), and a TDVRP with time window assignment, up to 25 customers, and 5 periods (Spliet et al., 2018). Others have developed branch-and-price algorithms to handle various VRPs with up to 75 customers and 7 periods (Zhang et al., 2021a, Vidal et al., 2021, Ben Ticha et al., 2021, Wölck and Meisel, 2022). However, exact techniques are usually problem-dependent, and it is difficult (if not impossible) for them to quickly generate effective solutions to large-scale practical TDVRPs because their worst-case computational complexity is exponential.

Compared with exact techniques, heuristic techniques can quickly obtain feasible solutions to various TDVRPs. Gmira et al. (2021) developed a tabu search algorithm to solve a TDVRP with time windows, up to 200 nodes, 50 customers, and 5 periods. Pan et al. (2021) proposed a hybrid algorithm combining adaptive large neighborhood search and tabu search to solve a duration-minimizing TDVRP with up to 100 customers and 7 periods. Rincon-Garcia et al. (2020) proposed a large neighborhood search algorithm to solve a TDVRP with time windows, which considered a maximum of 100 customers and 5 periods. However, heuristic techniques are prone to getting stuck in local optima and have difficulty providing effective solutions to practical TDVRPs within a reasonable computation time. Moreover, previous TDVRP studies usually consider at most 100 customers, 200 road nodes, and 7 time periods, where each period can last several hours. In contrast, practical VRPs in real urban road networks often need to serve more customers and have to consider much shorter time periods, because travel speeds in such networks change frequently due to traffic signals and unpredictable traffic flows. Using more reliable time-dependent travel speeds improves the accuracy of candidate solution evaluations, which is critical for constructing effective solution models for VRPs and generating reliable final routing solutions. However, considering more customers and realistic time-dependent travel speeds further increases the computational complexity of VRPs (Rincon-Garcia et al., 2020), and research on large-scale TDVRPs with frequently changing travel speeds in real road networks remains open.

In recent years, deep learning, reinforcement learning, and deep reinforcement learning (DRL) techniques, originally developed in the artificial intelligence community, have attracted increasing attention from the optimization community (Vinyals et al., 2015, Bello et al., 2016, Liu et al., 2022, Basso et al., 2022, Du et al., 2021, Mao et al., 2020). Yan et al. (2022) provided a comprehensive review of reinforcement learning for various decision-making problems in logistics and supply chain management. In a pioneering work, Vinyals et al. (2015) developed Pointer Networks (PtrNets) with long short-term memory (LSTM) networks (Hochreiter and Schmidhuber, 1997) to solve multiple combinatorial optimization problems based on an encoder–decoder framework (Sutskever et al., 2014). The PtrNets were trained offline by supervised learning from a large number of problem instances with solutions, and their performance depended heavily on the quality of those solutions. However, obtaining sufficient combinatorial optimization problem instances with high-quality solutions might be infeasible or computationally expensive. To overcome these limitations, Bello et al. (2016) extended the PtrNets by using the REINFORCE algorithm (Williams, 1992) to train them without labeled problem instances. This was the first use of DRL to handle combinatorial optimization problems. These techniques use neural networks to learn heuristics directly and quickly from data, without any hand-engineered reasoning, and can then find effective or near-optimal solutions very efficiently (Bengio et al., 2021).

There has been an increasing interest in using DRL to tackle VRPs. Some DRL models work like improvement heuristics, which iteratively improve the solutions to capacitated VRPs based on a given initial solution by integrating DRL with heuristics (Lu et al., 2020, Chen and Tian, 2019, Gao et al., 2020). Other DRL models work like constructive heuristics, which construct each vehicle’s route by starting with an empty route and adding new customer nodes to visit in turn until a complete route is formed. In these models, the new node is chosen based on the attention values (i.e., selection probabilities) of available nodes during node decoding. Nazari et al. (2018) used element-wise projections to encode nodes in PtrNets to solve a capacitated VRP. These DRL models (Nazari et al., 2018, Bello et al., 2016, Chen and Tian, 2019) use LSTM networks as an encoder and/or decoder. However, the inherently sequential nature of LSTM networks precludes parallelization of model training on problem instances, which leads to relatively low model performance for sequential decision problems, such as vehicle routing problems (Li et al., 2022) and machine translation (Vaswani et al., 2017). Compared with previous LSTM-based models, the transformer model developed by Vaswani et al. (2017) uses a multi-head attention (MHA) mechanism to achieve very promising solution performance for sequential decision problems, because the MHA mechanism allows for much more parallelization and provides better information extraction capabilities. Following this model, Kool et al. (2019) proposed the Attention Model (AM) to handle various routing problems; it is based on the transformer model and outperformed several benchmarks (e.g., OR-Tools, PtrNets) on extensive routing instances. Duan et al. (2020) proposed a graph convolution network-based DRL model to effectively solve a practical capacitated VRP in a real road network. In these studies, a DRL model forms a solution by constructing different vehicle routes sequentially. That is, the construction of a vehicle’s route cannot start until the previous vehicle’s route is completely formed. Following the AM (Kool et al., 2019), Bono et al. (2020) developed a Multi-Agent Routing model using Deep Attention Mechanisms (MARDAM) to solve a dynamic capacitated VRP with stochastic customers, which integrates fleet state and fleet state representation modules into the AM and generates each solution by constructing multiple vehicle routes in parallel. Their results showed that, in terms of optimization ability, the MARDAM was superior to the AM for VRPs with time windows but inferior to it for VRPs without time windows. Moreover, previous DRL-based VRP studies were seldom conducted on real road networks and have not considered frequently changing travel speeds. It is thus worthwhile to explore effective and efficient DRL models for practical TDVRPs in real road networks.
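The constructive decoding described above can be illustrated with a minimal sketch. It assumes a single vehicle, no capacity or time-window constraints, and a hypothetical `score_fn` that stands in for the attention scores a trained model would produce; none of these names come from the cited papers. Visited nodes are masked out, the remaining scores are turned into selection probabilities with a softmax, and the most probable node is appended to the route.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities; -inf scores get probability 0."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def construct_route(score_fn, n_nodes, depot=0):
    """Greedily build one route that starts and ends at the depot.

    score_fn(current, candidate) is a hypothetical stand-in for the
    attention score of `candidate` given the vehicle is at `current`.
    """
    route = [depot]
    visited = {depot}
    while len(visited) < n_nodes:
        current = route[-1]
        # Mask nodes that are no longer available.
        scores = [
            float("-inf") if j in visited else score_fn(current, j)
            for j in range(n_nodes)
        ]
        probs = softmax(scores)
        # Greedy decoding: take the node with the highest probability.
        nxt = max(range(n_nodes), key=probs.__getitem__)
        route.append(nxt)
        visited.add(nxt)
    route.append(depot)
    return route


# Illustrative scoring rule (an assumption): prefer nearer nodes.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 0.0), 3: (2.0, 0.0)}


def neg_distance(i, j):
    (xi, yi), (xj, yj) = coords[i], coords[j]
    return -math.hypot(xi - xj, yi - yj)


print(construct_route(neg_distance, n_nodes=4))
```

During training, such models typically sample the next node from the probabilities instead of taking the maximum; the greedy choice here only keeps the sketch deterministic.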

This paper investigates practical large-scale TDVRPs, both with and without time windows, based on a real travel speed dataset from an urban road network of Chengdu city, China. We use a link travel speed dataset with 240 2-minute time periods to represent the time-dependency of travel speeds in this road network.
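With 240 periods of 2 minutes each, the speed profile covers an 8-hour planning horizon. The sketch below shows one common way to turn such piecewise-constant link speeds into a travel time that depends on the departure time. The excerpt does not state how the paper handles trips that cross period boundaries, so the boundary handling, the function name `link_travel_time`, and the rule of reusing the last period's speed beyond the horizon are all assumptions.

```python
def link_travel_time(length_m, depart_s, speeds_mps, period_s=120.0):
    """Seconds needed to traverse a link when departing at depart_s.

    speeds_mps[k] is the (positive) speed assumed constant during period k.
    Beyond the last period, the final speed is reused (an assumption).
    """
    remaining = float(length_m)
    t = float(depart_s)
    last = len(speeds_mps) - 1
    while remaining > 0:
        k = int(t // period_s)
        if k >= last:
            # Final period (or beyond): finish at the last known speed.
            t += remaining / speeds_mps[last]
            break
        # Distance that can be covered before period k ends.
        reachable = speeds_mps[k] * ((k + 1) * period_s - t)
        if reachable >= remaining:
            t += remaining / speeds_mps[k]
            break
        remaining -= reachable
        t = (k + 1) * period_s
    return t - depart_s


# Three 2-minute periods: fast, slow, fast (speeds in m/s).
speeds = [10.0, 5.0, 10.0]
print(link_travel_time(1500.0, 0.0, speeds))
print(link_travel_time(1500.0, 60.0, speeds))
```

Because the vehicle's speed changes at each boundary instead of being fixed at departure, a later departure never yields an earlier arrival on the same link.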

To solve the investigated TDVRPs, we develop two deep attention models with two novel model improvement mechanisms (i.e., dimension-reduction and gate mechanisms, DRGM): an attention model (Kool et al., 2019) with DRGM (AM-DRGM) and a MARDAM (Bono et al., 2020) with DRGM (MARDAM-DRGM). The dimension-reduction mechanism reduces the dimension of high-dimensional input data in order to reduce the use of computational resources. The gate mechanism used in deep neural networks was derived from the gate control theory of pain (Ronald and Wall Patrick, 1965), which states that non-painful input closes the “gates” to painful input and prevents pain sensation from traveling to the central nervous system. Following this theory, the concept of a gate mechanism was introduced into recurrent neural networks (e.g., LSTM; Hochreiter and Schmidhuber, 1997) as a way to control the flow of information by blocking information deemed useless and admitting useful information. This paper uses the gate mechanism to extract useful information from the local traffic information of vehicles, the global traffic information of the road network, and other relevant information. We then aggregate this information as the input of the layer that follows the gate layer. The local traffic information describes the road traffic status near a vehicle and is represented by the travel times from the vehicle's node to all other nodes in the road network at a time step. In contrast to the local traffic information for a single vehicle, the global traffic information comprises the travel times between all node pairs in the road network at a given time step. The local traffic information enables a vehicle to make locally optimal node selections, while the global information at each time step provides dynamic and valuable references for making globally optimal decisions.
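To make the idea of a gate concrete, the following sketch mixes two information streams, such as a local and a global traffic embedding, with an element-wise sigmoid gate. It is a simplified, highway-style gate chosen for illustration and is not claimed to be the gate layer used in the paper; the function `gated_mix` and its per-dimension weights are hypothetical stand-ins for learned parameters.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_mix(local, global_, w_local, w_global, bias):
    """Blend two equally sized vectors with an element-wise sigmoid gate.

    For each dimension, z in (0, 1) decides how much of the local value is
    passed on; the remainder (1 - z) comes from the global value.
    """
    mixed = []
    for l, g, wl, wg, b in zip(local, global_, w_local, w_global, bias):
        z = sigmoid(wl * l + wg * g + b)
        mixed.append(z * l + (1.0 - z) * g)
    return mixed


local_emb = [2.0, 4.0]
global_emb = [0.0, 0.0]

# Zero weights and bias give z = 0.5: an even blend of both streams.
print(gated_mix(local_emb, global_emb, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))

# A large positive bias opens the gate for the local stream only.
print(gated_mix(local_emb, global_emb, [0.0, 0.0], [0.0, 0.0], [50.0, 50.0]))
```

In a trained network the weights are learned, so the gate can pass the local stream in some dimensions and the global stream in others depending on the input.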

The novelty of both models consists of a novel input construction method and two model improvement mechanisms. First, instead of using 3-dimensional model inputs (detailed in Section 3.3) as in previous studies (Nazari et al., 2018, Kool et al., 2019, Bono et al., 2020), our input construction method constructs 4-dimensional model inputs with travel time information so that model inputs contain more comprehensive road network and node features. Second, to handle the large increase in computational resource use (i.e., computation time and memory) caused by the 4-dimensional model inputs, our DRL models, unlike the two previous DRL models (i.e., AM, MARDAM), introduce a novel dimension-reducing mechanism (dimension-reducing multi-head attention, DR-MHA for short) to replace the first MHA layer in the encoder. In this way, the 4-dimensional model inputs are converted into 3-dimensional node representations, and the travel time information in the road network can be extracted effectively with far fewer computational resources. Third, a gate mechanism used by Parisotto et al. (2020) is introduced into our models to obtain a better context representation; this is the first introduction of a gate layer into DRL models for VRPs. We test our two deep attention models on problem instance sets with different numbers of customers, and the results show that the proposed models achieve better optimization performance than two representative heuristics and two state-of-the-art benchmarking models (the AM and the MARDAM).
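The effect of reducing a pairwise input to per-node representations can be sketched with single-head attention pooling. The excerpt does not give the layout of the 4-dimensional input or the internals of DR-MHA, so everything here is an assumption made for illustration: one instance is stored as an (N, N, F) nested list in which `pair_feats[i][j]` holds F features of the node pair (i, j), a batch of such instances would form the 4-dimensional input, and `query` is a hypothetical stand-in for a learned vector.

```python
import math


def attention_pool(pair_feats, query):
    """Collapse an (N, N, F) nested list to (N, F) by attending over axis 1.

    For each node i, the vectors pair_feats[i][j] are scored against
    `query`, the scores are normalized with a softmax over j, and the
    vectors are summed with those weights.
    """
    n_feats = len(query)
    scale = math.sqrt(n_feats)
    pooled = []
    for row in pair_feats:
        scores = [sum(q * v for q, v in zip(query, vec)) / scale for vec in row]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        pooled.append(
            [sum(w * vec[f] for w, vec in zip(weights, row)) for f in range(n_feats)]
        )
    return pooled


# Two nodes, two features per node pair.
pair_feats = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]

# A zero query gives uniform weights, so pooling reduces to a plain mean.
print(attention_pool(pair_feats, [0.0, 0.0]))
```

The output has one vector per node, so every later layer works on N vectors per instance instead of N × N, which is where the saving in computation time and memory comes from.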

The main contributions of this paper are as follows.

  • This paper is the first to address practical large-scale TDVRPs with frequently changing travel times in real road networks.

  • This paper is the first to develop effective and efficient DRL models that learn problem-solving heuristics for TDVRPs, demonstrating the clear superiority of DRL models over benchmarking methods in terms of solution quality, computation time, and generalization performance.

  • We propose a novel input construction method and two novel model improvement mechanisms that effectively improve the performance of existing DRL models for TDVRPs. The two resulting DRL models, called deep attention models (i.e., AM-DRGM and MARDAM-DRGM), achieve clearly superior overall performance for TDVRPs without and with time windows, respectively, over two representative heuristics and two state-of-the-art DRL models (the AM and the MARDAM).

The remainder of the paper is organized as follows. Section 2 introduces the mathematical formulation of practical TDVRP and then reformulates it as a Markov Decision Process (MDP). We then elaborate the two DRL models proposed in Section 3. Next, Section 4 describes the experimental settings and the generation method of problem instances. In Section 5, we present the computational experiments and analysis. Finally, we conclude the paper and discuss future research directions in Section 6.

Section snippets

Problem formulation

This section first introduces a mathematical formulation of practical TDVRPs. We then reformulate the solution construction process of TDVRPs as an MDP.

Two deep attention models with dimension-reduction and gate mechanisms

Both the AM and the MARDAM are deep attention models in the DRL area, which follow an encoder–decoder framework with the MHA layer introduced by Vaswani et al. (2017). The MHA layer is used as a trainable aggregation function for better feature extraction. Both models are designed for fully connected road networks in which all node pairs are connected. However, each node in a real road network is connected to only a few neighboring nodes. We thus convert the real road network to a fully

Experimental setting

Our models are implemented in PyTorch 1.8.1. PyTorch is a flexible and modular open-source framework that accelerates the path from machine learning research prototyping to production deployment. All experiments are executed on a server with an Intel Xeon Platinum 8260 CPU and an NVIDIA RTX A6000 GPU. The code of our models is available on GitHub.

Training performances and convergence

Fig. 3 shows how the mean objective values in training and validation of our two deep attention models change over training epochs for two TDVRPs with 50 customers. The mean objective values in training and validation are obtained by averaging the objective values of 1,280,000 problem instances for training and 10,240 problem instances for validation, respectively. Fig. 3(a) shows the results of the AM-DRGM for the TDVRP without time windows, while Fig. 3(b) shows the results of the

Conclusion

To tackle practical TDVRPs with frequently changing travel times in real road networks, this paper developed two novel encoder-decoder based deep attention models (i.e., AM-DRGM and MARDAM-DRGM) by introducing a dimension-reduction mechanism and a gate mechanism into two state-of-the-art DRL models (Kool et al., 2019, Bono et al., 2020) for VRPs. We use the encoder to extract travel time information between nodes and construct a better context embedding in the decoder, so as to obtain better

CRediT authorship contribution statement

Feng Guo: Conceptualization, Methodology, Validation, Writing – original draft, Writing – review & editing. Qu Wei: Methodology, Validation, Writing – review & editing. Miao Wang: Validation. Zhaoxia Guo: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Supervision. Stein W. Wallace: Writing – review & editing.

Acknowledgments

Zhaoxia Guo acknowledges financial support from the National Natural Science Foundation of China (Grant Nos. 71872118 and 72171159), and Feng Guo acknowledges financial support from the China Scholarship Council (Grant No. 202106240119).

References (49)

  • Montoya, A. et al., 2017. The electric vehicle routing problem with nonlinear charging function. Transp. Res. B.
  • Pan, B. et al., 2021. A hybrid algorithm for time-dependent vehicle routing problem with time windows. Comput. Oper. Res.
  • Rincon-Garcia, N. et al., 2020. A metaheuristic for the time-dependent vehicle routing problem considering driving hours regulations – An application in city logistics. Transp. Res. C.
  • Yan, Y. et al., 2022. Reinforcement learning for logistics and supply chain management: Methodologies, state of the art, and future opportunities. Transp. Res. E.
  • Yu, Y. et al., 2019. A branch-and-price algorithm for the heterogeneous fleet green vehicle routing problem with time windows. Transp. Res. B.
  • Zhang, D. et al., 2021. On scenario construction for stochastic shortest path problems in real road networks. Transp. Res. E.
  • Baldacci, R. et al., 2009. Valid inequalities for the fleet size and mix vehicle routing problem with fixed costs. Networks.
  • Bello, I. et al., 2016. Neural combinatorial optimization with reinforcement learning.
  • Ben Ticha, H. et al., 2021. The time-dependent vehicle routing problem with time windows and road-network information. Oper. Res. Forum.
  • Bono, G. et al., 2020. Solving multi-agent routing problems using deep attention mechanisms. IEEE Trans. Intell. Transp. Syst.
  • Chen, X., Tian, Y., 2019. Learning to perform local rewriting for combinatorial optimization. In: Proc. 33rd Internat....
  • Dabia, S. et al., 2013. Branch and price for the time-dependent vehicle routing problem with time windows. Transp. Sci.
  • Duan, L., Zhan, Y., Hu, H., Gong, Y., Wei, J., Zhang, X., Xu, Y., 2020. Efficiently solving the practical vehicle...
  • Gao, L. et al., 2020. Learn to design the heuristics for vehicle routing problem.