Personalized route recommendation for ride-hailing with deep inverse reinforcement learning and real-time traffic conditions

https://doi.org/10.1016/j.tre.2022.102780

Highlights

  • We extend inverse reinforcement learning to consider real-time traffic conditions.

  • We propose a new method to calculate the expected state visitation frequency.

  • Numerical experiments on real ride-hailing trajectories validate our model.

Abstract

Personalized route recommendation aims to recommend routes based on users’ route preferences. The vast amount of GPS trajectories tracking driving behavior has made deep learning, especially inverse reinforcement learning (IRL), a popular choice for personalized route recommendation. However, current IRL studies assume that the traffic condition is static and approximate the expected state visitation frequencies to update the neural network. This study improves IRL to recommend personalized routes considering real-time traffic conditions. We also improve the calculation of the expected state visitation frequency, which is used to compute the gradient of the neural network, based on characteristics of ride-hailing and taxi trajectories. In addition, a graph attention network is employed to capture the spatial dependencies between road segments. Numerical experiments using real ride-hailing trajectories in Chengdu, China validate our model. Finally, a statistical test is conducted, and route preferences reflected by the same driver’s empty trajectories and occupied trajectories are found to differ significantly.

Introduction

Ride-hailing services (e.g., Didi Chuxing) play an important role in urban transportation and have gained much attention in China. As reported by China Daily (2021a), ride-hailing services in China have boomed in recent years, serving around 21 million trips per day. Behind this popularity, many drivers do not follow the route planned by the ride-hailing platform, which causes social problems. For example, on February 6, 2021, a woman in Changsha, China found that her ride-hailing driver had deviated from the planned route; frightened, she jumped out of the car and died (China Daily, 2021b). Similarly, on June 12, 2021, a woman in Hangzhou, China, fearing that her driver was not following the navigation route, jumped out of the car and fractured her arm (Global Times, 2021). These incidents illustrate the serious consequences of ride-hailing drivers not following the routes planned by online ride-hailing platforms. An important reason for drivers to deviate is that the planned route does not conform to their personal route preferences (People’s Daily Online, 2021). Therefore, it is meaningful for ride-hailing platforms to consider drivers’ route preferences when planning routes.

In recent years, there has been a growing interest in recommending routes based on users’ preferences. Literature in this field can be broadly classified into five streams. The first stream directly elicits drivers’ route preferences (Nadi and Delavar, 2011, Pahlavani and Delavar, 2014, Zeng et al., 2015, Campigotto et al., 2016, Torres et al., 2018, Li et al., 2022). However, it is hard for drivers to describe their preferences precisely. The second stream builds an experience-based graph from historical trajectories to find the route with the lowest cost (Dai et al., 2015, He et al., 2017, Yang et al., 2017, Neto et al., 2018). Although it selects similar historical trajectories considering the user’s preference, it cannot analyze the user’s preference for each road segment. The third stream employs discrete choice models to learn drivers’ preferences (Menghini et al., 2010, Broach et al., 2012, Fosgerau et al., 2013, Mai et al., 2015, Zimmermann et al., 2017, Lu et al., 2017). The fourth stream generates the transition probability matrix using historical trajectories and recommends the route with the highest probability (Chen et al., 2011a, Chen et al., 2011b, Cui et al., 2018). However, it ignores the sequence of travel behavior (Cui et al., 2018). The fifth stream utilizes deep learning (Delling et al., 2015, Wang et al., 2019, Li et al., 2020), especially IRL (Ziebart et al., 2008, Wulfmeier et al., 2015, Liu et al., 2020), for personalized route recommendation. IRL is the problem of learning the reward function of an agent underlying a Markov decision process from its observed behavior (Arora and Doshi, 2021). We notice that existing IRL studies assume the traffic condition is static and ignore the impact of real-time traffic conditions when planning personalized routes (Wulfmeier et al., 2015, Nguyen et al., 2015, Wulfmeier et al., 2016, Wulfmeier et al., 2017, Liu et al., 2020).
In addition, it is difficult to accurately compute the expected state visitation frequency when the state space is large (Audiffren et al., 2015, Oh and Iyengar, 2019, Imani and Ghoreishi, 2021).

To address these issues, we extend the deep IRL to the dynamic traffic environment and propose a method for computing expected state visitation frequency. The input of our model includes the road segment encoding and real-time speed, and the output is the real-time reward. The reward for passing through each road segment is not fixed, but varies with the real-time traffic conditions at the departure time. The training framework of our model consists of three main phases: (1) estimate the underlying reward function; (2) approximate the expected state visitation frequencies based on the current reward function, historical routes, and traffic speed information; (3) compute the gradient and update the neural network. These three phases are executed sequentially in each iteration. After recovering the reward function through training, we recommend personal routes based on the reward function and real-time traffic conditions at the departure time. Furthermore, we verify whether the models trained with occupied and empty trajectories differ significantly in personalized route recommendation.
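As an illustration, the three phases above map onto the classic tabular MaxEnt IRL loop (Ziebart et al., 2008) that this model builds on. The sketch below substitutes a linear reward on a toy chain MDP for the paper's deep network and real road graph; all function names, the toy transition model, and the example trajectories are illustrative, not from the paper.

```python
import numpy as np

def soft_value_iteration(reward, P, iters=100):
    """Phase 1 helper: soft value iteration under the current reward.
    P[s, a, s'] are transition probabilities; returns a stochastic policy."""
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    V = np.zeros(S)
    for _ in range(iters):                      # finite sweeps suffice for a sketch
        Q = reward[:, None] + P @ V             # (S, A) soft Q-values
        V = np.logaddexp.reduce(Q, axis=1)      # numerically stable soft-max
    return np.exp(Q - V[:, None])               # policy pi[s, a]

def expected_svf(pi, P, p0, horizon):
    """Phase 2: expected state visitation frequencies via a forward pass."""
    D = p0.copy()
    svf = p0.copy()
    for _ in range(horizon - 1):
        D = np.einsum('s,sa,saz->z', D, pi, P)  # propagate state distribution
        svf += D
    return svf

def maxent_irl(features, P, expert_trajs, p0, horizon, lr=0.1, epochs=50):
    """Runs phases 1-3 sequentially in each iteration."""
    S, F = features.shape
    theta = np.zeros(F)
    mu_expert = np.zeros(S)                     # empirical visitation counts
    for traj in expert_trajs:
        for s in traj:
            mu_expert[s] += 1.0
    mu_expert /= len(expert_trajs)
    f_expert = features.T @ mu_expert           # expert feature expectations
    for _ in range(epochs):
        reward = features @ theta               # phase 1: current reward estimate
        pi = soft_value_iteration(reward, P)
        svf = expected_svf(pi, P, p0, horizon)  # phase 2: expected SVF
        theta += lr * (f_expert - features.T @ svf)  # phase 3: gradient ascent
    return theta

# Toy 4-state chain "road graph"; the expert drives toward state 3.
S, A = 4, 2
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, max(s - 1, 0)] = 1.0                # action 0: move "left"
    P[s, 1, min(s + 1, S - 1)] = 1.0            # action 1: move "right"
features = np.eye(S)                            # one-hot segment encoding
expert_trajs = [[0, 1, 2, 3], [1, 2, 3, 3]]
p0 = np.array([0.5, 0.5, 0.0, 0.0])
theta = maxent_irl(features, P, expert_trajs, p0, horizon=4)
```

In the paper, phase 1 is a deep network over road segment encodings and real-time speeds rather than a linear model, and phase 2 is adapted to the characteristics of ride-hailing trajectories; the gradient structure in phase 3 is the same.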

The contributions of this research are summarized as follows:

  • We extend inverse reinforcement learning to combine the real-time traffic conditions at the departure time with personalized route preferences for route recommendation;

  • This study proposes a method to calculate the expected state visitation frequency more accurately based on characteristics of ride-hailing and taxi trajectories;

  • Numerical experiments conducted on real ride-hailing trajectories show that our model outperforms state-of-the-art IRL studies. We also find that route preferences reflected by the same driver’s empty trajectories and occupied trajectories are significantly different.

The remainder of this paper is organized as follows. In Section 2, we present the related work on personalized route recommendation, inverse reinforcement learning, graph neural network, and personalized itinerary recommendation. In Section 3, we introduce our improved deep inverse reinforcement learning model in detail. In Section 4, we conduct experiments to verify our model using the real ride-hailing trajectories in Chengdu, China and analyze the difference in route preference reflected by empty trajectories and occupied trajectories. Finally, we present a conclusion and discuss future research directions in Section 5.

Section snippets

Personalized route recommendation

Studies on personalized route recommendation can be broadly divided into five approaches. The first approach acquires the driver’s route preferences by communicating with drivers and recommends the route accordingly. Nadi and Delavar (2011) ask drivers to define the relative importance of the attributes (e.g., travel distance, road width) by integrating a pairwise comparison method and ordered weighted averaging operators. Pahlavani and Delavar (2014) receive the driver’s route preferences such

Preliminaries

Definition 1 Road Network

A road network is modeled as a directed graph G=(V,E), where V and E represent the set of nodes and edges, respectively. A node represents a crossroad or the start/end of a road. An edge represents a directed road segment with a specific speed, and its weight is the negative of its reward. Adjacency between road segments is defined in terms of spatial location.
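Definition 1 translates directly into a small data structure. The sketch below is illustrative (the class and field names are not from the paper); it records each directed segment with its speed and stores the edge weight as the negated reward.

```python
from dataclasses import dataclass, field

@dataclass
class RoadNetwork:
    """Directed graph G = (V, E): nodes are crossroads or road starts/ends;
    edges are directed road segments; an edge's weight is the negative of
    its (traffic-dependent) reward."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)   # (u, v) -> {'speed', 'weight'}

    def add_segment(self, u, v, speed, reward):
        self.nodes.update((u, v))
        self.edges[(u, v)] = {'speed': speed, 'weight': -reward}

    def successors(self, u):
        """Spatially adjacent nodes reachable from node u."""
        return [v for (a, v) in self.edges if a == u]

    def route_cost(self, path):
        """Cost of a route = sum of edge weights = negated total reward."""
        return sum(self.edges[(a, b)]['weight'] for a, b in zip(path, path[1:]))

g = RoadNetwork()
g.add_segment('A', 'B', speed=30.0, reward=1.2)
g.add_segment('B', 'C', speed=45.0, reward=0.8)
```

Minimizing this cost over candidate paths recovers the maximum-reward route; note that when rewards are positive the weights are negative, so a plain shortest-path algorithm such as Dijkstra's would need a suitable transformation.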

Definition 2 GPS Trajectory

A GPS trajectory τ is a sequence of time-ordered GPS points generated by a vehicle, i.e., τ = ((q1,t1), …, (qi,ti), …, (qm,tm)), where qi is
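Definition 2 likewise maps onto a small data structure: a trajectory is a list of (point, timestamp) pairs with non-decreasing timestamps. The names below are illustrative, not from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class GPSPoint:
    lon: float  # longitude, degrees
    lat: float  # latitude, degrees

# A trajectory tau is a time-ordered sequence of (q_i, t_i) pairs.
Trajectory = List[Tuple[GPSPoint, float]]

def is_time_ordered(traj: Trajectory) -> bool:
    """Check the defining property t_1 <= t_2 <= ... <= t_m."""
    return all(t1 <= t2 for (_, t1), (_, t2) in zip(traj, traj[1:]))

def duration(traj: Trajectory) -> float:
    """Travel time in seconds between the first and last GPS fix."""
    return traj[-1][1] - traj[0][1]

tau = [(GPSPoint(104.06, 30.66), 0.0), (GPSPoint(104.07, 30.67), 15.0)]
```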

Experimental setup

To evaluate the performance of our proposed method in personalized route recommendation of ride-hailing and taxis, we use the real-world trajectory dataset provided by Didi Chuxing GAIA Initiative (https://gaia.didichuxing.com). This dataset includes over 14.55 million ride-hailing trajectories in Chengdu, China, and spans two months from October 1st to November 30th, 2018. The study area is in the central district of Chengdu, China, which ranges from 30.65N to 30.73N and from 104.04E to 104.
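Restricting raw trajectories to the study area is a simple bounding-box filter, sketched below. The latitude bounds follow the text; the eastern longitude bound is truncated in the snippet, so LON_MAX here is a hypothetical placeholder, not the paper's value.

```python
LAT_MIN, LAT_MAX = 30.65, 30.73    # study-area latitude bounds from the text
LON_MIN, LON_MAX = 104.04, 104.13  # LON_MAX is a hypothetical placeholder

def in_study_area(lon, lat):
    """True if a GPS fix lies inside the central-Chengdu bounding box."""
    return LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX

def clip_trajectory(points):
    """Drop (lon, lat, t) fixes that fall outside the study area."""
    return [(lon, lat, t) for lon, lat, t in points if in_study_area(lon, lat)]
```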

Conclusions and future research directions

This study aims to improve inverse reinforcement learning for personalized route recommendation of ride-hailing considering the impact of real-time traffic conditions. In our model, we assume that the driver’s reward for passing through each road segment is not only related to the driver’s route preference, but also related to the real-time traffic conditions. Our proposed model takes road segment encoding and real-time speed as inputs, and leverages deep neural networks to capture complex

CRediT authorship contribution statement

Shan Liu: Conceptualization, Methodology, Investigation, Formal analysis, Writing – original draft, Software, Visualization, Writing – review & editing. Hai Jiang: Supervision, Project administration, Writing – review & editing, Funding acquisition.

Acknowledgments

This research is supported by the National Natural Science Foundation of China [Grant 71761137003]. We would like to thank Didi Chuxing GAIA Initiative for providing the ride-hailing trajectory data and traffic speed data.

References (74)

  • Liu, S. et al. Integrating Dijkstra’s algorithm into deep inverse reinforcement learning for food delivery route planning. Transp. Res. E (2020)
  • Mai, T. et al. A nested recursive logit model for route choice analysis. Transp. Res. B (2015)
  • Menghini, G. et al. Route choice of cyclists in Zurich. Transp. Res. A (2010)
  • Nadi, S. et al. Multi-criteria, personalized route planning using quantifier-guided ordered weighted averaging operators. Int. J. Appl. Earth Obs. Geoinf. (2011)
  • Pahlavani, P. et al. Multi-criteria route planning based on a driver’s preferences in multi-criteria route selection. Transp. Res. C (2014)
  • Pang, Y. et al. Development of people mass movement simulation framework based on reinforcement learning. Transp. Res. C (2020)
  • Torres, M. et al. PRoA: an intelligent multi-criteria personalized route assistant. Eng. Appl. Artif. Intell. (2018)
  • Yang, L. et al. Scalable space-time trajectory cube for path-finding: A study using big taxi trajectory data. Transp. Res. B (2017)
  • You, C. et al. Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robot. Auton. Syst. (2019)
  • Zheng, W. et al. Using a heuristic approach to design personalized urban tourism itineraries with hotel selection. Tour. Manag. (2020)
  • Zimmermann, M. et al. A tutorial on recursive models for analyzing and predicting path choice behavior. EURO J. Transp. Logist. (2020)
  • Zimmermann, M. et al. Bike route choice modeling using GPS data without choice sets of paths. Transp. Res. C (2017)
  • Aghasadeghi, N., Bretl, T., 2011. Maximum entropy inverse reinforcement learning in continuous state spaces with path...
  • Audiffren, J., Valko, M., Lazaric, A., Ghavamzadeh, M., 2015. Maximum entropy semi-supervised inverse reinforcement...
  • Campigotto, P. et al. Personalized and situation-aware multimodal route recommendations: the FAVOUR algorithm. IEEE Trans. Intell. Transp. Syst. (2016)
  • Chang, J. et al. Local-aggregation graph networks. IEEE Trans. Pattern Anal. Mach. Intell. (2019)
  • Chen, Z., Shen, H.T., Zhou, X., 2011b. Discovering popular routes from trajectories. In: Proceedings of the 27th IEEE...
  • China Daily. Daily online car-hailing orders in China at around 21 million (2021a)
  • China Daily. Driver accused of role in passenger’s death appeals sentence (2021b)
  • Cui, G. et al. Personalized travel route recommendation using collaborative filtering based on GPS trajectories. Int. J. Digit. Earth (2018)
  • Dai, J., Yang, B., Guo, C., Ding, Z., 2015. Personalized route recommendation using big trajectory data. In:...
  • Delling, D., Goldberg, A.V., Goldszmidt, M., Krumm, J., Talwar, K., Werneck, R.F., 2015. Navigation made personal:...
  • Dijkstra, E.W. A note on two problems in connexion with graphs. Numer. Math. (1959)
  • Fang, X., Huang, J., Wang, F., Zeng, L., Liang, H., Wang, H., 2020. ConSTGAT: Contextual spatial-temporal graph...
  • Fang, M. et al. FTPG: A fine-grained traffic prediction method with graph attention network using big trace data. IEEE Trans. Intell. Transp. Syst. (2021)
  • Fernando, T., Denman, S., Sridharan, S., Fookes, C., 2019. Neighbourhood context embeddings in deep inverse...
  • Fernando, T. et al. Deep inverse reinforcement learning for behavior prediction in autonomous driving: Accurate forecasts of vehicle motion. IEEE Signal Process. Mag. (2020)