Personalized route recommendation for ride-hailing with deep inverse reinforcement learning and real-time traffic conditions
Introduction
Ride-hailing services (e.g., Didi Chuxing) play an important role in urban transportation and have gained much attention in China. As reported by China Daily (2021a), ride-hailing services in China have boomed in recent years, serving around 21 million trips per day. Behind the popularity of online ride-hailing, many drivers do not follow the route planned by the ride-hailing platform, which causes social problems. For example, on February 6, 2021, a woman in Changsha, China found that her ride-hailing driver had deviated from the planned route. Frightened, she jumped out of the moving car and died (China Daily, 2021b). Similarly, on June 12, 2021 in Hangzhou, China, another woman who feared that her driver was not following the navigation route jumped out of the car and fractured her arm (Global Times, 2021). These incidents illustrate the serious consequences of ride-hailing drivers not following the routes planned by online ride-hailing platforms. An important reason for drivers to deviate is that the planned route does not conform to their personal route preferences (People’s Daily Online, 2021). Therefore, it is meaningful for ride-hailing platforms to consider drivers’ route preferences when planning routes.
In recent years, there has been growing interest in recommending routes based on users’ preferences. Literature in this field can be broadly classified into five streams. The first stream directly elicits drivers’ route preferences (Nadi and Delavar, 2011, Pahlavani and Delavar, 2014, Zeng et al., 2015, Campigotto et al., 2016, Torres et al., 2018, Li et al., 2022). However, it is hard for drivers to describe their preferences precisely. The second stream builds an experience-based graph from historical trajectories to find the route with the lowest cost (Dai et al., 2015, He et al., 2017, Yang et al., 2017, Neto et al., 2018). Although this approach selects similar historical trajectories according to the user’s preferences, it cannot analyze the user’s preference for each road segment. The third stream employs discrete choice models to learn drivers’ preferences (Menghini et al., 2010, Broach et al., 2012, Fosgerau et al., 2013, Mai et al., 2015, Zimmermann et al., 2017, Lu et al., 2017). The fourth stream generates a transition probability matrix from historical trajectories and recommends the route with the highest probability (Chen et al., 2011a, Chen et al., 2011b, Cui et al., 2018). However, it ignores the sequence of travel behavior (Cui et al., 2018). The fifth stream utilizes deep learning (Delling et al., 2015, Wang et al., 2019, Li et al., 2020), especially inverse reinforcement learning (IRL) (Ziebart et al., 2008, Wulfmeier et al., 2015, Liu et al., 2020), for personalized route recommendation. IRL is the problem of learning the reward function of an agent underlying a Markov decision process from its observed behavior (Arora and Doshi, 2021). We notice that existing IRL studies assume that traffic conditions are static and ignore the impact of real-time traffic conditions when planning personalized routes (Wulfmeier et al., 2015, Nguyen et al., 2015, Wulfmeier et al., 2016, Wulfmeier et al., 2017, Liu et al., 2020).
In addition, it is difficult to accurately compute the expected state visitation frequency when the state space is large (Audiffren et al., 2015, Oh and Iyengar, 2019, Imani and Ghoreishi, 2021).
To address these issues, we extend deep IRL to dynamic traffic environments and propose a method for computing the expected state visitation frequency. The input of our model includes the road segment encoding and the real-time speed, and the output is the real-time reward. The reward for passing through each road segment is not fixed, but varies with the real-time traffic conditions at the departure time. The training framework of our model consists of three main phases: (1) estimate the underlying reward function; (2) approximate the expected state visitation frequencies based on the current reward function, historical routes, and traffic speed information; (3) compute the gradient and update the neural network. These three phases are executed sequentially in each iteration. After recovering the reward function through training, we recommend personalized routes based on the reward function and the real-time traffic conditions at the departure time. Furthermore, we verify whether models trained with occupied trajectories and models trained with empty trajectories differ significantly in personalized route recommendation.
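The three training phases above follow the standard maximum-entropy IRL loop. The following is a minimal sketch on a toy road graph with a linear reward model standing in for the deep network, and with a one-step soft-max policy as a simplification of soft value iteration; all function names and the graph are illustrative, not the paper's implementation:

```python
import numpy as np

def softmax_policy(adj, reward):
    """P(s -> s') proportional to exp(reward[s']) over the neighbors of s.
    A one-step simplification of the soft-max policy used in MaxEnt IRL."""
    n = len(reward)
    P = np.zeros((n, n))
    for s, nbrs in adj.items():
        if not nbrs:
            continue  # terminal segment: no outgoing mass
        w = np.exp([reward[s2] for s2 in nbrs])
        w /= w.sum()
        for s2, p in zip(nbrs, w):
            P[s, s2] = p
    return P

def expected_svf(adj, reward, start, horizon):
    """Phase 2: forward-propagate visitation mass from the start
    distribution to approximate the expected state visitation frequency
    under the current reward."""
    P = softmax_policy(adj, reward)
    d = start.copy()   # probability mass over segments at t = 0
    svf = d.copy()
    for _ in range(horizon - 1):
        d = d @ P
        svf += d
    return svf

def irl_step(adj, features, theta, demo_svf, start, horizon, lr=0.1):
    """One training iteration: (1) current reward from the (here linear)
    reward model, (2) expected SVF under that reward, (3) MaxEnt gradient
    = demonstrated SVF - expected SVF, pushed back through the segment
    features to update the parameters."""
    reward = features @ theta                             # phase 1
    svf = expected_svf(adj, reward, start, horizon)       # phase 2
    grad = features.T @ (demo_svf - svf)                  # phase 3
    return theta + lr * grad
```

Repeating `irl_step` raises the reward of segments that demonstrated routes visit more often than the current policy predicts, which is the mechanism the training framework relies on; in the paper's setting the linear model is replaced by a neural network whose inputs include real-time speed, so the learned reward varies with traffic conditions.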
The contributions of this research are summarized as follows:
- We extend inverse reinforcement learning to combine the real-time traffic conditions at the departure time with personalized route preferences for route recommendation;
- We propose a method to calculate the expected state visitation frequency more accurately based on the characteristics of ride-hailing and taxi trajectories;
- Numerical experiments conducted on real ride-hailing trajectories show that our model outperforms state-of-the-art IRL methods. We also find that the route preferences reflected by the same driver’s empty trajectories and occupied trajectories are significantly different.
The remainder of this paper is organized as follows. In Section 2, we present related work on personalized route recommendation, inverse reinforcement learning, graph neural networks, and personalized itinerary recommendation. In Section 3, we introduce our improved deep inverse reinforcement learning model in detail. In Section 4, we conduct experiments to verify our model using real ride-hailing trajectories in Chengdu, China and analyze the difference in route preferences reflected by empty trajectories and occupied trajectories. Finally, we present conclusions and discuss future research directions in Section 5.
Personalized route recommendation
Studies on personalized route recommendation can be broadly divided into five approaches. The first approach acquires the driver’s route preferences by communicating with drivers and recommends the route accordingly. Nadi and Delavar (2011) ask drivers to define the relative importance of the attributes (e.g., travel distance, road width) by integrating a pairwise comparison method and ordered weighted averaging operators. Pahlavani and Delavar (2014) receive the driver’s route preferences such
Preliminaries
Definition 1 Road Network A road network is modeled as a directed graph G = (V, E), where V and E represent the set of nodes and the set of edges, respectively. A node v ∈ V represents a crossroad or a road start/end. An edge e ∈ E represents a directed road segment with a specific speed, and its weight is the opposite number of its reward. The adjacency between road segments is defined in terms of spatial location.
Definition 2 GPS Trajectory A GPS trajectory is a sequence of time-ordered GPS points generated by a vehicle, i.e., T = (p1, p2, …, pn), where pi is
Experimental setup
To evaluate the performance of our proposed method in personalized route recommendation of ride-hailing and taxis, we use the real-world trajectory dataset provided by Didi Chuxing GAIA Initiative (https://gaia.didichuxing.com). This dataset includes over 14.55 million ride-hailing trajectories in Chengdu, China, and spans two months from October 1st to November 30th, 2018. The study area is in the central district of Chengdu, China, which ranges from to and from to
Conclusions and future research directions
This study aims to improve inverse reinforcement learning for personalized route recommendation of ride-hailing considering the impact of real-time traffic conditions. In our model, we assume that the driver’s reward for passing through each road segment is not only related to the driver’s route preference, but also related to the real-time traffic conditions. Our proposed model takes road segment encoding and real-time speed as inputs, and leverages deep neural networks to capture complex
CRediT authorship contribution statement
Shan Liu: Conceptualization, Methodology, Investigation, Formal analysis, Writing – original draft, Software, Visualization, Writing – review & editing. Hai Jiang: Supervision, Project administration, Writing – review & editing, Funding acquisition.
Acknowledgments
This research is supported by the National Natural Science Foundation of China [Grant 71761137003]. We would like to thank Didi Chuxing GAIA Initiative for providing the ride-hailing trajectory data and traffic speed data.
References (74)
- et al. (2020). Modeling pedestrian-cyclist interactions in shared space using inverse reinforcement learning. Transp. Res. F.
- et al. (2021). A survey of inverse reinforcement learning: Challenges, methods and progress. Artificial Intelligence.
- et al. (2012). Where do cyclists ride? A route choice model developed with revealed preference GPS data. Transp. Res. A.
- et al. (2011). A personal route prediction system based on trajectory data mining. Inform. Sci.
- et al. (2020). Personalized itinerary recommendation: Deep and collaborative learning with textual information. Expert Syst. Appl.
- et al. (2021). TrajGAIL: Generating urban vehicle trajectories using generative adversarial imitation learning. Transp. Res. C.
- et al. (2013). A link based network route choice model with unrestricted choice set. Transp. Res. B.
- et al. (2019). Generating pedestrian walking behavior considering detour and pause in the path under space-time constraints. Transp. Res. C.
- et al. (2017). Personalized multi-period tour recommendations. Tour. Manag.
- et al. (2018). Using a heuristic algorithm to design a personalized day tour route in a time-dependent stochastic environment. Tour. Manag.
- Integrating Dijkstra’s algorithm into deep inverse reinforcement learning for food delivery route planning. Transp. Res. E.
- A nested recursive logit model for route choice analysis. Transp. Res. B.
- Route choice of cyclists in Zurich. Transp. Res. A.
- Multi-criteria, personalized route planning using quantifier-guided ordered weighted averaging operators. Int. J. Appl. Earth Obs. Geoinf.
- Multi-criteria route planning based on a driver’s preferences in multi-criteria route selection. Transp. Res. C.
- Development of people mass movement simulation framework based on reinforcement learning. Transp. Res. C.
- PRoA: An intelligent multi-criteria personalized route assistant. Eng. Appl. Artif. Intell.
- Scalable space-time trajectory cube for path-finding: A study using big taxi trajectory data. Transp. Res. B.
- Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robot. Auton. Syst.
- Using a heuristic approach to design personalized urban tourism itineraries with hotel selection. Tour. Manag.
- A tutorial on recursive models for analyzing and predicting path choice behavior. EURO J. Transp. Logist.
- Bike route choice modeling using GPS data without choice sets of paths. Transp. Res. C.
- Personalized and situation-aware multimodal route recommendations: The FAVOUR algorithm. IEEE Trans. Intell. Transp. Syst.
- Local-aggregation graph networks. IEEE Trans. Pattern Anal. Mach. Intell.
- Daily online car-hailing orders in China at around 21 million. China Daily.
- Driver accused of role in passenger’s death appeals sentence. China Daily.
- Personalized travel route recommendation using collaborative filtering based on GPS trajectories. Int. J. Digit. Earth.
- A note on two problems in connexion with graphs. Numer. Math.
- FTPG: A fine-grained traffic prediction method with graph attention network using big trace data. IEEE Trans. Intell. Transp. Syst.
- Deep inverse reinforcement learning for behavior prediction in autonomous driving: Accurate forecasts of vehicle motion. IEEE Signal Process. Mag.