Optimizing matching time intervals for ride-hailing services using reinforcement learning, Transportation Research Part C: Emerging Technologies

Optimizing matching time intervals for ride-hailing services using reinforcement learning
Transportation Research Part C: Emerging Technologies (IF 7.6) Pub Date: 2021-06-17, DOI: 10.1016/j.trc.2021.103239
Guoyang Qin, Qi Luo, Yafeng Yin, Jian Sun, Jieping Ye

Matching trip requests with available drivers efficiently is considered a central operational problem for ride-hailing services. A widely adopted strategy is to accumulate a batch of potential passenger-driver matches and solve bipartite matching problems repeatedly. Matching efficiency can be improved substantially if matching is delayed by adaptively adjusting the matching time interval. The optimal delayed matching is subject to a trade-off between the delay penalty and the reduced wait cost, and it depends on the system’s supply and demand states. Searching for the optimal delayed matching policy is challenging because the current policy is compounded with past actions. To this end, we tailor a family of reinforcement learning-based methods to overcome the curse of dimensionality and sparse reward issues. In addition, this work provides a solution to the spatial partitioning problem that balances the state representation error against the optimality gap of asynchronous matching. Lastly, we examine the proposed methods with real-world taxi trajectory data and garner managerial insights into general delayed matching policies. This work focuses on single-ride service because of limited access to shared-ride data, although the general framework can be extended to settings with a ride-pooling component.
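
The trade-off between delay penalty and reduced pickup cost can be illustrated with a toy sketch. This is not the paper's reinforcement learning method: it only compares a few fixed matching intervals on a hand-made scenario. Everything here is an assumption made for illustration, including the names `min_cost_matching` and `simulate`, the 1-D positions, the linear delay penalty, and the brute-force matcher (a real system would use a polynomial-time assignment solver).

```python
from itertools import permutations


def min_cost_matching(riders, drivers):
    """Brute-force minimum-distance bipartite matching on a 1-D line.

    riders, drivers: lists of positions. Matches min(len) pairs.
    Returns (total_distance, [(rider_idx, driver_idx), ...]).
    Exponential time, so only suitable for tiny illustrative batches.
    """
    k = min(len(riders), len(drivers))
    best_cost, best_pairs = float("inf"), []
    if len(riders) <= len(drivers):
        candidates = (list(zip(range(k), perm))
                      for perm in permutations(range(len(drivers)), k))
    else:
        candidates = (list(zip(perm, range(k)))
                      for perm in permutations(range(len(riders)), k))
    for pairs in candidates:
        cost = sum(abs(riders[r] - drivers[d]) for r, d in pairs)
        if cost < best_cost:
            best_cost, best_pairs = cost, pairs
    return best_cost, best_pairs


def simulate(events, interval, horizon, delay_weight=1.0):
    """Batch-match every `interval` time units up to `horizon`.

    events: list of (arrival_time, kind, position), kind in {"rider", "driver"}.
    Returns total pickup distance plus delay_weight * rider waiting time.
    """
    pending = sorted(events)
    riders, drivers = [], []  # waiting pools of (arrival_time, position)
    total, i, t = 0.0, 0, interval
    while t <= horizon:
        # Move everything that has arrived by time t into the pools.
        while i < len(pending) and pending[i][0] <= t:
            arrival, kind, pos = pending[i]
            (riders if kind == "rider" else drivers).append((arrival, pos))
            i += 1
        _, pairs = min_cost_matching([p for _, p in riders],
                                     [p for _, p in drivers])
        for r, d in pairs:
            pickup = abs(riders[r][1] - drivers[d][1])
            wait = t - riders[r][0]
            total += pickup + delay_weight * wait
        matched_r = {r for r, _ in pairs}
        matched_d = {d for _, d in pairs}
        riders = [x for j, x in enumerate(riders) if j not in matched_r]
        drivers = [x for j, x in enumerate(drivers) if j not in matched_d]
        t += interval
    return total


# Hypothetical scenario: two drivers and two riders arriving over time.
EVENTS = [(0, "driver", 0), (1, "rider", 5), (2, "driver", 6), (2, "rider", 1)]

for interval in (1, 2, 4):
    print(interval, simulate(EVENTS, interval, horizon=4))
```

In this scenario the total cost is 10.0 with an interval of 1, 3.0 with an interval of 2, and 7.0 with an interval of 4: waiting briefly lets the batch pair each rider with a nearby driver, while waiting too long lets the delay penalty dominate. An adaptive policy, as studied in the paper, would choose the interval based on the current supply and demand state rather than fixing it in advance.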

Updated: 2021-06-17
