Dispatch of autonomous vehicles for taxi services: A deep reinforcement learning approach, Transportation Research Part C: Emerging Technologies


Transportation Research Part C: Emerging Technologies (IF 7.6), Pub Date: 2020-04-10, DOI: 10.1016/j.trc.2020.102626
Chao Mao, Yulin Liu, Zuo-Jun (Max) Shen

In this paper, we define and investigate a novel model-free deep reinforcement learning framework for the taxi dispatch problem. The framework can be used to redistribute vehicles when travel demand and taxi supply are spatially or temporally imbalanced in a transportation network. While previous work has mostly relied on model-based methods, this paper explores a policy-based deep reinforcement learning algorithm as a model-free method for optimizing the rebalancing strategy. In particular, we propose an actor-critic algorithm that uses feed-forward neural networks to approximate both the policy and value functions: the policy function provides the optimal dispatch strategy, and the value function estimates the expected costs at each time step. Our numerical studies show that the algorithm converges to a solution with an optimality gap of less than 4% relative to the theoretical upper bound, whether the system dynamics are deterministic or stochastic. We also investigate a scenario that accounts for user priority and fairness, and the results indicate that, when user priority is considered, our learned policy can produce a superior strategy that balances equity, cancellations, and level of service.
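To make the actor-critic idea concrete, here is a minimal Python sketch on a hypothetical two-zone toy instance. It is not the authors' implementation: the environment (`FLEET`, `DEMAND`, `MOVE_COST`, `HORIZON`), all function names, and the hyperparameters are assumptions chosen for illustration. To stay dependency-free, the actor and critic are single linear layers over a one-hot state (effectively tabular) rather than the multi-layer feed-forward networks the paper describes. As in the abstract, the actor outputs a distribution over dispatch actions and the critic estimates expected cost; the critic's temporal-difference (TD) error serves as the advantage signal for the policy update.

```python
import math
import random

# Hypothetical toy instance (not from the paper): two zones, a fleet of 2 vehicles,
# and a constant demand of 2 trips per step that arises only in zone 1.
FLEET = 2
DEMAND = (0, 2)      # trips requested per step in zone 0 and zone 1
MOVE_COST = 0.1      # assumed cost per rebalanced vehicle
HORIZON = 5          # steps per episode
GAMMA = 0.9
ACTIONS = ("stay", "move 0->1", "move 1->0")


def features(n0):
    """One-hot encoding of the number of idle vehicles in zone 0."""
    x = [0.0] * (FLEET + 1)
    x[n0] = 1.0
    return x


def step(n0, action):
    """Apply a rebalancing action; return (next_n0, cost = unmet demand + move cost)."""
    moved = 0
    if action == 1 and n0 > 0:
        n0, moved = n0 - 1, 1
    elif action == 2 and n0 < FLEET:
        n0, moved = n0 + 1, 1
    supply = (n0, FLEET - n0)
    unmet = sum(max(0, d - s) for d, s in zip(DEMAND, supply))
    return n0, unmet + MOVE_COST * moved


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def policy(W, x):
    """Actor: one linear layer followed by a softmax over actions."""
    return softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in W])


def value(w, x):
    """Critic: one linear layer estimating the expected discounted cost from this state."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(episodes=2000, lr_actor=0.1, lr_critic=0.1, seed=0):
    rng = random.Random(seed)
    n_feat = FLEET + 1
    W = [[0.0] * n_feat for _ in ACTIONS]
    w = [0.0] * n_feat
    episode_costs = []
    for _ in range(episodes):
        n0, total = FLEET, 0.0  # every episode starts with all vehicles in zone 0
        for t in range(HORIZON):
            x = features(n0)
            probs = policy(W, x)
            a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
            n0_next, cost = step(n0, a)
            total += cost
            last = t == HORIZON - 1
            target = cost if last else cost + GAMMA * value(w, features(n0_next))
            delta = target - value(w, x)  # TD error: positive means costlier than expected
            for j in range(n_feat):
                w[j] += lr_critic * delta * x[j]
                for b in range(len(ACTIONS)):
                    grad_log = ((1.0 if b == a else 0.0) - probs[b]) * x[j]
                    W[b][j] -= lr_actor * delta * grad_log  # descend: we minimize cost
            n0 = n0_next
        episode_costs.append(total)
    return W, w, episode_costs


def greedy_rollout_cost(W):
    """Undiscounted cost of one episode following the most probable action."""
    n0, total = FLEET, 0.0
    for _ in range(HORIZON):
        probs = policy(W, features(n0))
        a = max(range(len(ACTIONS)), key=lambda i: probs[i])
        n0, cost = step(n0, a)
        total += cost
    return total


if __name__ == "__main__":
    W, w, costs = train()
    for n0 in range(FLEET, -1, -1):
        probs = policy(W, features(n0))
        print(n0, [round(p, 2) for p in probs], round(value(w, features(n0)), 2))
    print("greedy episode cost:", greedy_rollout_cost(W))
```

On this toy instance the best plan is to move one vehicle from zone 0 to zone 1 in each of the first two steps, for an undiscounted episode cost of 1.2, and the trained greedy policy is expected to recover it. Swapping `policy` and `value` for deeper networks (for example in PyTorch) would change only how the gradients are computed, not the structure of the actor and critic updates.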


Updated: 2020-04-12
