A mean-field Markov decision process model for spatial-temporal subsidies in ride-sourcing markets



Transportation Research Part B: Methodological (IF 5.8), Pub Date: 2021-07-16, DOI: 10.1016/j.trb.2021.06.014
Zheng Zhu¹, Jintao Ke², Hai Wang³,⁴

Ride-sourcing services are increasingly popular because of their ability to accommodate on-demand travel needs. A critical issue faced by ride-sourcing platforms is the supply-demand imbalance, as a result of which drivers may spend substantial time cruising idle and picking up remote passengers. Some platforms attempt to mitigate the imbalance by providing relocation guidance for idle drivers, who may have their own self-relocation strategies and decline to follow the suggestions. Platforms then seek to induce drivers to move to system-desirable locations by offering them subsidies. This paper proposes a mean-field Markov decision process (MF-MDP) model to depict the dynamics in ride-sourcing markets with mixed agents, whereby the platform aims to optimize some objectives from a system perspective using spatial-temporal subsidies with predefined subsidy rates, and a number of drivers aim to maximize their individual income by following certain self-relocation strategies. To solve the model more efficiently, we further develop a representative-agent reinforcement learning algorithm that uses a representative driver to model the decision-making process of multiple drivers. This approach is shown to achieve significant computational advantages, faster convergence, and better performance. Using case studies, we demonstrate that by providing some spatial-temporal subsidies, the platform is able to balance a short-term objective of maximizing immediate revenue against a long-term objective of maximizing service rate, while drivers can earn higher income.
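The representative-agent idea in the abstract can be illustrated with a toy sketch. This is not the authors' model or algorithm: the zone layout, demand shares, fare, relocation cost, matching rule, and the function name `train_representative_driver` are all assumptions made for illustration. A single Q-table stands in for every driver, the driver distribution over zones (the mean field) is pushed forward by that shared policy, and a predefined subsidy table indexed by time step and zone is added to the expected income. The platform's own objective is not modeled here.

```python
import numpy as np

def train_representative_driver(subsidy, demand, episodes=500,
                                fare=1.0, move_cost=0.2, alpha=0.1, eps=0.1):
    """Toy mean-field Q-update for one representative driver (illustrative only).

    subsidy[t, z]: assumed predefined subsidy for arriving in zone z at step t.
    demand[z]:     assumed share of ride requests in zone z.
    Returns the Q-table q[t, zone, destination] and the final driver distribution.
    """
    horizon, n = subsidy.shape
    zones = np.arange(n)
    # Assumed relocation cost: proportional to distance between zones on a line.
    cost = move_cost * np.abs(zones[:, None] - zones[None, :])
    q = np.zeros((horizon, n, n))
    for _ in range(episodes):
        dist = np.full(n, 1.0 / n)  # mean field: share of drivers per zone
        for t in range(horizon):
            # All drivers share one epsilon-greedy policy over destinations.
            policy = np.full((n, n), eps / n)
            policy[zones, q[t].argmax(axis=1)] += 1.0 - eps
            dist = dist @ policy  # push the mean field forward
            # Assumed matching rule: crowded zones dilute the chance of a ride.
            match = np.minimum(1.0, demand / np.maximum(dist, 1e-9))
            income = match * fare + subsidy[t]
            future = q[t + 1].max(axis=1) if t + 1 < horizon else np.zeros(n)
            # Target for (zone, destination): income at the destination,
            # minus relocation cost, plus the value of continuing from there.
            target = income[None, :] - cost + future[None, :]
            q[t] += alpha * (target - q[t])
    return q, dist
```

Calling the function with an all-zero subsidy table and then with a table that pays a subsidy in one low-demand zone gives a rough feel for how a subsidy shifts the driver distribution. Because one Q-table is shared by all drivers, the cost per episode does not grow with the number of drivers, which is the intuition behind the computational advantage the abstract reports.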


Updated: 2021-07-16
