Multi-Objective Vehicle Rebalancing for Ridehailing System using a Reinforcement Learning Approach
arXiv - CS - Social and Information Networks
Pub Date: 2020-07-14, DOI: arxiv-2007.06801
Yuntian Deng, Hao Chen, Shiping Shao, Jiacheng Tang, Jianzong Pi, Abhishek Gupta
We consider the problem of designing a rebalancing algorithm for a large-scale
ridehailing system with asymmetric demand. We pose the rebalancing problem
within a semi-Markov decision problem (SMDP) framework in which closed queues
of vehicles serve stationary but asymmetric demand over a large city with
multiple nodes (representing neighborhoods). We assume that passengers queue
at every node until they are matched with a vehicle. The goal of the SMDP is to
minimize a convex combination of the passengers' waiting time and the total
empty vehicle miles traveled. The resulting SMDP appears difficult to solve for
a closed-form rebalancing strategy. We therefore use a deep reinforcement
learning algorithm to determine an approximately optimal solution to the SMDP.
The trained policy is compared with other well-known rebalancing algorithms,
which are designed to address other objectives (such as minimizing the demand
drop probability) for the ridehailing problem.
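
As a rough illustration of the objective described above, the following sketch computes a convex combination of a waiting-time term and an empty-miles term for a single decision step. It is not the authors' formulation: the function name `step_cost`, the weight `alpha`, the use of per-node queue lengths as a stand-in for waiting time, and the flow and distance matrices are all assumptions made here for illustration.

```python
from typing import Sequence


def step_cost(
    queue_lengths: Sequence[float],
    rebalancing_flows: Sequence[Sequence[float]],
    distances: Sequence[Sequence[float]],
    alpha: float,
) -> float:
    """Hypothetical one-step cost: alpha * waiting + (1 - alpha) * empty miles.

    queue_lengths[i]        -- passengers waiting at node i (assumed proxy
                               for waiting time)
    rebalancing_flows[i][j] -- empty vehicles sent from node i to node j
    distances[i][j]         -- miles between node i and node j
    alpha                   -- weight in [0, 1] trading off the two terms
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")

    # Waiting term: total number of passengers currently queued.
    waiting = sum(queue_lengths)

    # Empty-miles term: each empty vehicle contributes the distance it drives.
    empty_miles = sum(
        flow * dist
        for flow_row, dist_row in zip(rebalancing_flows, distances)
        for flow, dist in zip(flow_row, dist_row)
    )

    return alpha * waiting + (1.0 - alpha) * empty_miles


if __name__ == "__main__":
    queues = [2, 0, 1]                          # three nodes
    flows = [[0, 1, 0], [0, 0, 0], [2, 0, 0]]   # empty vehicles moved
    miles = [[0, 3, 5], [3, 0, 4], [5, 4, 0]]   # symmetric distances
    print(step_cost(queues, flows, miles, alpha=0.5))
```

Setting `alpha` to 1 would penalize only waiting passengers and setting it to 0 only empty miles; intermediate values trade one against the other, which is the sense in which the objective is multi-objective.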
Updated: 2020-07-15