Deep Reinforcement Learning for Crowdsourced Urban Delivery
Transportation Research Part B: Methodological (IF 6.8), Pub Date: 2021-09-13, DOI: 10.1016/j.trb.2021.08.015
Tanvir Ahamed, Bo Zou, Nahid Parvez Farazi, Theja Tulabandhula

This paper investigates the problem of assigning shipping requests to ad hoc couriers in the context of crowdsourced urban delivery. The shipping requests are spatially distributed, each with a limited time window between the earliest time for pickup and the latest time for delivery. The ad hoc couriers, termed crowdsourcees, also have limited time availability and carrying capacity. We propose a new deep reinforcement learning (DRL)-based approach to tackling this assignment problem. A deep Q network (DQN) is trained with two salient features, experience replay and a target network, which enhance the efficiency, convergence, and stability of DRL training. More importantly, this paper makes three methodological contributions: 1) presenting a comprehensive and novel characterization of crowdshipping system states that encompasses spatial-temporal and capacity information about crowdsourcees and requests; 2) embedding heuristics that leverage the information offered by the state representation and are based on intuitive reasoning to guide the specific actions to take, preserving tractability and enhancing the efficiency of training; and 3) integrating rule-interposing to prevent repeated visits to the same routes and node sequences during routing improvement, further enhancing training efficiency by accelerating learning. The computational complexities of the heuristics and of the overall DQN training are investigated. The effectiveness of the proposed approach is demonstrated through extensive numerical analysis. The results show the benefits of heuristics-guided action choice, rule-interposing, and time-related information in the state space during DRL training; the near-optimality of the solutions obtained; and the superiority of the proposed approach over existing methods in terms of solution quality, computation time, and scalability.
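
The abstract outlines the core machinery: a DQN trained with experience replay and a target network, with embedded heuristics and rule-interposing steering which actions are considered. The paper's code is not reproduced here; the following is a minimal PyTorch sketch of that machinery under assumed names and dimensions (QNet, select_action, train_step, STATE_DIM, N_ACTIONS are all illustrative), not the authors' implementation. Masking infeasible actions in select_action is one simple way to realize heuristic guidance and rule-interposing; the paper's actual mechanisms are more elaborate.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 32, 8   # assumed sizes; the paper's state/action spaces differ

class QNet(nn.Module):
    """Small feed-forward Q-network over a crowdshipping state vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, s):
        return self.net(s)

q_net, target_net = QNet(), QNet()
target_net.load_state_dict(q_net.state_dict())   # target starts as a copy of the online net
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)                    # experience replay memory
GAMMA, BATCH, SYNC_EVERY = 0.99, 64, 500

def select_action(state, feasible, eps=0.1):
    """Epsilon-greedy choice restricted to actions the embedded heuristics
    deem feasible (e.g. assignments respecting time windows and capacity)."""
    if random.random() < eps:
        return random.choice(feasible)
    with torch.no_grad():
        q = q_net(state)
    infeasible = [a for a in range(N_ACTIONS) if a not in feasible]
    q[infeasible] = -float("inf")                # heuristic/rule mask on Q-values
    return int(q.argmax())

def train_step(step):
    """One gradient step on a minibatch sampled from the replay buffer."""
    if len(buffer) < BATCH:
        return
    s, a, r, s2, done = zip(*random.sample(buffer, BATCH))
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                        # bootstrap from the frozen target net
        q_next = target_net(s2).max(1).values
    target = r + GAMMA * (1.0 - done) * q_next
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % SYNC_EVERY == 0:                   # periodic target-network refresh
        target_net.load_state_dict(q_net.state_dict())
```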



Updated: 2021-09-14