Multi-vehicle routing problems with soft time windows: A multi-agent reinforcement learning approach
Transportation Research Part C: Emerging Technologies (IF 8.3), Pub Date: 2020-11-25, DOI: 10.1016/j.trc.2020.102861
Ke Zhang, Fang He, Zhengchao Zhang, Xi Lin, Meng Li

The multi-vehicle routing problem with soft time windows (MVRPSTW) is an indispensable component of urban logistics distribution systems. Over the past decade, numerous methods for MVRPSTW have been proposed, but most are based on heuristic rules that require a large amount of computation time. With the current rapid increase in logistics demand, traditional methods face a trade-off between computational efficiency and solution quality. To solve the problem efficiently, we propose a novel reinforcement learning algorithm called the Multi-Agent Attention Model, which, after lengthy offline training, can solve routing problems instantly. Specifically, the vehicle routing problem is regarded as a vehicle tour generation process, and an encoder-decoder framework with attention layers is proposed to generate the tours of multiple vehicles iteratively. Furthermore, a multi-agent reinforcement learning method with an unsupervised auxiliary network is developed for model training. Evaluated on four synthetic networks of different scales, the proposed method consistently outperforms Google OR-Tools and traditional methods while requiring little computation time. In addition, we validate the robustness of the well-trained model by varying the number of customers and the capacities of the vehicles.
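
To make the "soft time window" idea concrete, the sketch below scores a set of vehicle tours as travel distance plus linear penalties for arriving outside each customer's window. It is only an illustration of the general problem class, not the cost function or model from the paper: the function names, the penalty coefficients, unit speed, zero service time, and the no-waiting rule are all assumptions made here.

```python
import math

def tour_cost(tour, coords, windows, depot=0,
              early_penalty=0.5, late_penalty=1.0):
    """Travel distance plus soft time-window penalties for one vehicle tour.

    Assumptions (illustrative, not from the paper): unit speed, so travel
    time equals Euclidean distance; zero service time; a vehicle that
    arrives early is penalized rather than made to wait.
    """
    cost, clock, prev = 0.0, 0.0, depot
    for node in tour:
        leg = math.dist(coords[prev], coords[node])
        cost += leg
        clock += leg
        earliest, latest = windows[node]
        # Soft window: a violation adds cost instead of making the tour infeasible.
        cost += early_penalty * max(0.0, earliest - clock)
        cost += late_penalty * max(0.0, clock - latest)
        prev = node
    # Return leg back to the depot.
    cost += math.dist(coords[prev], coords[depot])
    return cost

def fleet_cost(tours, coords, windows, **kwargs):
    """Total cost over all vehicles, one tour per vehicle."""
    return sum(tour_cost(t, coords, windows, **kwargs) for t in tours)

# Hypothetical instance: depot 0 and two customers.
coords = {0: (0, 0), 1: (3, 4), 2: (6, 8)}
windows = {1: (0, 10), 2: (0, 8)}
single_vehicle = fleet_cost([[1, 2]], coords, windows)
two_vehicles = fleet_cost([[1], [2]], coords, windows)
```

In this toy instance, serving both customers with one vehicle is cheaper than splitting them across two, even though customer 2 is reached late in both plans. A learned policy such as the one the abstract describes would be trained to produce tours that score well under an objective of this general kind.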




Updated: 2020-11-26