Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning


arXiv - CS - Artificial Intelligence, Pub Date: 2020-04-03, DOI: arxiv-2004.01608
Paulo R. de O. da Costa, Jason Rhuggenaath, Yingqian Zhang, Alp Akcay

Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find good-quality TSP solutions but require additional procedures, such as beam search and sampling, to improve those solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, in which a given solution is improved until it reaches a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism and, unlike previous works, can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.
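
For readers unfamiliar with 2-opt, the following is a minimal Python sketch of the operator the abstract refers to: a 2-opt move removes two edges from a tour and reconnects it by reversing the segment between them. This is not the authors' implementation. The function names (`tour_length`, `two_opt_move`, `greedy_two_opt`) are hypothetical, and the exhaustive best-improvement loop is the classical hand-crafted baseline, shown only to illustrate the move that a learned policy would choose instead.

```python
import math


def tour_length(points, tour):
    """Total Euclidean length of the closed tour over `points`."""
    n = len(tour)
    return sum(
        math.dist(points[tour[k]], points[tour[(k + 1) % n]])
        for k in range(n)
    )


def two_opt_move(tour, i, j):
    """Return a new tour with positions i..j (inclusive) reversed."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def greedy_two_opt(points, tour):
    """Classical baseline (assumption, not the paper's method):
    repeatedly apply the best improving 2-opt move until none exists."""
    best_len = tour_length(points, tour)
    improved = True
    while improved:
        improved = False
        best_candidate = None
        for i in range(len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = two_opt_move(tour, i, j)
                cand_len = tour_length(points, candidate)
                # Small tolerance guards against floating-point noise.
                if cand_len < best_len - 1e-12:
                    best_len, best_candidate = cand_len, candidate
        if best_candidate is not None:
            tour = best_candidate
            improved = True
    return tour, best_len


if __name__ == "__main__":
    # Unit square visited in a self-crossing order.
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    final_tour, final_len = greedy_two_opt(square, [0, 2, 1, 3])
    print(final_tour, final_len)
```

In the approach described in the abstract, the exhaustive search over index pairs would be replaced by a stochastic policy that selects which 2-opt operation to apply to the current solution; the sketch only fixes what "applying a 2-opt operation" means.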


Updated: 2020-09-15
