Heuristic algorithms based on deep reinforcement learning for quadratic unconstrained binary optimization
Knowledge-Based Systems (IF 8.8) Pub Date: 2020-08-08, DOI: 10.1016/j.knosys.2020.106366
Ming Chen, Yuning Chen, Yonghao Du, Luona Wei, Yingwu Chen

The unconstrained binary quadratic programming (UBQP) problem is a difficult combinatorial optimization problem that has been intensively studied in the past decades. Due to its NP-hardness, many heuristic algorithms have been developed to solve the UBQP. These algorithms are usually problem-tailored and therefore lack generality and scalability. To address these issues, a heuristic algorithm based on deep reinforcement learning (DRLH) is proposed in this paper. It feeds problem-specific features into a neural network model, called NN, that guides the selection of a variable at each solution construction step. To further improve speed and efficiency, two variants are developed: a simplified DRLH (DRLS) and DRLS with hill climbing (DRLS-HC). The three algorithms are examined through extensive experiments against well-known heuristic algorithms from the literature. Experimental results show that DRLH, DRLS, and DRLS-HC outperform their competitors in both solution quality and computational efficiency. Specifically, DRLH achieves the best-quality results, while DRLS delivers high-quality solutions in a very short time. By adding a hill-climbing procedure to DRLS, the resulting DRLS-HC algorithm obtains results of almost the same quality as DRLH while using, on average, five times less computing time. Additional experiments on large-scale instances and various data distributions verify the generality and scalability of the proposed algorithms, and the results on benchmark instances indicate that the algorithms can be applied to practical problems.
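To make the pipeline concrete, below is a minimal Python sketch of the three ingredients the abstract describes: the UBQP objective, a step-by-step constructive heuristic, and the one-flip hill-climbing step used in DRLS-HC. This is an illustration under assumptions, not the authors' implementation: it assumes the maximization form f(x) = x^T Q x with symmetric Q and binary x, and the trained DRLH neural policy is replaced by a hand-coded marginal-gain score, since the learned model is not part of this page. All function names are hypothetical.

```python
# Minimal sketch, not the authors' code. Assumes maximization UBQP:
# f(x) = x^T Q x with x in {0,1}^n and Q symmetric. The learned DRLH
# policy is stood in for by a hand-coded marginal-gain score.
import numpy as np

def ubqp_value(Q, x):
    """Objective value f(x) = x^T Q x."""
    return float(x @ Q @ x)

def construct_solution(Q):
    """Fix one variable per step (DRLH-style construction).

    At each step every still-unfixed variable is scored; DRLH would take
    these scores from a trained neural network, while here we use the
    marginal gain of setting x_i = 1 under the current partial assignment.
    """
    n = Q.shape[0]
    x = np.zeros(n, dtype=int)
    unfixed = set(range(n))
    while unfixed:
        # For unfixed i, x[i] == 0, so Q[i] @ x = sum_{j != i} Q_ij x_j.
        gains = {i: Q[i, i] + 2 * (Q[i] @ x) for i in unfixed}
        i = max(gains, key=gains.get)      # most promising variable
        x[i] = 1 if gains[i] > 0 else 0    # greedy value for that variable
        unfixed.discard(i)
    return x

def hill_climb(Q, x):
    """One-flip hill climbing (the HC step of DRLS-HC)."""
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            d = 1 - 2 * x[i]               # +1 flips 0->1, -1 flips 1->0
            # Delta of flipping bit i: d * (Q_ii + 2 * sum_{j != i} Q_ij x_j)
            delta = d * (Q[i, i] + 2 * (Q[i] @ x - Q[i, i] * x[i]))
            if delta > 0:                  # flip strictly improves f
                x[i] += d
                improved = True
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.integers(-10, 11, size=(20, 20))
    Q = (Q + Q.T) // 2                     # symmetrize
    x = hill_climb(Q, construct_solution(Q))
    print("f(x) =", ubqp_value(Q, x))
```

Note that the closed-form delta makes each flip evaluation O(n), which is what keeps a hill-climbing pass cheap relative to re-running the full construction; this mirrors the speed/quality trade-off the paper reports between DRLH and DRLS-HC.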



Updated: 2020-08-23