Implementing a GPU-based parallel MAX-MIN Ant System
arXiv - CS - Distributed, Parallel, and Cluster Computing; Pub Date: 2020-01-18; DOI: arxiv-2003.11902; Rafał Skinderowicz
The MAX-MIN Ant System (MMAS) is one of the best-known Ant Colony
Optimization (ACO) algorithms proven to be efficient at finding satisfactory
solutions to many difficult combinatorial optimization problems. The slowdown
in Moore's law and the availability of graphics processing units (GPUs)
capable of high-speed general-purpose computation have sparked
considerable research into the development of GPU-based ACO
implementations. In this paper, we discuss a range of novel ideas for improving
the GPU-based parallel MMAS implementation, allowing it to better utilize the
computing power offered by two successive Nvidia GPU architectures.
Specifically, based on the weighted reservoir sampling algorithm, we propose a
novel parallel implementation of the node selection procedure, which is at the
heart of the MMAS and other ACO algorithms. We also present a memory-efficient
implementation of another key component, the tabu list structure, which is
used in the ACO's solution construction stage. The proposed implementations,
combined with the existing approaches, lead to a total of six MMAS variants,
which are evaluated on a set of Traveling Salesman Problem (TSP) instances
ranging from 198 to 3,795 cities. The results show that our MMAS implementation
is competitive with state-of-the-art GPU-based and multi-core CPU-based
parallel ACO implementations: in fact, the execution times obtained on the Nvidia
V100 (Volta) GPU were up to 7.18x and 21.79x shorter, respectively. The fastest of the
proposed MMAS variants is able to generate over 1 million candidate solutions
per second when solving a 1,002-city instance. Moreover, we show that, combined
with the 2-opt local search heuristic, the proposed parallel MMAS finds
high-quality solutions for TSP instances with up to 18,512 nodes.
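To make the node selection idea concrete, here is a minimal sequential Python sketch. It assumes the one-item Efraimidis-Spirakis (A-Res) form of weighted reservoir sampling and the usual ACO move weight tau_ij^alpha * eta_ij^beta; the paper's exact formulation may differ. The function names `select_next_city` and `build_tour`, the integer-bitmask tabu list, and the toy five-city instance are hypothetical illustrations, not the paper's CUDA kernels or its tabu list design.

```python
import math
import random


def select_next_city(weights, visited, rng):
    """Sample one unvisited city j with probability weights[j] / sum(weights).

    One-item weighted reservoir sampling (Efraimidis-Spirakis A-Res):
    every candidate j draws u_j ~ U(0, 1) and gets the key log(u_j) / w_j,
    a monotone transform of u_j ** (1 / w_j); the largest key wins.
    Keys are independent, so on a GPU each candidate could be handled by
    its own thread followed by a max-reduction; here a loop stands in.
    `visited` is the tabu list stored as an integer bitmask (bit j = city j).
    Returns None if no unvisited city has a positive weight.
    """
    best_city, best_key = None, -math.inf
    for j, w in enumerate(weights):
        if (visited >> j) & 1 or w <= 0.0:
            continue  # tabu (already visited) or impossible move
        u = rng.random()
        while u == 0.0:  # random() lies in [0, 1); avoid log(0)
            u = rng.random()
        key = math.log(u) / w
        if key > best_key:
            best_city, best_key = j, key
    return best_city


def build_tour(tau, eta, alpha, beta, start, rng):
    """Construct one ant's tour using the standard ACO weight tau^alpha * eta^beta."""
    n = len(tau)
    visited = 1 << start  # hypothetical compact tabu list: 1 bit per city
    tour = [start]
    for _ in range(n - 1):
        i = tour[-1]
        weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in range(n)]
        j = select_next_city(weights, visited, rng)
        if j is None:  # all remaining weights are zero: take any unvisited city
            j = next(k for k in range(n) if not (visited >> k) & 1)
        visited |= 1 << j
        tour.append(j)
    return tour


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 2.0)]
    n = len(pts)
    dist = [[math.dist(p, q) for q in pts] for p in pts]
    eta = [[1.0 / d if d > 0 else 0.0 for d in row] for row in dist]  # heuristic info
    tau = [[1.0] * n for _ in range(n)]  # uniform initial pheromone
    print(build_tour(tau, eta, alpha=1.0, beta=2.0, start=0, rng=rng))
```

Because each candidate's key depends only on its own weight and random draw, the rule maps naturally onto one thread per candidate city plus a max-reduction, which is presumably what makes reservoir sampling attractive on a GPU; the sketch itself runs sequentially and only illustrates the sampling rule. Storing the tabu list as one bit per city instead of one 32-bit flag per city shrinks it by a factor of 32, though whether the paper uses this particular layout is an assumption.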
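The abstract also mentions combining MMAS with the 2-opt local search heuristic. For readers unfamiliar with it, the following is a plain CPU sketch of first-improvement 2-opt in Python, assuming a symmetric distance matrix. It is a generic textbook version, not the paper's GPU implementation, and the function names `two_opt` and `tour_length` are hypothetical.

```python
import math


def tour_length(tour, dist):
    """Length of the closed tour (last city connects back to the first)."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def two_opt(tour, dist, eps=1e-12):
    """First-improvement 2-opt local search on a symmetric distance matrix.

    A 2-opt move removes edges (a, b) and (c, d) and reconnects the tour as
    (a, c) and (b, d), which amounts to reversing the segment b..c.
    Moves are applied while they shorten the tour by more than eps.
    """
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 2):
            for j in range(i + 2, n):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if d == a:  # the two edges share city a (wrap-around): no valid move
                    continue
                delta = dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]
                if delta < -eps:
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour


if __name__ == "__main__":
    # Four corners of a unit square visited in a self-crossing order.
    pts = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    dist = [[math.dist(p, q) for q in pts] for p in pts]
    start = [0, 1, 2, 3]
    best = two_opt(start, dist)
    print(best, round(tour_length(start, dist), 3), round(tour_length(best, dist), 3))
```

On the square example, a single move removes the two crossing diagonals and leaves the perimeter tour of length 4. Each accepted move strictly shortens the tour, so the loop terminates at a 2-opt local optimum; for instances with thousands of cities, practical implementations usually restrict moves to candidate (nearest-neighbor) lists rather than scanning all O(n^2) pairs.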
Updated: 2020-03-27