Using Hadoop MapReduce for Parallel Genetic Algorithms: A Comparison of the Global, Grid and Island Models
Evolutionary Computation (IF 4.6), Pub Date: 2018-12-01, DOI: 10.1162/evco_a_00213
Filomena Ferrucci₁, Pasquale Salza₁, Federica Sarro₂

The need to improve the scalability of Genetic Algorithms (GAs) has motivated research on Parallel Genetic Algorithms (PGAs), for which different technologies and approaches have been used. Hadoop MapReduce is one of the most mature technologies for developing parallel algorithms. Because parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, parallel GA solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model is most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and to the GAs' computational effort, which gives a more machine-independent measure of these algorithms. We used three problem instances to vary the computational load and three cluster configurations based on 2, 4, and 8 parallel nodes. Moreover, we estimated the cost of running the experiments on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce has a set of constraints that need to be considered during the design and implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid models, also in terms of cost when executed on a commercial cloud provider.
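
To put the reported speedups in context, the short Python sketch below derives the parallel efficiency (speedup divided by node count) for each cluster size. The only inputs are the average island-model speedups quoted in the abstract. Efficiency is a standard derived metric that the abstract itself does not report, and the names `reported_speedup` and `parallel_efficiency` are illustrative, not taken from the paper.

```python
# Average island-model speedups quoted in the abstract, keyed by node count.
reported_speedup = {2: 1.8, 4: 3.4, 8: 7.0}


def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Return speedup / nodes; 1.0 would correspond to ideal linear scaling."""
    return speedup / nodes


# Derived values (an illustration; these figures are not stated in the abstract).
efficiency = {n: parallel_efficiency(s, n) for n, s in reported_speedup.items()}

for nodes, eff in efficiency.items():
    print(f"{nodes} nodes: speedup {reported_speedup[nodes]:.1f}, efficiency {eff:.3f}")
```

Under these figures the efficiency stays between 0.85 and 0.90 across the three cluster sizes, which is consistent with the abstract's claim that the island model keeps Hadoop-related overhead low.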

Updated: 2018-12-01