Designing a MapReduce performance model in distributed heterogeneous platforms based on benchmarking approach, The Journal of Supercomputing



Designing a MapReduce performance model in distributed heterogeneous platforms based on benchmarking approach
The Journal of Supercomputing (IF 3.3), Pub Date: 2020-01-16, DOI: 10.1007/s11227-020-03162-9
Abolfazl Gandomi, Ali Movaghar, Midia Reshadi, Ahmad Khademzadeh

The MapReduce framework is an effective method for parallel processing of big data. Improving the performance of MapReduce clusters while reducing their job execution time is a fundamental challenge for this approach. Two problems arise in particular: how to maximize the execution overlap between jobs, and how to produce an optimal job schedule. Accordingly, one of the most critical challenges in achieving these goals is developing a precise model to estimate job execution time, given the large number and high volume of submitted jobs, the limited consumable resources, and the need for proper Hadoop configuration. This paper presents a model based on MapReduce phases for predicting the execution time of jobs in a heterogeneous cluster. In addition, a novel heuristic method is designed that significantly reduces the makespan of the jobs. In this method, a job profiling tool first obtains the execution details of the MapReduce phases through log analysis. Then, using machine learning methods and statistical analysis, a model is proposed to predict runtime. Finally, a second tool, the job submission and monitoring tool, is used to calculate the makespan. Experiments were conducted on the benchmarks under identical conditions for all jobs. The results show that the average makespan speedup of the proposed method was higher than in the unoptimized case.
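
The relationship between predicted runtimes, job ordering, and makespan can be illustrated with a minimal Python sketch. It is not the paper's model or heuristic: the job names, per-phase times, and slot count are hypothetical, the runtime estimate simply adds the phase times (assuming the phases run back to back), and the ordering rule is the classic longest-processing-time-first (LPT) heuristic, used here only as a stand-in.

```python
import heapq

def makespan(runtimes, slots):
    """Greedy list scheduling: each job, taken in the given order,
    starts on whichever slot becomes free first."""
    free_at = [0.0] * slots          # time at which each slot becomes free
    heapq.heapify(free_at)
    for t in runtimes:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    return max(free_at)

# Hypothetical per-phase estimates in seconds: (map, shuffle, reduce).
# In the paper these would come from log profiling; here they are made up.
phase_estimates = {
    "wordcount": (10, 5, 5),
    "grep": (12, 4, 4),
    "sort": (15, 10, 5),
    "join": (14, 8, 8),
    "terasort": (30, 20, 10),
}

# Simplifying assumption: predicted runtime is the sum of the phase times.
predicted = [sum(phases) for phases in phase_estimates.values()]

fifo_makespan = makespan(predicted, slots=2)                        # submission order
lpt_makespan = makespan(sorted(predicted, reverse=True), slots=2)   # longest job first
speedup = fifo_makespan / lpt_makespan

print(fifo_makespan, lpt_makespan, speedup)  # 110.0 80.0 1.375
```

With these made-up numbers, reordering the jobs by predicted runtime shortens the makespan on two slots from 110 s to 80 s, which is why an accurate runtime model matters for scheduling.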


Updated: 2020-01-16
