Estimating runtime of a job in Hadoop MapReduce
Journal of Big Data (IF 8.6). Pub Date: 2020-07-06. DOI: 10.1186/s40537-020-00319-4
Narges Peyravi, Ali Moeini

Hadoop MapReduce is a framework for processing vast amounts of data across a cluster of machines in a reliable and fault-tolerant manner. Because knowing the runtime of a job is crucial to the platform's subsequent scheduling decisions and to better resource management, in this paper we propose a new method to estimate the runtime of a job. For this purpose, after precisely analyzing the anatomy of job processing in Hadoop MapReduce, we consider two cases: a job running for the first time, and a job that has run previously. In the first case, considering the essential parameters that have the greatest impact on runtime, we formulate each phase of the Hadoop execution pipeline and express it mathematically to calculate the runtime of a job. In the second case, the runtime is estimated by referring to the job's profile or history in the database and applying a weighting system. The results show that the average error rate is less than 12% when estimating the runtime of a first run and less than 8.5% when a profile or history of the job exists.
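The abstract describes two estimation modes: a first-run estimate built from per-phase expressions over the Hadoop execution pipeline, and a history-based estimate that weights previous runtimes from the job's profile. The sketch below illustrates both modes only in outline; the phase names, parameters, and recency-based weights are assumptions for illustration, not the paper's actual formulas.

```python
# Hypothetical sketch of the two estimation modes described in the abstract.
# The actual per-phase expressions and weighting system are defined in the
# paper; the phase names and weights below are illustrative assumptions.

def estimate_first_run(phase_times):
    """First run: sum the modeled duration of each MapReduce phase.

    `phase_times` maps a phase name (e.g. read, map, spill/sort, shuffle,
    merge, reduce, write) to an estimated duration in seconds obtained
    from per-phase expressions.
    """
    return sum(phase_times.values())

def estimate_from_history(previous_runtimes, weights=None):
    """Repeat run: weighted average of runtimes recorded in the job profile.

    By default, more recent runs receive higher weights (an assumption;
    the paper's weighting system may differ).
    """
    if weights is None:
        weights = list(range(1, len(previous_runtimes) + 1))  # 1, 2, ..., n
    total_weight = sum(weights)
    return sum(w * t for w, t in zip(weights, previous_runtimes)) / total_weight

if __name__ == "__main__":
    # First-run estimate from illustrative per-phase durations (seconds).
    phases = {"read": 40, "map": 120, "spill_sort": 30,
              "shuffle": 60, "merge": 20, "reduce": 90, "write": 35}
    print("first-run estimate:", estimate_first_run(phases), "s")

    # History-based estimate from three earlier runs of the same job.
    history = [410.0, 395.0, 402.0]
    print("history-based estimate:", round(estimate_from_history(history), 1), "s")
```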
