How Heterogeneity Affects the Design of Hadoop MapReduce Schedulers: A State-of-the-Art Survey and Challenges.
Big Data (IF 4.6), Pub Date: 2018-06-01, DOI: 10.1089/big.2018.0013
Vaibhav Pandey, Poonam Saini

The MapReduce (MR) computing paradigm and its open-source implementation, Hadoop, have become a de facto standard for processing big data in a distributed environment. Initially, Hadoop systems were homogeneous in three significant aspects: user, workload, and cluster (hardware). However, with the growing variety of MR jobs and the inclusion of nodes with different configurations in existing clusters, heterogeneity has become an essential part of Hadoop systems. These heterogeneity factors adversely affect the performance of a Hadoop scheduler and limit the overall throughput of the system. To overcome this problem, various heterogeneous Hadoop schedulers have been proposed in the literature. Existing surveys in this area mostly cover homogeneous schedulers and classify them by the quality-of-service parameters they optimize. Hence, there is a need to study heterogeneous Hadoop schedulers on the basis of the heterogeneity factors they consider. In this survey article, we first discuss the different heterogeneity factors that typically exist in a Hadoop system and then explore the challenges that arise when designing schedulers in the presence of such heterogeneity. Afterward, we present a comparative study of the heterogeneous scheduling algorithms available in the literature and classify them by these heterogeneity factors. Lastly, we investigate the methods and environments used to evaluate the discussed Hadoop schedulers.
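To make the throughput claim concrete, here is a toy model of cluster (hardware) heterogeneity. It is a minimal sketch under stated assumptions, not one of the surveyed schedulers. The node names and speeds are hypothetical, all tasks are treated as identical units, and data locality, network cost, and task failures are ignored. It compares a homogeneity-assuming policy that splits tasks evenly with a greedy earliest-finish policy that accounts for node speed (also an illustrative assumption).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float  # hypothetical: unit tasks completed per unit of time


def makespan(assignment: dict[str, int], nodes: list[Node]) -> float:
    """Job completion time: when the last node finishes its assigned tasks."""
    return max(assignment[n.name] / n.speed for n in nodes)


def uniform_assign(num_tasks: int, nodes: list[Node]) -> dict[str, int]:
    """Homogeneity assumption: spread tasks evenly and ignore node speed."""
    base, extra = divmod(num_tasks, len(nodes))
    return {n.name: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}


def speed_aware_assign(num_tasks: int, nodes: list[Node]) -> dict[str, int]:
    """Heterogeneity-aware (illustrative): give each task to the node
    that would finish it earliest, given the tasks it already holds."""
    counts = {n.name: 0 for n in nodes}
    for _ in range(num_tasks):
        best = min(nodes, key=lambda n: (counts[n.name] + 1) / n.speed)
        counts[best.name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical three-node cluster with 4x, 2x and 1x relative speeds.
    cluster = [Node("fast", 4.0), Node("medium", 2.0), Node("slow", 1.0)]
    for policy in (uniform_assign, speed_aware_assign):
        plan = policy(70, cluster)
        print(policy.__name__, plan, makespan(plan, cluster))
```

In this toy setting, the even split leaves the slow node holding as many tasks as the fast one, so it finishes last and stretches the job to 23 time units. The speed-aware split assigns 40/20/10 tasks and completes in 10 time units. This is one simple way heterogeneity can limit throughput when a scheduler assumes identical nodes.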

Updated: 2018-06-01