HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework
Journal of Big Data (IF 8.1), Pub Date: 2019-11-30, DOI: 10.1186/s40537-019-0253-9
Abolfazl Gandomi, Midia Reshadi, Ali Movaghar, Ahmad Khademzadeh

Due to the advent of new technologies, devices, and communication tools such as social networking sites, the amount of data produced by mankind is growing rapidly every year. Big data is a collection of large datasets that cannot be processed using traditional computing techniques. MapReduce was introduced to solve large-scale data computation problems. It is specifically designed to run on commodity hardware and relies on the divide-and-conquer principle. Nowadays, the focus of researchers has shifted towards Hadoop MapReduce. One of the most outstanding characteristics of MapReduce is data locality-aware scheduling. A data locality-aware scheduler is an efficient way to optimize one or more performance metrics such as data locality, energy consumption, and job completion time. Time and scheduling are among the most important aspects of the MapReduce framework, and many scheduling algorithms have therefore been proposed in the past decades. The main goals of these algorithms are to increase the data locality rate and to decrease response and completion times. In this paper, a new hybrid scheduling algorithm is proposed that uses dynamic priority and localization ID techniques and focuses on increasing the data locality rate and decreasing completion time. The proposed algorithm was evaluated and compared with the Hadoop default schedulers (FIFO, Fair) by running concurrent workloads consisting of the Wordcount and Terasort benchmarks. The experimental results show that the proposed algorithm is faster than FIFO and Fair scheduling, achieves a higher data locality rate, and avoids wasting resources.
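To make the core idea concrete, the following is a minimal, self-contained Java sketch of locality-aware task assignment driven by a per-split "localization ID" (the nodes holding a task's input split) combined with a dynamically recomputed job priority. It is an illustration of the general technique only, not the paper's implementation or Hadoop's scheduler API; class names such as LocalityAwareSchedulerSketch, Task, Job, and the priority formula are all hypothetical.

```java
import java.util.*;

/**
 * Minimal sketch of locality-aware scheduling with a "localization ID" per
 * input split and a dynamic job priority. Names and the priority formula
 * are illustrative assumptions, not the paper's code or Hadoop's API.
 */
public class LocalityAwareSchedulerSketch {

    /** A map task together with the IDs of the nodes holding its input split. */
    static final class Task {
        final String jobId;
        final int taskId;
        final Set<String> localNodes; // localization IDs: nodes with a local replica
        Task(String jobId, int taskId, Set<String> localNodes) {
            this.jobId = jobId; this.taskId = taskId; this.localNodes = localNodes;
        }
    }

    /** A job whose priority is recomputed dynamically (higher value = served first). */
    static final class Job {
        final String jobId;
        final Deque<Task> pending = new ArrayDeque<>();
        final long submitTimeMs;
        Job(String jobId, long submitTimeMs) { this.jobId = jobId; this.submitTimeMs = submitTimeMs; }
        /** Illustrative dynamic priority: grows with waiting time and remaining work. */
        double priority(long nowMs) {
            double waitedSec = (nowMs - submitTimeMs) / 1000.0;
            return waitedSec + pending.size();
        }
    }

    private final List<Job> jobs = new ArrayList<>();

    void submit(Job job) { jobs.add(job); }

    /**
     * Called when a node asks for work (a heartbeat). Jobs are scanned in
     * descending dynamic priority; within each job a data-local task is
     * preferred, and a non-local task is assigned only if no job has a
     * local task for this node.
     */
    Optional<Task> assignTask(String nodeId, long nowMs) {
        jobs.sort(Comparator.comparingDouble((Job j) -> j.priority(nowMs)).reversed());
        // First pass: data-local tasks only.
        for (Job job : jobs) {
            for (Iterator<Task> it = job.pending.iterator(); it.hasNext(); ) {
                Task t = it.next();
                if (t.localNodes.contains(nodeId)) { it.remove(); return Optional.of(t); }
            }
        }
        // Second pass: fall back to any pending task of the highest-priority job.
        for (Job job : jobs) {
            if (!job.pending.isEmpty()) return Optional.of(job.pending.pollFirst());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        LocalityAwareSchedulerSketch sched = new LocalityAwareSchedulerSketch();
        Job wordcount = new Job("wordcount", 0L);
        wordcount.pending.add(new Task("wordcount", 1, Set.of("node-1", "node-2")));
        wordcount.pending.add(new Task("wordcount", 2, Set.of("node-3")));
        sched.submit(wordcount);
        // node-3 receives the task whose split it already stores, even though it is queued second.
        System.out.println(sched.assignTask("node-3", 5_000L).map(t -> "task " + t.taskId).orElse("none"));
    }
}
```

The two-pass structure (local tasks first, non-local fallback second) reflects the general trade-off such schedulers make between data locality and resource utilization; a real scheduler would additionally handle rack-level locality, delay thresholds before falling back, and integration with Hadoop's resource-manager interfaces.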
