Improvement of job completion time in data-intensive cloud computing applications
Journal of Cloud Computing ( IF 3.418 ) Pub Date : 2020-02-07 , DOI: 10.1186/s13677-019-0139-6
Ibrahim Adel Ibrahim , Mostafa Bassiouni

Task stragglers in MapReduce jobs dramatically impede the execution of data-intensive computing jobs in cloud data centers. These slowdowns stem from uneven distribution of input data, heterogeneous data nodes, resource contention, and network configurations. Skew in the intermediate data of a MapReduce job leads to delay failures, in which the job completion time is violated. Data-intensive computing frameworks such as MapReduce and Hadoop YARN employ HashPartitioner. This partitioner may cause intermediate data skew, which results in straggler reducers. In this paper, we strive to make Hadoop YARN more efficient in cloud environments. We present a new partitioning scheme, called the balanced data clusters partitioner (BDCP), that handles straggler Reduce tasks based on sampling of the input data and feedback information about the task currently being processed. Our extensive experimental results show that BDCP can outperform the default Hadoop HashPartitioner and the Range partitioner. BDCP can assist in straggler mitigation during the reduce phase and minimize job completion time for MapReduce jobs in data-intensive cloud computing.
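
To make the skew problem concrete, the following is a minimal Python sketch, not the authors' BDCP algorithm, whose details the abstract does not give. It contrasts hash partitioning (a key's hash modulo the number of reducers) with a generic sampling-based alternative that counts key frequencies in a sample and greedily assigns the heaviest keys to the least-loaded reducer. All function names are hypothetical, `zlib.crc32` is only a stand-in for the key hash used by Hadoop, and the toy key distribution is an assumption chosen for illustration.

```python
import zlib
from collections import Counter


def hash_partition(key, num_reducers):
    """Hash-style partitioning: a stable hash of the key modulo reducer count.

    zlib.crc32 is an assumed stand-in for the framework's own key hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def build_balanced_assignment(sample_keys, num_reducers):
    """Greedy sketch: give each sampled key, heaviest first, to the
    currently least-loaded reducer. This is a generic heuristic, not BDCP."""
    freq = Counter(sample_keys)
    loads = [0] * num_reducers
    assignment = {}
    # Sort by descending frequency, then by key for deterministic ties.
    for key, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))
        assignment[key] = target
        loads[target] += count
    return assignment


def balanced_partition(key, assignment, num_reducers):
    """Use the sampled assignment; keys missing from the sample fall back
    to hash partitioning."""
    if key in assignment:
        return assignment[key]
    return hash_partition(key, num_reducers)


def reducer_loads(keys, partition_fn, num_reducers):
    """Count how many records each reducer would receive."""
    loads = [0] * num_reducers
    for key in keys:
        loads[partition_fn(key)] += 1
    return loads


if __name__ == "__main__":
    # Toy skewed intermediate data: one hot key dominates (assumed data).
    keys = ["k0"] * 8 + ["k1"] * 4 + ["k2"] * 2 + ["k3", "k4"]
    n = 2
    assignment = build_balanced_assignment(keys, n)
    print("hash loads:    ", reducer_loads(keys, lambda k: hash_partition(k, n), n))
    print("balanced loads:", reducer_loads(
        keys, lambda k: balanced_partition(k, assignment, n), n))
```

In this sketch the sample is the full key list, so the greedy assignment sees exact frequencies. A real job would sample only a fraction of the input, and the feedback about running tasks mentioned in the abstract is not modeled here.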

Updated: 2020-04-16