Application and Storage-Aware Data Placement and Job Scheduling for Hadoop Clusters
Journal of Circuits, Systems and Computers (IF 0.9), Pub Date: 2020-12-05, DOI: 10.1142/s0218126620502540
Tao Li 1, Shuibing He 2, Ping Chen 2, Siling Yang 2, Yanlong Yin 3, Cheng Xu 1

As one of the most popular frameworks for large-scale analytics processing, Hadoop faces two challenges: both applications and storage devices are becoming heterogeneous. However, existing data placement and job scheduling schemes pay little attention to the heterogeneity of either application I/O requirements or I/O device capabilities, and can therefore greatly degrade system efficiency. In this paper, we propose ASPS, an Application and Storage-aware data Placement and job Scheduling approach for Hadoop clusters. The idea is to place application data and schedule application tasks by considering both application I/O requirements and storage device characteristics. Specifically, ASPS first introduces novel metrics to quantify the I/O requirements of applications. Then, based on this quantification, ASPS places the data of different applications on their preferred storage devices. Finally, ASPS tries to launch jobs with high I/O requirements on nodes with the same type of faster devices to improve system efficiency. We have implemented ASPS in the Hadoop framework. Experimental results show that ASPS can reduce the completion time of a single application by up to 36% and the average completion time of six concurrent applications by 27%, compared to existing data placement policies and job scheduling approaches.
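
The three steps in the abstract (quantify I/O requirements, place data on a preferred device, schedule jobs onto nodes with matching devices) can be illustrated with a minimal Python sketch. The abstract does not define the actual ASPS metrics or policies, so everything here is an assumption made for illustration: the `io_intensity` ratio (bytes of I/O per CPU-second), the "ssd"/"hdd" labels, the `fast_slots` capacity limit, and all function and field names are hypothetical and are not taken from the paper or from any Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    bytes_io: float      # total bytes read + written (hypothetical input)
    cpu_seconds: float   # total compute time (hypothetical input)


def io_intensity(app: App) -> float:
    """Hypothetical stand-in metric: bytes of I/O per CPU-second."""
    return app.bytes_io / max(app.cpu_seconds, 1e-9)


def place_data(apps, fast_slots):
    """Rank apps by I/O intensity; the top `fast_slots` get 'ssd', the rest 'hdd'."""
    ranked = sorted(apps, key=io_intensity, reverse=True)
    return {a.name: ("ssd" if i < fast_slots else "hdd")
            for i, a in enumerate(ranked)}


def pick_node(app_name, placement, nodes):
    """Prefer a node whose device type matches where the app's data was placed."""
    wanted = placement[app_name]
    for node, device in nodes.items():
        if device == wanted:
            return node
    return next(iter(nodes))  # fall back to any node if no match exists


apps = [
    App("sort", bytes_io=100e9, cpu_seconds=100),   # I/O heavy
    App("pi", bytes_io=1e6, cpu_seconds=1000),      # CPU heavy
    App("grep", bytes_io=50e9, cpu_seconds=200),
]
placement = place_data(apps, fast_slots=1)
nodes = {"node1": "hdd", "node2": "ssd"}
chosen = pick_node("sort", placement, nodes)
```

In this toy run the most I/O-intensive application is the only one placed on the fast device, and its job is steered to the node that holds that device type. A real implementation would need to work within HDFS block placement and the Hadoop scheduler, which this sketch does not model.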

Updated: 2020-12-05