An Auto-Scaling Framework for Heterogeneous Hadoop Systems
International Journal of Cooperative Information Systems (IF 1.5), Pub Date: 2017-09-05, DOI: 10.1142/s0218843017500046
J. V. Benifa Bibal, D. Dejey

Scalable cloud infrastructure is essential for large-scale data processing with the MapReduce programming model, because it lets resources be provisioned and de-provisioned automatically on demand. The existing MapReduce model degrades in performance when adapted to heterogeneous environments: adequate techniques for scaling resources on demand are lacking, and the scheduling algorithms do not cooperate when resources are configured dynamically. This article presents an Auto-Scaling Framework (ASF) that configures resources automatically based on the current system load in a heterogeneous Hadoop environment. Data and tasks are scheduled in a data-local manner that adapts as new resources are configured or existing resources are removed. A monitoring module integrated with the JobTracker observes the status of the physical machines, computes the system load, and provides automated provisioning of resources. A Replica Tracker then tracks the replica objects so that tasks can be scheduled efficiently on the physical machines. Experiments conducted in a commercial cloud environment with diverse workload characteristics show that the proposed framework outperforms existing scheduling mechanisms on performance metrics such as average completion time, scheduling time, data locality, resource utilization, and throughput.
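
The abstract does not give the framework's actual algorithms, so the following is only a minimal sketch of the two ideas it names: a load-driven scaling decision and replica-aware, data-local task placement. Every name here (`Node`, `system_load`, `scaling_decision`, `pick_node`), the slot-based definition of load, and the 0.8 / 0.3 thresholds are hypothetical assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A physical machine as a monitoring module might see it (hypothetical)."""
    name: str
    slots: int        # task slots the machine offers
    running: int = 0  # tasks currently running on it


def system_load(nodes):
    """Fraction of all task slots currently in use (assumed load metric)."""
    total = sum(n.slots for n in nodes)
    return sum(n.running for n in nodes) / total if total else 0.0


def scaling_decision(nodes, high=0.8, low=0.3):
    """Return 'scale_out', 'scale_in', or 'hold' for the current load.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    load = system_load(nodes)
    if load > high:
        return "scale_out"
    if load < low and len(nodes) > 1:
        return "scale_in"
    return "hold"


def pick_node(block_id, nodes, replicas):
    """Choose a node for a task that reads `block_id`.

    `replicas` maps a block id to the set of node names holding a replica,
    standing in for what a replica tracker would record. Nodes with a local
    replica and a free slot are preferred; otherwise the least-loaded node
    with a free slot is used. Returns None if no slot is free.
    """
    free = [n for n in nodes if n.running < n.slots]
    if not free:
        return None
    local = [n for n in free if n.name in replicas.get(block_id, set())]
    candidates = local or free
    return min(candidates, key=lambda n: n.running / n.slots)
```

For example, with one saturated 4-slot node and one half-used 2-slot node, `scaling_decision` reports that more capacity is needed, and `pick_node` places a task on the only node with a free slot, which is data-local only when that node holds a replica of the block. In a heterogeneous cluster the per-node `slots` value is where differing machine capacities would enter this sketch.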

Updated: 2017-09-05