STDADS: An Efficient Slow Task Detection Algorithm for Deadline Schedulers.
Big Data (IF 4.6), Pub Date: 2020-02-01, DOI: 10.1089/big.2019.0039
Utsav Upadhyay, Geeta Sikka

The MapReduce programming model was designed and developed for the Google File System to process large-scale distributed data sets efficiently. The open-source implementation of this Google project is Apache Hadoop. The Hadoop architecture comprises Hadoop MapReduce and the Hadoop Distributed File System (HDFS): HDFS lets Hadoop manage data sets effectively across the cluster, while the MapReduce programming paradigm enables efficient processing of large data sets. MapReduce speculatively re-executes slow tasks on other nodes so the computation finishes quickly, improving the overall Quality of Service (QoS). Several mechanisms have been proposed on top of Hadoop's default scheduler to improve speculative task execution on Hadoop clusters, and a large number of strategies have also been suggested for scheduling jobs with deadlines. However, the existing mechanisms for speculative task execution were either not developed for, or are not well integrated with, deadline schedulers. This article presents an improved speculative task detection algorithm designed specifically for deadline schedulers. Our studies show the importance of keeping a regular track of each node's performance in order to re-execute speculative tasks more efficiently. We improve the QoS offered by Hadoop clusters for jobs arriving with deadlines in terms of the percentage of successfully completed jobs, the detection time of speculative tasks, the accuracy of correct speculative task detection, and the percentage of incorrectly flagged speculative tasks.
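To illustrate the kind of deadline-aware slow-task detection the abstract describes, the following Python sketch flags a task as a speculative candidate when its estimated time to completion exceeds the time remaining before the job deadline, or when the node it runs on has a tracked progress rate well below the cluster average. This is only an illustrative sketch under stated assumptions, not the authors' STDADS algorithm: the class names, the exponentially weighted node score, and the thresholds are all introduced for this example.

# Illustrative sketch only -- not the published STDADS implementation.
from dataclasses import dataclass, field

@dataclass
class TaskStatus:
    task_id: str
    node: str
    progress: float   # fraction of the task completed, 0.0 to 1.0
    elapsed: float    # seconds since this task attempt started

@dataclass
class NodeTracker:
    # Exponentially weighted average of the progress rates observed on each node.
    alpha: float = 0.3
    scores: dict = field(default_factory=dict)

    def update(self, node: str, rate: float) -> None:
        prev = self.scores.get(node, rate)
        self.scores[node] = self.alpha * rate + (1 - self.alpha) * prev

    def score(self, node: str, default: float) -> float:
        return self.scores.get(node, default)

def detect_slow_tasks(tasks, tracker, deadline_remaining, slow_node_factor=0.5):
    # Flag tasks that (a) cannot finish before the job deadline at their current
    # progress rate, or (b) run on a node whose tracked rate has fallen well
    # below the cluster-wide average; both become candidates for re-execution.
    rates = {t.task_id: (t.progress / t.elapsed if t.elapsed > 0 else 0.0) for t in tasks}
    cluster_avg = sum(rates.values()) / len(rates) if rates else 0.0

    flagged = []
    for t in tasks:
        rate = rates[t.task_id]
        tracker.update(t.node, rate)
        eta = (1.0 - t.progress) / rate if rate > 0 else float("inf")
        misses_deadline = eta > deadline_remaining
        on_slow_node = tracker.score(t.node, cluster_avg) < slow_node_factor * cluster_avg
        if misses_deadline or on_slow_node:
            flagged.append(t.task_id)
    return flagged

# Example: with 10 seconds left before the deadline, the slow task on node-b
# is flagged for speculative re-execution, while the nearly finished one is not.
tasks = [TaskStatus("m_001", "node-a", progress=0.9, elapsed=30.0),
         TaskStatus("m_002", "node-b", progress=0.1, elapsed=30.0)]
print(detect_slow_tasks(tasks, NodeTracker(), deadline_remaining=10.0))  # ['m_002']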

Updated: 2020-02-01