Task failure resilience technique for improving the performance of MapReduce in Hadoop
ETRI Journal ( IF 1.3 ) Pub Date : 2020-08-18 , DOI: 10.4218/etrij.2018-0265
Kavitha C 1 , Anita X 2

MapReduce is a framework that can process huge datasets in parallel and distributed computing environments. However, a single machine failure during the runtime of MapReduce tasks can increase completion time by 50%. MapReduce handles task failures by restarting the failed task and re‐computing all input data from scratch, regardless of how much data had already been processed. To solve this issue, we need the computed key‐value pairs to persist in a storage system to avoid re‐computing them during the restarting process. In this paper, the task failure resilience (TFR) technique is proposed, which allows the execution of a failed task to continue from the point it was interrupted without having to redo all the work. Amazon ElastiCache for Redis is used as a non‐volatile cache for the key‐value pairs. We measured the performance of TFR by running different Hadoop benchmarking suites. TFR was implemented using the Hadoop software framework, and the experimental results showed significant performance improvements when compared with the performance of the default Hadoop implementation.
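The resume-from-interruption idea in the abstract can be illustrated with a minimal sketch. This is not the authors' TFR implementation: the `CheckpointStore` class, the `run_map_task` function, the `fail_at` and `counter` parameters, and the word-count map logic are all hypothetical names chosen for illustration, and an in-memory dictionary stands in for the external cache (the paper uses Amazon ElastiCache for Redis).

```python
class CheckpointStore:
    """In-memory stand-in for an external key-value cache (hypothetical)."""

    def __init__(self):
        self._data = {}

    def save(self, task_id, next_offset, pairs):
        # Store a copy so later mutation by the task cannot alter the checkpoint.
        self._data[task_id] = (next_offset, list(pairs))

    def load(self, task_id):
        # A task with no checkpoint starts at record 0 with no emitted pairs.
        next_offset, pairs = self._data.get(task_id, (0, []))
        return next_offset, list(pairs)


def run_map_task(task_id, records, store, fail_at=None, counter=None):
    """Word-count style map task that resumes from its last checkpoint.

    fail_at simulates a crash just before the record at that index is processed.
    counter, if given, counts how many records were actually processed.
    """
    offset, pairs = store.load(task_id)
    for i in range(offset, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated task failure")
        for word in records[i].split():
            pairs.append((word, 1))
        if counter is not None:
            counter["processed"] += 1
        # Persist progress after each record so a restart can skip it.
        store.save(task_id, i + 1, pairs)
    return pairs
```

In this sketch, a first attempt that fails at record 2 leaves a checkpoint covering records 0 and 1, so a second attempt with the same `task_id` processes only the remaining record instead of starting from scratch. Checkpointing after every record keeps the example simple; a real system would likely batch writes to limit cache traffic.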

Updated: 2020-08-18