MapReduce Data Skewness Handling: A Systematic Literature Review
International Journal of Parallel Programming (IF 0.9), Pub Date: 2019-01-23, DOI: 10.1007/s10766-019-00627-0
Mohammad Amin Irandoost, Amir Masoud Rahmani, Saeed Setayeshi

MapReduce programming is one of the most successful techniques for large-scale, data-intensive computation. It follows a divide-and-conquer approach that uses commodity computers, also known as nodes, for parallel processing. The scalability and performance of this technique depend largely on how data is distributed across map and reduce tasks. For many reasons, such as node failure, network failure, and data skewness, one task can take longer to complete than the others, and job completion time is determined by the slowest task. Data skewness is one of the most important causes of a task needing more time than its peers. Although MapReduce is widely used because of its high flexibility and fault tolerance, it cannot fully utilize the nodes for parallel processing when the data is skewed. The objectives of this study were to review related articles, classify them by the type of problem addressed, and determine their advantages and disadvantages. Open issues were also defined to provide guidelines for future research on this subject. To achieve these objectives, a set of research questions was defined and answered. The review concludes that important parameters have not been considered in existing MapReduce data skewness handling approaches.
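The effect described in the abstract, where the slowest task sets the job completion time, can be illustrated with a minimal sketch. It is not taken from the paper: the function names `partition_loads` and `skew_ratio` are hypothetical, integer keys with `key % num_reducers` stand in for a hash partitioner, and the number of records per reducer is assumed to be a proxy for task run time.

```python
from collections import Counter


def partition_loads(keys, num_reducers):
    """Count how many records each reducer receives.

    Assumption: integer keys and `key % num_reducers` stand in for a
    hash partitioner; every record with the same key goes to one reducer.
    """
    loads = [0] * num_reducers
    for key, count in Counter(keys).items():
        loads[key % num_reducers] += count
    return loads


def skew_ratio(loads):
    """Ratio of the heaviest reducer load to the mean load.

    A value of 1.0 means a perfectly balanced job; larger values mean the
    slowest reducer holds that many times the average amount of data.
    """
    return max(loads) / (sum(loads) / len(loads))


# Uniform keys: 25 records for each of four keys.
uniform_keys = [k for k in range(4) for _ in range(25)]
# Skewed keys: key 0 accounts for 70 of the 100 records.
skewed_keys = [0] * 70 + [1] * 10 + [2] * 10 + [3] * 10

uniform_loads = partition_loads(uniform_keys, 4)
skewed_loads = partition_loads(skewed_keys, 4)

print(uniform_loads, skew_ratio(uniform_loads))
print(skewed_loads, skew_ratio(skewed_loads))
```

Under these assumptions both jobs process 100 records on four reducers, but in the skewed case one reducer handles 70 of them while the other three sit mostly idle, which is the underutilization the abstract refers to.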

Updated: 2019-01-23