On using MapReduce to scale algorithms for Big Data analytics: a case study
Journal of Big Data (IF 8.6), Pub Date: 2019-11-30, DOI: 10.1186/s40537-019-0269-1
Phongphun Kijsanayothin, Gantaphon Chalumporn, Rattikorn Hewett

Introduction

Many data analytics algorithms were originally designed for in-memory data. Parallel and distributed computing is a natural first remedy for scaling these algorithms into “Big algorithms” for large-scale data. Many advances in Big Data analytics algorithms are owed to MapReduce, a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines. Much research has focused on building efficient naive MapReduce-based algorithms or on extending MapReduce mechanisms to improve performance. However, we argue that these should not be the only research directions pursued. We conjecture that when naive MapReduce-based solutions perform poorly, it may be because certain classes of algorithms are not amenable to the MapReduce model, and a fundamentally different approach is needed to obtain a new MapReduce-based solution.
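To make the MapReduce model concrete, the following is a minimal single-process simulation of one MapReduce job: a map phase in which each record independently emits (key, value) pairs, a shuffle that groups values by key, and a reduce phase that combines each group. The helper run_mapreduce, the mapper and reducer names, and the toy transactions are hypothetical illustrations, not code from the paper; a real deployment would distribute these phases across a cluster (e.g. with Hadoop).

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, int]


def run_mapreduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterable[Pair]],
    reducer: Callable[[Any, List[int]], int],
) -> Dict[Any, int]:
    # Hypothetical helper: runs map, shuffle and reduce sequentially in one process.
    # Map phase: every record is processed independently (parallelizable).
    intermediate = [pair for record in records for pair in mapper(record)]
    # Shuffle phase: group all emitted values by their key.
    groups: Dict[Any, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: every key's values are combined independently (parallelizable).
    return {key: reducer(key, values) for key, values in groups.items()}


def item_mapper(transaction):
    # Emit (item, 1) once per distinct item in a transaction.
    for item in set(transaction):
        yield item, 1


def sum_reducer(key, values):
    # Total number of transactions containing the item.
    return sum(values)


transactions = [["bread", "milk"], ["bread", "butter"], ["milk", "bread", "eggs"]]
counts = run_mapreduce(transactions, item_mapper, sum_reducer)
print(sorted(counts.items()))
# [('bread', 3), ('butter', 1), ('eggs', 1), ('milk', 2)]
```

Because the mapper and reducer never share state, each phase can be split across machines; this independence is what naive ports of iterative algorithms try to exploit.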

Case description

This paper presents a case study of scaling a popular association rule-mining algorithm into a “Big algorithm”: specifically, the development of the Apriori algorithm in the MapReduce model.
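One common naive way to port Apriori to MapReduce, sketched here under our own assumptions rather than as the paper's algorithm, is to run one MapReduce job per itemset size k. Candidates for round k are built from the frequent (k-1)-itemsets of round k-1 using Apriori's classic join-and-prune step, so each job depends on the output of the previous one. The function names (run_mapreduce, apriori_gen, levelwise_apriori), the min_support threshold, and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    # Single-process stand-in for one MapReduce job: map, shuffle (group by key), reduce.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def apriori_gen(frequent_prev, k):
    # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
    # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def levelwise_apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    rounds = 0

    def count_items(t):
        for item in t:
            yield frozenset([item]), 1

    # Round 1: count single items.
    counts = run_mapreduce(transactions, count_items, lambda key, values: sum(values))
    rounds += 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Dependent iteration: round k needs the frequent itemsets of round k-1.
        candidates = apriori_gen(frequent, k)
        if not candidates:
            break

        def count_candidates(t):
            for cand in candidates:
                if cand <= t:
                    yield cand, 1

        counts = run_mapreduce(transactions, count_candidates, lambda key, values: sum(values))
        rounds += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result, rounds


transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "bread", "eggs"],
    ["bread", "milk", "butter"],
]
result, rounds = levelwise_apriori(transactions, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print("MapReduce rounds:", rounds)
# ['bread'] 4
# ['butter'] 2
# ['milk'] 3
# ['bread', 'butter'] 2
# ['bread', 'milk'] 3
# MapReduce rounds: 2
```

On a real cluster each round adds job start-up, a full scan of the input, and a shuffle, and the rounds cannot overlap because of the candidate dependency.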

Discussion and evaluation

We use formal and empirical analyses to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results: compared with the state-of-the-art performer, our approach achieves an average performance improvement of 7% on datasets ranging from 10,000 to 120,000 transactions.

Conclusions

The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could open the way to many more non-naive MapReduce-based “Big algorithms”.
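As a contrast to level-wise processing, one simple way to avoid dependent iterations is to let each mapper emit every itemset of its transaction up to a chosen maximum size max_k, so that a single MapReduce job counts all candidates at once. This is a hypothetical sketch, not the authors' proposed algorithm; max_k, the function names, and the toy data are assumptions. The trade-off is that a transaction with n distinct items emits up to 2^n - 1 itemsets when max_k is at least n, so the approach suits short transactions or a small max_k.

```python
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    # Single-process stand-in for one MapReduce job: map, shuffle (group by key), reduce.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def single_pass_itemsets(transactions, min_support, max_k):
    # Hypothetical non-iterative variant: one job, no candidate generation between rounds.
    def emit_subsets(transaction):
        items = sorted(set(transaction))
        for size in range(1, max_k + 1):
            # combinations() yields nothing when size exceeds len(items).
            for subset in combinations(items, size):
                yield frozenset(subset), 1

    counts = run_mapreduce(transactions, emit_subsets, lambda key, values: sum(values))
    # Minimum-support filtering happens once, after the single reduce phase.
    return {s: c for s, c in counts.items() if c >= min_support}


transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "bread", "eggs"],
    ["bread", "milk", "butter"],
]
frequent = single_pass_itemsets(transactions, min_support=2, max_k=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
# ['bread'] 4
# ['butter'] 2
# ['milk'] 3
# ['bread', 'butter'] 2
# ['bread', 'milk'] 3
```

With max_k at least as large as the longest transaction, this finds the same frequent itemsets as a level-wise Apriori run, but in one job whose mappers never wait on a previous round.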


Updated: 2019-11-30