On using MapReduce to scale algorithms for Big Data analytics: a case study
Journal of Big Data (IF 8.6), Pub Date: 2019-11-30, DOI: 10.1186/s40537-019-0269-1. Authors: Phongphun Kijsanayothin, Gantaphon Chalumporn, Rattikorn Hewett
Introduction
Many data analytics algorithms were originally designed for in-memory data. Parallel and distributed computing is a natural first remedy for scaling these algorithms into “Big algorithms” for large-scale data. Many advances in Big Data analytics algorithms are due to MapReduce, a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines. Much research has focused on building efficient naive MapReduce-based algorithms or on extending MapReduce mechanisms to enhance performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions do not perform well, it may be because certain classes of algorithms are not amenable to the MapReduce model, and one should instead seek a fundamentally different approach to a new MapReduce-based solution.
Case description
This paper presents a case study of scaling a popular association rule-mining algorithm into a “Big algorithm”, specifically the development of the Apriori algorithm in the MapReduce model.
Discussion and evaluation
We use formal and empirical illustrations to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results compared with the state-of-the-art performer: a 7% average increase in performance over datasets ranging from 10,000 to 120,000 transactions.
Conclusions
The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could lead to many more alternative non-naive MapReduce-based “Big algorithms”.
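The kind of dependent iteration mentioned in the conclusions can be seen in a naive, level-wise port of Apriori to MapReduce. The sketch below is a hypothetical single-process simulation. It is not the authors' proposed algorithm and does not use Hadoop's API. `map_phase` emits `(candidate, 1)` pairs, `reduce_phase` sums them and keeps itemsets that meet an assumed absolute `min_support`, and `generate_candidates` applies the standard Apriori join-and-prune step. All of these names are illustrative. Each loop iteration stands for one MapReduce job, and iteration k cannot start until iteration k-1 has produced its frequent itemsets.

```python
from collections import Counter
from itertools import combinations


def map_phase(transactions, candidates):
    """Map (hypothetical): emit (candidate, 1) for every candidate contained in a transaction."""
    for t in transactions:
        items = set(t)
        for c in candidates:
            if items.issuperset(c):
                yield c, 1


def reduce_phase(pairs, min_support):
    """Shuffle + reduce (hypothetical): sum counts per candidate, keep the frequent ones."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return {c: n for c, n in counts.items() if n >= min_support}


def generate_candidates(frequent_prev, k):
    """Apriori join + prune: build size-k candidates whose (k-1)-subsets are all frequent."""
    candidates = set()
    for a, b in combinations(list(frequent_prev), 2):
        union = a | b
        if len(union) == k and all(
            frozenset(s) in frequent_prev for s in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def apriori_mapreduce(transactions, min_support):
    """Naive level-wise Apriori: one simulated MapReduce job per itemset size."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    all_frequent = {}
    jobs = 0
    k = 1
    while candidates:
        # This job depends on the frequent itemsets found by the previous job.
        frequent = reduce_phase(map_phase(transactions, candidates), min_support)
        jobs += 1
        all_frequent.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return all_frequent, jobs


if __name__ == "__main__":
    tx = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    frequent, jobs = apriori_mapreduce(tx, min_support=3)
    print(f"{jobs} jobs, {len(frequent)} frequent itemsets")  # prints: 3 jobs, 8 frequent itemsets
```

On a real cluster, each of these jobs would typically re-read the input transactions, so the number of sequential jobs grows with the size of the largest candidate itemset. That serial chain of jobs is the dependent-iteration pattern that, according to the results, an effective MapReduce design should avoid.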