Distributed synthesized association mining for big transactional data

Pal, Amrit; Kumar, Manish

doi:10.1007/s12046-020-01380-8

Published: 02 July 2020

Sādhanā, volume 45, article number 169 (2020)

Abstract

Data volumes, including transactional databases, are growing rapidly. Dividing this data and storing it in a distributed manner is an effective approach to storage and retrieval. Mining such distributed data with minimal dependence between sub-problems is a crucial task, and finding frequent itemsets and the corresponding association rules is a major challenge when the results must be aggregated in a distributed environment. To overcome these challenges, we propose a distributed frequent itemset generation and association rule mining algorithm based on the MapReduce programming model. The proposed scheme generates frequent itemsets and mines association rules using a synthesized distributed technique: the rules are mined in a distributed manner, and weights are then assigned to the subsets of data and to the association rules. The association rules generated in this distributed manner are then combined using a weighted approach. This paper presents a novel MapReduce-based synthesis approach that works well over distributed storage of large amounts of data.
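The general idea described in the abstract (mine each data subset independently, then combine the local results using weights) can be illustrated with a minimal single-process sketch. This is not the authors' algorithm: the abstract does not say how the weights are computed, so the sketch assumes each partition's weight is its share of the total transactions, and all function names (`local_frequent_itemsets`, `synthesize`, `rules`) are hypothetical. The map and reduce steps are plain Python functions rather than real MapReduce jobs.

```python
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(transactions, min_support, max_size=2):
    """Map step (sketch): find locally frequent itemsets in one partition."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    # Keep only itemsets whose local support reaches the threshold.
    return {iset: c / n for iset, c in counts.items() if c / n >= min_support}


def synthesize(partitions, min_support, max_size=2):
    """Reduce step (sketch): weighted sum of the local supports."""
    total = sum(len(p) for p in partitions)
    combined = Counter()
    for p in partitions:
        weight = len(p) / total  # ASSUMPTION: weight = share of transactions
        local = local_frequent_itemsets(p, min_support, max_size)
        for iset, supp in local.items():
            combined[iset] += weight * supp
    return {iset: s for iset, s in combined.items() if s >= min_support}


def rules(supports, min_conf):
    """Derive rules x -> y from synthesized 2-itemsets and their confidences."""
    out = []
    for iset, s in supports.items():
        if len(iset) != 2:
            continue
        for x, y in (iset, iset[::-1]):
            if (x,) in supports:
                conf = s / supports[(x,)]
                if conf >= min_conf:
                    out.append((x, y, conf))
    return out


partitions = [
    [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]],
    [["a", "b"], ["a", "b"], ["c"], ["a"]],
]
supports = synthesize(partitions, min_support=0.5)
print(supports)
print(rules(supports, min_conf=0.7))
```

Because each partition reports only its locally frequent itemsets, an itemset that falls below the threshold in some partition contributes nothing from that partition, so the synthesized support is an approximation of the global support. That trade-off is what keeps the sub-problems independent of one another.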

Author information

Authors and Affiliations

Department of Computer Engineering and Application, GLA University, Mathura, India
Amrit Pal
Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, India
Amrit Pal & Manish Kumar

Corresponding author

Correspondence to Amrit Pal.

About this article

Cite this article

Pal, A., Kumar, M. Distributed synthesized association mining for big transactional data. Sādhanā 45, 169 (2020). https://doi.org/10.1007/s12046-020-01380-8

Received: 16 November 2017
Accepted: 15 May 2020
Published: 02 July 2020
DOI: https://doi.org/10.1007/s12046-020-01380-8
