Abstract
The fuzzy C-means (FCM) clustering integration (ensemble) algorithm improves clustering quality by combining the results of several clusterings, but its time complexity grows as the amount of data increases. To address this, a parallel FCM clustering integration algorithm based on MapReduce is proposed. The algorithm runs FCM from random initial cluster centres to obtain differentiated cluster members. It then builds an overlap matrix between the clusters of different members and uses it to unify the cluster labels and identify logically equivalent clusters. Finally, the cluster members share the classification information of each data object by voting, which yields the final clustering result. Experimental results show that the parallel FCM clustering integration algorithm performs well, achieving high speedup and good scalability.
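The abstract describes the pipeline only at a high level. The following is a minimal, single-machine Python sketch of that pipeline under stated assumptions, not the authors' implementation. It assumes Euclidean distance and fuzzifier m = 2; it simulates the MapReduce split by looping over data splits, with hypothetical `map_split` and `reduce_centres` functions that emit and sum the partial sums of the standard FCM centre update; it hardens each member by maximum membership; it unifies labels against the first member by picking the label permutation that maximizes the overlap matrix (an exhaustive matching assumed here, suitable only for small numbers of clusters, not necessarily the paper's equivalence-cluster procedure); and it takes a plurality vote. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_split(split, centres, m, dim):
    """Hypothetical map step: FCM memberships for one data split, emitted as
    per-cluster partial sums sum(u^m * x) and sum(u^m)."""
    c = len(centres)
    num = [[0.0] * dim for _ in range(c)]
    den = [0.0] * c
    for x in split:
        d = [max(dist2(x, v), 1e-12) for v in centres]  # avoid division by zero
        for i in range(c):
            # u_i = 1 / sum_k (||x - v_i|| / ||x - v_k||)^(2/(m-1)), using squared distances
            u = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1)) for k in range(c))
            w = u ** m
            den[i] += w
            for j in range(dim):
                num[i][j] += w * x[j]
    return num, den


def reduce_centres(partials, c, dim):
    """Hypothetical reduce step: add the partial sums, then divide to get new centres."""
    num = [[0.0] * dim for _ in range(c)]
    den = [0.0] * c
    for pnum, pden in partials:
        for i in range(c):
            den[i] += pden[i]
            for j in range(dim):
                num[i][j] += pnum[i][j]
    return [[num[i][j] / den[i] for j in range(dim)] for i in range(c)]


def fcm_member(data, c, seed, m=2.0, iters=50, n_splits=4):
    """One ensemble member: FCM from random initial centres, hardened to labels."""
    rng = random.Random(seed)
    dim = len(data[0])
    centres = [list(p) for p in rng.sample(data, c)]  # random initial centres
    splits = [data[s::n_splits] for s in range(n_splits)]
    for _ in range(iters):
        # On a real cluster each map_split call would run as a separate map task.
        partials = [map_split(s, centres, m, dim) for s in splits]
        centres = reduce_centres(partials, c, dim)
    # Highest FCM membership is the nearest centre, so harden by distance.
    return [min(range(c), key=lambda i: dist2(x, centres[i])) for x in data]


def align(ref, labels, c):
    """Relabel `labels` to agree with `ref` using the cluster overlap matrix."""
    overlap = [[0] * c for _ in range(c)]  # overlap[own][ref] = shared objects
    for r, l in zip(ref, labels):
        overlap[l][r] += 1
    # Exhaustive search over label permutations (O(c!)); fine for small c only.
    best = max(permutations(range(c)),
               key=lambda p: sum(overlap[i][p[i]] for i in range(c)))
    return [best[l] for l in labels]


def fcm_ensemble(data, c, n_members=5):
    """Run members with different seeds, unify labels, then plurality-vote."""
    members = [fcm_member(data, c, seed=s) for s in range(n_members)]
    ref = members[0]
    aligned = [ref] + [align(ref, lab, c) for lab in members[1:]]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(fcm_ensemble(points, 2))
```

Because the centre update is a ratio of two sums over all objects, the map tasks can compute those sums independently on each split and the reducer only adds them, so splitting the data does not change the resulting centres.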
About this article
Cite this article
Mbyamm Kiki, M.J., Zhang, J. & Kouassi, B.A. MapReduce FCM clustering set algorithm. Cluster Comput 24, 489–500 (2021). https://doi.org/10.1007/s10586-020-03131-0
DOI: https://doi.org/10.1007/s10586-020-03131-0