An Improved Mean Imputation Clustering Algorithm for Incomplete Data

Shi, Hong; Wang, Pingxin; Yang, Xin; Yu, Hualong

doi:10.1007/s11063-020-10298-5

Published: 02 July 2020

Volume 54, pages 3537–3550, (2022)

Neural Processing Letters

Hong Shi¹,
Pingxin Wang²,³ (ORCID: orcid.org/0000-0002-1290-6112),
Xin Yang¹ &
Hualong Yu¹

Abstract

Incomplete data sets arise in every field of scientific study because of random noise, data loss, limitations of data acquisition, misunderstanding of data, and similar causes. Most clustering algorithms cannot be applied to incomplete data sets directly, because objects with missing values must be preprocessed first. This paper therefore presents an improved mean imputation clustering algorithm for incomplete data, built on a partition clustering algorithm. The proposed method divides the universe into two sets: the objects with no missing values and the objects with missing values. First, the objects with no missing values are clustered by a traditional clustering algorithm. Then, for each object with missing values, each missing attribute value is filled in turn with the mean value of that attribute in each cluster obtained from the complete objects. Perturbation analysis of the cluster centroids is applied to search for the optimal imputation. Clustering results on several UCI data sets are evaluated with several validity indexes, and they demonstrate the effectiveness of the proposed algorithm.
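
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the perturbation criterion, so the sketch assumes that the "optimal" imputation is the candidate cluster whose centroid moves least when the filled object is added. The function names (`farthest_first_init`, `kmeans`, `impute_and_assign`), the deterministic farthest-first initialization, and the use of plain k-means as the "traditional clustering algorithm" are likewise assumptions made for illustration.

```python
import numpy as np


def farthest_first_init(X, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first object, then repeatedly add the object that is
    farthest from all centroids chosen so far."""
    centroids = [X[0]]
    for _ in range(1, k):
        diffs = X[:, None, :] - np.array(centroids)[None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        centroids.append(X[nearest.argmax()])
    return np.array(centroids, dtype=float)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means on complete data; returns (centroids, labels)."""
    centroids = farthest_first_init(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def impute_and_assign(X, k):
    """Cluster the complete objects, then fill each incomplete object with
    the attribute means of the cluster whose centroid is perturbed least
    (assumed criterion, not taken from the paper)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete = ~missing.any(axis=1)

    # Step 1: cluster only the objects without missing values.
    centroids, complete_labels = kmeans(X[complete], k)
    sizes = np.bincount(complete_labels, minlength=k)

    X_filled = X.copy()
    labels = np.empty(len(X), dtype=int)
    labels[complete] = complete_labels

    # Step 2: try the mean imputation of every cluster for each incomplete object.
    for i in np.flatnonzero(~complete):
        best_j, best_shift = -1, np.inf
        for j in range(k):
            # Candidate: missing attributes replaced by cluster j's means.
            candidate = np.where(missing[i], centroids[j], X[i])
            # Centroid of cluster j if the candidate were added to it.
            shifted = (sizes[j] * centroids[j] + candidate) / (sizes[j] + 1)
            shift = np.linalg.norm(shifted - centroids[j])
            if shift < best_shift:
                best_j, best_shift = j, shift
        X_filled[i] = np.where(missing[i], centroids[best_j], X[i])
        labels[i] = best_j
    return X_filled, labels
```

Because a candidate differs from the cluster centroid only in its observed attributes, the assumed criterion reduces to the distance over observed attributes scaled by cluster size. Calling `impute_and_assign(X, k)` on an array in which missing values are encoded as `np.nan` returns the filled data together with one cluster label per object.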

Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their constructive and valuable comments. This work was supported in part by the National Natural Science Foundation of China (Nos. 61503160 and 61773012), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 15KJB110004), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX19_1699).

Author information

Authors and Affiliations

School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, 212003, People’s Republic of China
Hong Shi, Xin Yang & Hualong Yu
School of Science, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212003, People’s Republic of China
Pingxin Wang
College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang, 050024, People’s Republic of China
Pingxin Wang

Corresponding author

Correspondence to Pingxin Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shi, H., Wang, P., Yang, X. et al. An Improved Mean Imputation Clustering Algorithm for Incomplete Data. Neural Process Lett 54, 3537–3550 (2022). https://doi.org/10.1007/s11063-020-10298-5

Published: 02 July 2020
Issue Date: October 2022
DOI: https://doi.org/10.1007/s11063-020-10298-5
