Enhancing data analysis: uncertainty-resistance method for handling incomplete data

Applied Intelligence

Abstract

Incomplete data are common in data analysis and can significantly affect the conclusions drawn from the data. Incompleteness also introduces uncertainty, which leads to unreliable results, so effective techniques for imputing missing values are crucial. Missing or incomplete data and noise are two common sources of uncertainty. This paper introduces a method for imputing missing values that is robust to the uncertainty arising from both incompleteness and noise. First, a kernel-based method is designed to remove noise. Next, the class of each incomplete data item is determined using belief function theory. Finally, every missing dimension is imputed with the mean value of the same dimension over the members of the determined class. Performance is evaluated on real-world data sets from the UCI repository. Experimental comparisons with state-of-the-art methods show that the proposed method achieves superior classification accuracy.
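The final imputation step described in the abstract can be illustrated with a minimal sketch. It covers only the class-mean fill and assumes the class of the incomplete item has already been determined; the kernel-based noise removal and the belief-function class assignment are not reproduced here. The function name `class_mean_impute`, the use of NaN to mark missing entries, and the toy data are illustrative assumptions, not taken from the paper.

```python
import math


def class_mean_impute(sample, members):
    """Fill NaN entries of `sample` with the per-dimension mean of `members`.

    `members` holds the vectors of the class already assigned to `sample`.
    How that class is chosen is outside this sketch (assumption: it is given).
    """
    imputed = list(sample)
    for j, value in enumerate(sample):
        if math.isnan(value):
            # Use only class members that have an observed value in dimension j.
            column = [m[j] for m in members if not math.isnan(m[j])]
            if not column:
                raise ValueError(f"no observed values for dimension {j}")
            imputed[j] = sum(column) / len(column)
    return imputed


# Toy data (hypothetical): two members of the assigned class, one incomplete item.
members_of_class = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
incomplete = [2.0, float("nan"), float("nan")]

completed = class_mean_impute(incomplete, members_of_class)
print(completed)  # [2.0, 3.0, 4.0]
```

Observed dimensions are left untouched; only the missing ones are replaced, each by the mean of that dimension across the class members.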

Corresponding author

Correspondence to Javad Hamidzadeh.

Cite this article

Hamidzadeh, J., Moradi, M. Enhancing data analysis: uncertainty-resistance method for handling incomplete data. Appl Intell 50, 74–86 (2020). https://doi.org/10.1007/s10489-019-01514-4
