Fast feature selection for interval-valued data through kernel density estimation entropy

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Kernel density estimation, a non-parametric method for estimating the probability density of random variables, has been used in feature selection. However, existing feature selection methods based on kernel density estimation seldom consider interval-valued data, even though such data are common in practice. In this paper, a feature selection method based on kernel density estimation for interval-valued data is proposed. First, the kernel function used in kernel density estimation is defined for interval-valued data. Second, the defined kernel function is used to construct an interval-valued kernel density estimation probability structure, comprising the conditional, joint and posterior probabilities. Third, kernel density estimation entropies for interval-valued data are derived from this probability structure, namely information entropy, conditional entropy and joint entropy. Fourth, a feature selection approach based on kernel density estimation entropy is proposed, and the algorithm is then improved to obtain a fast feature selection algorithm. Finally, comparative experiments evaluate computing time, intuitive identifiability and classification performance to demonstrate the feasibility and effectiveness of the proposed method.
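
The pipeline outlined in the abstract (interval kernel, kernel-based probabilities, entropy, greedy selection) can be illustrated with a minimal sketch. The abstract does not give the paper's actual definitions, so everything below is an assumption made for illustration: the Gaussian kernel on interval endpoints, the bandwidth `h`, the resubstitution estimate of conditional entropy, and the function names `interval_kernel`, `sample_kernel`, `conditional_entropy` and `greedy_select` are all hypothetical. This is the plain quadratic-time greedy search, not the fast algorithm proposed in the paper.

```python
import math

def interval_kernel(a, b, h):
    # Gaussian kernel between two intervals a=(lo, hi) and b=(lo, hi).
    # Assumption: squared distance = sum of squared endpoint differences.
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-d2 / (2.0 * h * h))

def sample_kernel(x, y, features, h):
    # Product kernel over the chosen feature indices (equals 1.0 if none).
    k = 1.0
    for f in features:
        k *= interval_kernel(x[f], y[f], h)
    return k

def conditional_entropy(X, labels, features, h=0.5):
    # Resubstitution estimate of H(label | features):
    # -(1/n) * sum_i log p(label_i | x_i), where p is a ratio of kernel sums,
    # so the kernel's normalising constant cancels.
    n = len(X)
    total = 0.0
    for i in range(n):
        same, everything = 0.0, 0.0
        for j in range(n):
            k = sample_kernel(X[i], X[j], features, h)
            everything += k
            if labels[j] == labels[i]:
                same += k
        total -= math.log(same / everything)
    return total / n

def greedy_select(X, labels, n_select, h=0.5):
    # Forward selection: repeatedly add the feature that gives the
    # lowest estimated conditional entropy of the label.
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        best = min(remaining,
                   key=lambda f: conditional_entropy(X, labels, selected + [f], h))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: each sample is a list of (lower, upper) intervals.
# Feature 0 separates the two classes; feature 1 does not.
X = [
    [(0.0, 1.0), (5.0, 6.0)],
    [(0.1, 1.1), (2.0, 3.0)],
    [(4.0, 5.0), (5.0, 6.0)],
    [(4.1, 5.2), (2.0, 3.0)],
]
labels = [0, 0, 1, 1]
print(greedy_select(X, labels, 1))  # prints [0]
```

With an empty feature set the product kernel is 1 for every pair, so the estimate reduces to the empirical label entropy (log 2 for the balanced toy data). Each entropy evaluation costs O(n²) kernel computations, which is the kind of cost a fast variant would aim to reduce.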

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61976089, No. 61473259, No. 61070074, No. 60703038), and the Hunan Provincial Science & Technology Project Foundation (2018TP1018, 2018RS3065).

Author information

Corresponding author

Correspondence to Jianhua Dai.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dai, J., Liu, Y., Chen, J. et al. Fast feature selection for interval-valued data through kernel density estimation entropy. Int. J. Mach. Learn. & Cyber. 11, 2607–2624 (2020). https://doi.org/10.1007/s13042-020-01131-5

  • DOI: https://doi.org/10.1007/s13042-020-01131-5
