
A parameter-free hybrid instance selection algorithm based on local sets with natural neighbors

Published in: Applied Intelligence

Abstract

Instance selection aims to find the best subset of patterns in the training set; the main instance selection methods are condensation methods, edition methods and hybrid methods. Hybrid methods combine the advantages of both edition and condensation methods. Nevertheless, most existing hybrid approaches rely heavily on parameters and are relatively time-consuming, which leads to unstable performance and limits their applicability. Although several relatively fast and/or parameter-free hybrid methods have been proposed, they still struggle to achieve both high accuracy and high reduction. To address these problems, we present a new parameter-free hybrid instance selection algorithm based on local sets with natural neighbors (LSNaNIS). We first propose a new parameter-free definition of the local set, built on a fast search for natural neighbors; the new local set describes the local characteristics of the data quickly and reasonably. In LSNaNIS, the new local set is used to design an edition method (LSEdit) that removes harmful samples, a border method (LSBorder) that retains representative border samples, and a core method (LSCore) that condenses internal samples. Comparison experiments show that LSNaNIS is relatively fast and outperforms existing hybrid methods at improving the k-nearest neighbor classifier in terms of both accuracy and reduction.
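The full algorithm is only available in the article itself, but the natural-neighbor search the abstract refers to can be illustrated with a minimal sketch. The idea (following Zhu et al.'s natural-neighbor concept) is to grow the neighborhood size k until every point has at least one mutual nearest neighbor; the function name and dataset below are ours, not from the paper, and this is a rough illustration rather than the authors' implementation.

```python
import numpy as np

def natural_neighbors(X):
    """Grow k until every point has at least one mutual (natural) neighbor.

    Returns the stable k and, for each point, the set of indices of its
    natural neighbors. Illustrative sketch only; assumes distinct points.
    """
    n = len(X)
    # Pairwise Euclidean distances; argsort gives each point's neighbor order.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(d, axis=1)[:, 1:]  # drop column 0 (the point itself)
    nn = [set() for _ in range(n)]        # natural neighbors of each point
    for k in range(1, n):
        # In round k, each point i reaches out to its k-th nearest neighbor j;
        # the relation is "natural" when i is also among j's first k neighbors.
        for i in range(n):
            j = order[i, k - 1]
            if i in {order[j, r] for r in range(k)}:
                nn[i].add(j)
                nn[j].add(i)
        if all(len(s) > 0 for s in nn):   # every point found a natural neighbor
            return k, nn
    return n - 1, nn
```

On two well-separated clusters, the search stops at a small k and isolated pairs (e.g. two points alone in a cluster) become each other's only natural neighbors; a parameter-free local set can then be defined from these mutual relations, which is the role the abstract assigns to the fast natural-neighbor search.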

[Figures 1–8 are available in the full article.]

Notes

  1. http://archive.ics.uci.edu/ml/datasets.html


Acknowledgements

This work was supported by the National Natural Science Foundation of China (61802360) and the Chongqing Science and Technology Project (KJZH17104 and CSTC2017rgun-zdyfx0040).

Author information

Correspondence to Qingsheng Zhu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Li, J., Zhu, Q. & Wu, Q. A parameter-free hybrid instance selection algorithm based on local sets with natural neighbors. Appl Intell 50, 1527–1541 (2020). https://doi.org/10.1007/s10489-019-01598-y
