Abstract
Co-training is a branch of semi-supervised learning that improves classifier performance through the complementary effect of two views. Co-training algorithms typically select unlabeled data by a high-confidence strategy: the higher the confidence of a prediction, the more likely it is to be correct. Unfortunately, high-confidence selection is not always effective in improving classifier performance. In this paper, a co-training method based on entropy and multi-criteria is proposed. First, the data set is divided by entropy into two views carrying the same amount of information. Then, a clustering criterion and a confidence criterion are used to select unlabeled data in view 1 and view 2, respectively, which addresses the problem that the high-confidence criterion is not always valid. Selecting by different criteria lets the two views better play their complementary role in co-training, each supplementing what the other lacks. In addition, the multi-criteria selection fully exploits the labeled data in order to pick more valuable unlabeled examples. Experimental results on several UCI data sets and one artificial data set show the effectiveness of the proposed algorithm.
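The abstract does not detail how entropy divides the features into two equally informative views; a minimal sketch of one plausible reading (greedily balancing per-feature Shannon entropy across the two views; all function names are hypothetical, not from the paper) might look like:

```python
import numpy as np

def feature_entropy(col, bins=10):
    # Shannon entropy of one feature, estimated from a histogram.
    counts, _ = np.histogram(col, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def split_views_by_entropy(X):
    """Greedily assign features to two views so that the total
    entropy (information content) of the views stays balanced."""
    ent = [feature_entropy(X[:, j]) for j in range(X.shape[1])]
    order = np.argsort(ent)[::-1]          # high-entropy features first
    v1, v2, e1, e2 = [], [], 0.0, 0.0
    for j in order:
        # Give each feature to the currently "lighter" view.
        if e1 <= e2:
            v1.append(j); e1 += ent[j]
        else:
            v2.append(j); e2 += ent[j]
    return sorted(v1), sorted(v2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6)) * np.array([1, 2, 3, 4, 5, 6])
view1, view2 = split_views_by_entropy(X)
```

In a co-training loop, a classifier trained on `view1` would then add unlabeled points chosen by the clustering criterion, while the `view2` classifier adds those passing the confidence criterion; the exact criteria are defined in the paper itself.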
Funding
This work is supported by the Chongqing University Innovation Research Group Funding.
Cite this article
Lu, J., Gong, Y. A co-training method based on entropy and multi-criteria. Appl Intell 51, 3212–3225 (2021). https://doi.org/10.1007/s10489-020-02014-6