An effective framework based on local cores for self-labeled semi-supervised classification
Knowledge-Based Systems ( IF 8.8 ) Pub Date : 2020-03-31 , DOI: 10.1016/j.knosys.2020.105804
Junnan Li , Qingsheng Zhu , Quanwang Wu , Dongdong Cheng

Semi-supervised self-labeled methods use unlabeled data to improve classifiers that would otherwise be trained on labeled data alone. Nevertheless, using unlabeled data can degrade prediction accuracy. One cause is that there are too few labeled data to train the initial classifier in a self-labeled method. Existing solutions to this shortage of initial labeled data still have technical defects: when initial labeled data are extremely scarce, they fail to handle non-spherical data and do not augment the labeled set effectively. In this paper, we propose an effective semi-supervised self-labeled framework based on local cores, which aims to solve the shortage of initial labeled data in self-labeled methods and to overcome the defects above. The framework rests on two ideas: (a) the inadequate initial labeled set is enlarged by adding predicted local cores to it, where the labels of local cores are predicted by active labeling or co-labeling; (b) any semi-supervised self-labeled method is then used to train a given classifier on the enlarged labeled set and the updated unlabeled set. In our framework, local cores roughly reveal the data distribution, which lets the framework work on both spherical and non-spherical data sets. Local cores also help the framework augment the initial labeled data effectively, even when those data are extremely scarce. Experiments show that the proposed framework is compatible with the tested self-labeled methods and can help them train a k-nearest-neighbor or support vector machine classifier when initial labeled data are insufficient.
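
The two steps (a) and (b) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how local cores are found, so `find_local_cores` is a hypothetical stand-in that picks the densest point in each k-neighborhood. The `oracle` callback (standing in for active labeling), the 1-nearest-neighbor classifier, and the plain self-training loop are likewise assumptions chosen for brevity.

```python
import numpy as np

def pairwise_dist(A, B):
    # Euclidean distance matrix between rows of A and rows of B.
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def find_local_cores(X, k=5):
    # ASSUMPTION: hypothetical stand-in, not the paper's definition of local cores.
    D = pairwise_dist(X, X)
    nbrs = np.argsort(D, axis=1)[:, :k + 1]  # each point plus its k nearest neighbours
    # Crude density proxy: mean distance to the neighbourhood (smaller means denser).
    mean_d = np.take_along_axis(D, nbrs, axis=1).mean(axis=1)
    # Each point nominates the densest member of its own neighbourhood.
    rep = nbrs[np.arange(len(X)), np.argmin(mean_d[nbrs], axis=1)]
    return np.unique(rep)

def nn_predict(X_train, y_train, X_query):
    # 1-nearest-neighbour label and the distance to that neighbour.
    D = pairwise_dist(X_query, X_train)
    nearest = D.argmin(axis=1)
    return y_train[nearest], D[np.arange(len(X_query)), nearest]

def self_train(X_lab, y_lab, X_unl, batch=10):
    # Plain self-training: repeatedly absorb the unlabeled points
    # that lie closest to the current labeled set.
    X_lab, y_lab, X_unl = X_lab.copy(), y_lab.copy(), X_unl.copy()
    while len(X_unl) > 0:
        pred, dist = nn_predict(X_lab, y_lab, X_unl)
        take = np.argsort(dist)[:batch]
        X_lab = np.vstack([X_lab, X_unl[take]])
        y_lab = np.concatenate([y_lab, pred[take]])
        X_unl = np.delete(X_unl, take, axis=0)
    return X_lab, y_lab

def local_core_framework(X_lab, y_lab, X_unl, oracle, k=5):
    # Step (a): label the local cores (here by querying an oracle, a stand-in
    # for active labeling) and move them into the labeled set.
    cores = find_local_cores(X_unl, k)
    core_labels = np.array([oracle(i) for i in cores])
    X_lab2 = np.vstack([X_lab, X_unl[cores]])
    y_lab2 = np.concatenate([y_lab, core_labels])
    X_unl2 = np.delete(X_unl, cores, axis=0)
    # Step (b): run any self-labeled method on the enlarged labeled set.
    return self_train(X_lab2, y_lab2, X_unl2)
```

Here `oracle(i)` is expected to return the label of row `i` of `X_unl`. Any other self-labeled method could replace `self_train`, which matches the abstract's claim that the framework is agnostic to that choice.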




Updated: 2020-03-31