A Selection Metric for semi-supervised learning based on neighborhood construction
Information Processing & Management (IF 8.6), Pub Date: 2020-12-22, DOI: 10.1016/j.ipm.2020.102444
Mona Emadi, Jafar Tanha, Mohammad Ebrahim Shiri, Mehdi Hosseinzadeh Aghdam

This paper addresses semi-supervised classification, in which a model is trained on both labeled and unlabeled samples. A central issue in semi-supervised learning is the selection metric used to sample from the unlabeled data so that informative points are extracted, and this choice is especially important for self-training algorithms. Most self-training algorithms rely on the probability estimates of the underlying base learner to select high-confidence predictions, which are not always useful for improving the decision boundary. In this study, a novel self-training algorithm is proposed, based on a new selection metric that uses a neighborhood construction algorithm. The algorithm selects unlabeled data points that are close to the decision boundary. Although the base learner's probability estimates do not rate these points as high-confidence, they are more effective for finding an optimal decision boundary. To assign correct labels to these points, the approach requires agreement between the classifier predictions and the neighborhood construction algorithm, which employs peak data points and an Apollonius circle to sample from the unlabeled data. This agreement is used to label unlabeled data at each iteration of the training process. The experimental results demonstrate that the proposed algorithm can effectively improve the performance of the constructed classification model.
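
The self-training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the base learner or the details of the Apollonius-circle neighborhood construction, so the sketch assumes a nearest-centroid classifier as the base learner and a plain k-nearest-neighbor vote as a stand-in for the neighborhood construction. All function names (`centroid_fit`, `centroid_predict`, `knn_vote`, `self_train`) and parameters (`per_round`, `max_rounds`, `k`) are hypothetical. What it does show is the selection idea: rank unlabeled points by how close they are to the decision boundary, and add a point to the labeled set only when the classifier and the neighborhood agree on its label.

```python
import math
from collections import Counter


def centroid_fit(X, y):
    """Assumed base learner: compute one centroid per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def centroid_predict(centroids, x):
    """Return (label, margin); a small margin means x is near the boundary."""
    dists = sorted((math.dist(x, c), label) for label, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def knn_vote(X, y, x, k=3):
    """Stand-in for neighborhood construction: majority label among the
    k nearest labeled points (NOT the Apollonius-circle method)."""
    nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def self_train(X_lab, y_lab, X_unl, per_round=2, max_rounds=10, k=3):
    """Self-training that prefers boundary points and labels them only on
    agreement between the classifier and the neighborhood vote."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        if not pool:
            break
        centroids = centroid_fit(X_lab, y_lab)
        # Rank unlabeled points by ascending margin (closest to boundary first).
        ranked = sorted(pool, key=lambda x: centroid_predict(centroids, x)[1])
        added = []
        for x in ranked:
            label, _ = centroid_predict(centroids, x)
            if label == knn_vote(X_lab, y_lab, x, k):
                added.append((x, label))
            if len(added) == per_round:
                break
        if not added:
            break  # no point on which both views agree; stop early
        for x, label in added:
            X_lab.append(x)
            y_lab.append(label)
            pool.remove(x)
    return centroid_fit(X_lab, y_lab), X_lab, y_lab
```

Points are expected as tuples of floats and labels as integers. Calling `self_train(X_lab, y_lab, X_unl)` returns the final centroids together with the enlarged labeled set. Replacing `knn_vote` with a real neighborhood construction routine would bring the sketch closer to the approach the abstract describes.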




Updated: 2020-12-22