On semi-supervised learning
TEST (IF 1.2) Pub Date: 2019-11-16, DOI: 10.1007/s11749-019-00690-2
A. Cholaquidis , R. Fraiman , M. Sued

Major efforts have been made, mostly in the machine learning literature, to construct good predictors that combine unlabelled and labelled data. These methods are known as semi-supervised. They deal with the problem of how to take advantage, when possible, of a huge amount of unlabelled data to perform classification in situations where labelled data are scarce. This is not always feasible: it depends on whether the labels can be inferred from the distribution of the unlabelled data. Nevertheless, several algorithms have been proposed recently. In this work, we present a new method that, under almost necessary conditions, asymptotically attains the performance of the best theoretical rule as the size of the unlabelled sample goes to infinity, even if the size of the labelled sample remains fixed. Its performance and computational time are assessed through simulations and on the well-known "Isolet" real dataset of phonemes, where a strong dependence on the choice of the initial training sample is shown. The main focus of this work is to elucidate when and why semi-supervised learning works in the asymptotic regime described above. The set of necessary assumptions, although reasonable, shows that semi-supervised methods only attain consistency for very well-conditioned problems.
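The core idea the abstract describes, using the geometry of a large unlabelled sample to extend the labels of a small labelled sample, can be illustrated with a minimal self-training sketch. This is not the authors' algorithm, only a generic nearest-neighbour label-propagation toy under the same "well-conditioned" assumption the abstract alludes to (well-separated classes, so labels are inferable from the unlabelled distribution); all names and the 1-D data are hypothetical.

```python
def propagate_labels(labelled, unlabelled, rounds):
    """Generic self-training sketch (not the paper's method).

    Repeatedly assign each unlabelled point the label of its nearest
    currently-labelled neighbour, then fold the single closest (most
    "confident") assignment back into the labelled pool. Points are
    1-D floats; labels are ints. Returns the enlarged labelled list.
    """
    labelled = list(labelled)   # [(x, label), ...]
    pool = list(unlabelled)     # [x, ...]
    for _ in range(rounds):
        if not pool:
            break
        # For each unlabelled point: distance to, and label of, its
        # nearest labelled neighbour.
        candidates = []
        for x in pool:
            dist, label = min((abs(x - lx), ly) for lx, ly in labelled)
            candidates.append((dist, x, label))
        # Fold in only the closest point; its label then helps the rest.
        dist, x, label = min(candidates)
        labelled.append((x, label))
        pool.remove(x)
    return labelled

# Two well-separated clusters, one labelled point per class: the
# favourable regime in which this kind of propagation can succeed.
seed = [(0.0, 0), (10.0, 1)]
unlab = [0.5, 1.0, 9.0, 9.5]
result = dict(propagate_labels(seed, unlab, rounds=len(unlab)))
```

When the clusters are well separated, every unlabelled point inherits the label of its cluster's seed; when the classes overlap, early mistakes propagate, which is the failure mode motivating the paper's near-necessary conditions.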



