A survey on semi-supervised learning
Machine Learning (IF 4.3), Pub Date: 2019-11-15, DOI: 10.1007/s10994-019-05855-6
Jesper E. van Engelen, Holger H. Hoos

Semi-supervised learning is the branch of machine learning concerned with using labelled as well as unlabelled data to perform certain learning tasks. Conceptually situated between supervised and unsupervised learning, it permits harnessing the large amounts of unlabelled data available in many use cases in combination with typically smaller sets of labelled data. In recent years, research in this area has followed the general trends observed in machine learning, with much attention directed at neural network-based models and generative learning. The literature on the topic has also expanded in volume and scope, now encompassing a broad spectrum of theory, algorithms and applications. However, no recent surveys exist to collect and organize this knowledge, impeding the ability of researchers and engineers alike to utilize it. Filling this void, we present an up-to-date overview of semi-supervised learning methods, covering earlier work as well as more recent advances. We focus primarily on semi-supervised classification, where the large majority of semi-supervised learning research takes place. Our survey aims to provide researchers and practitioners new to the field as well as more advanced readers with a solid understanding of the main approaches and algorithms developed over the past two decades, with an emphasis on the most prominent and currently relevant work. Furthermore, we propose a new taxonomy of semi-supervised classification algorithms, which sheds light on the different conceptual and methodological approaches for incorporating unlabelled data into the training process. Lastly, we show how the fundamental assumptions underlying most semi-supervised learning algorithms are closely connected to each other, and how they relate to the well-known semi-supervised clustering assumption.

Updated: 2019-11-15