Efficient Active Learning by Querying Discriminative and Representative Samples and Fully Exploiting Unlabeled Data
IEEE Transactions on Neural Networks and Learning Systems ( IF 10.2 ) Pub Date : 2020-08-26 , DOI: 10.1109/tnnls.2020.3016928
Bin Gu , Zhou Zhai , Cheng Deng , Heng Huang

Active learning is an important learning paradigm in machine learning and data mining that aims to train effective classifiers with as few labeled samples as possible. Querying discriminative (informative) and representative samples is the state-of-the-art approach for active learning. Fully utilizing the large amount of unlabeled data offers a second chance to improve the performance of active learning. Although several active learning methods that combine active learning with semisupervised learning have been proposed, fast active learning that both fully exploits unlabeled data and queries discriminative and representative samples remains an open question. To address this challenge, in this article, we propose a new efficient batch-mode active learning algorithm. Specifically, we first provide an active learning risk bound that fully accounts for the unlabeled samples when characterizing informativeness and representativeness. Based on this risk bound, we derive a new objective function for batch-mode active learning. We then propose a wrapper algorithm to optimize the objective function, which alternates between training a semisupervised classifier and selecting discriminative and representative samples. In particular, to avoid retraining the semisupervised classifier from scratch after each query, we design two unique procedures based on the path-following technique, which efficiently remove multiple queried samples from the unlabeled data set and add them to the labeled data set. Extensive experimental results on a variety of benchmark data sets show that our algorithm not only generalizes better than state-of-the-art active learning approaches but is also significantly more efficient.
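The wrapper loop described in the abstract, alternating between training a classifier and querying a batch of informative and representative samples, can be illustrated with a minimal sketch. This is not the paper's algorithm: it substitutes a plain gradient-descent linear classifier for the semisupervised solver, uses distance to the decision boundary as an informativeness proxy and mean cosine similarity to the unlabeled pool as a representativeness proxy, and all names, data, and the mixing weight `beta` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: a small labeled seed set plus a large unlabeled pool.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float) * 2 - 1   # labels in {-1, +1}
labeled = list(range(10))
unlabeled = list(range(10, 200))

def train_linear(Xl, yl, steps=200, lr=0.1):
    """Logistic-loss linear classifier fitted by gradient descent
    (a stand-in for the paper's semisupervised solver)."""
    w = np.zeros(Xl.shape[1])
    for _ in range(steps):
        margins = yl * (Xl @ w)
        grad = -(yl / (1.0 + np.exp(margins))) @ Xl / len(yl)
        w -= lr * grad
    return w

def query_batch(w, Xu, k=5, beta=0.5):
    """Score each unlabeled point by informativeness (small |w.x| means
    close to the boundary) plus representativeness (mean cosine
    similarity to the pool), then return the indices of the top k."""
    informativeness = 1.0 / (1.0 + np.abs(Xu @ w))
    Xn = Xu / (np.linalg.norm(Xu, axis=1, keepdims=True) + 1e-12)
    representativeness = (Xn @ Xn.T).mean(axis=1)
    score = beta * informativeness + (1.0 - beta) * representativeness
    return np.argsort(score)[-k:]

# Alternate between (re)training and querying a batch, as the wrapper
# algorithm does; here the "oracle" labels simply come from y.
for _ in range(3):
    w = train_linear(X[labeled], y[labeled])
    picks = query_batch(w, X[unlabeled], k=5)
    for i in sorted(picks, reverse=True):
        labeled.append(unlabeled.pop(i))

acc = np.mean(np.sign(X @ w) == y)
print(f"pool labeled: {len(labeled)}, accuracy: {acc:.2f}")
```

The paper's key efficiency contribution, the path-following procedures that update the semisupervised classifier incrementally instead of retraining `train_linear` from scratch each round, is exactly the step this sketch leaves naive.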
