Active learning through label error statistical methods
Knowledge-Based Systems (IF 8.8), Pub Date: 2019-10-24, DOI: 10.1016/j.knosys.2019.105140
Min Wang, Ke Fu, Fan Min, Xiuyi Jia

Clustering-based active learning splits data into a number of blocks and queries the labels of the most critical instances. An active learner must decide how to choose these critical instances and how to split the blocks. In this paper, we present theoretical and practical statistical methods for analyzing the relationship between the label error and the neighbor radius, and design new split and selection strategies to handle these two issues. First, we define statistical functions for the label error based on a single instance and instance pairs. Second, we build practical statistical models, calculate empirical label errors, and guide the block splitting process. Third, using these practical models, we develop a center-and-edge instance selection strategy for choosing critical instances. Fourth, we design a new algorithm called active learning through label error statistical methods (ALSE). Learning experiments were performed with 20 datasets from various domains. The results of significance tests verify the effectiveness of ALSE and its superiority over state-of-the-art active learning algorithms.
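
The abstract does not spell out how the center-and-edge strategy works, so the following is only a minimal sketch of one plausible reading, not the authors' ALSE procedure. It assumes that a block's "center" is the instance nearest the block centroid and that its "edge" instances are those farthest from it. The function name `center_and_edge`, the parameter `n_edge`, and the toy block are all hypothetical.

```python
import math


def center_and_edge(points, n_edge=2):
    """Pick one center index and up to n_edge edge indices from a block.

    Assumption (not from the paper): "center" means nearest to the
    centroid, "edge" means farthest from the centroid.
    """
    dim = len(points[0])
    # Centroid of the block, computed coordinate by coordinate.
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Euclidean distance of every instance to the centroid.
    dists = [math.dist(p, centroid) for p in points]
    indices = range(len(points))
    center = min(indices, key=dists.__getitem__)
    # Rank the remaining instances from farthest to nearest.
    by_distance = sorted(indices, key=dists.__getitem__, reverse=True)
    edges = [i for i in by_distance if i != center][:n_edge]
    return center, edges


# Hypothetical toy block: a small cluster plus one distant instance.
block = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (3.0, 3.0)]
center, edges = center_and_edge(block)
print(center, edges)
```

In a clustering-based active learner, the labels of these selected instances would be queried first, and a block whose queried labels disagree would be a candidate for splitting.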




Updated: 2020-01-16