Zero Initialised Unsupervised Active Learning by Optimally Balanced Entropy-Based Sampling for Imbalanced Problems
Journal of Experimental & Theoretical Artificial Intelligence ( IF 2.2 ) Pub Date : 2021-05-24 , DOI: 10.1080/0952813x.2021.1924871
Gábor Szűcs 1 , Dávid Papp 1
ABSTRACT

Given the challenge of gathering labelled training data for machine learning tasks, active learning has become popular. This paper focuses on the beginning of unsupervised active learning, where there are no labelled data at all. The aim of this zero initialised unsupervised active learning is to select the most informative examples – even from an imbalanced dataset – to be labelled manually. Our solution, with a proposed selection strategy called Optimally Balanced Entropy-Based Sampling (OBEBS), maintains a balanced training set at each step to avoid imbalance problems. Two theorems on the optimal solution of the selection strategy are also presented and proved in the paper. At the beginning of active learning there is not enough information for a supervised machine learning method, so our selection strategy is based on unsupervised learning (clustering). The cluster membership likelihoods of the items are essential for the algorithm to connect the clusters and the classes, i.e., to find an assignment between them. For the best assignment, the Hungarian algorithm is used, and single, multi, and adaptive assignment variants of the OBEBS method are developed. Based on generated and real image datasets of handwritten digits, the experimental results show that our method surpasses the state-of-the-art methods.
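The core ingredients described in the abstract – entropy over cluster-membership likelihoods, balance-aware selection, and a Hungarian-algorithm assignment of clusters to classes – can be sketched as follows. This is an illustrative outline only, not the authors' exact OBEBS formulation; the membership likelihoods, label counts, and cluster-vs-class score matrix are placeholder data:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Hypothetical cluster-membership likelihoods for 8 unlabelled items over
# 3 clusters (rows sum to 1, as produced by e.g. soft k-means or a GMM).
probs = rng.dirichlet(np.ones(3), size=8)

# Shannon entropy of each item's membership distribution: high entropy
# means the clustering is uncertain about the item, so a manual label
# for it is informative.
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

# Balance-aware choice: query the most uncertain item from the cluster
# that currently has the fewest labelled examples, so the labelled
# training set stays balanced at each step.
labelled_per_cluster = np.array([2, 0, 1])  # assumed current label counts
target_cluster = labelled_per_cluster.argmin()
in_cluster = probs.argmax(axis=1) == target_cluster
candidate = int(np.where(in_cluster, entropy, -np.inf).argmax())

# Hungarian algorithm to connect clusters with classes: maximise a
# cluster-vs-class agreement score (negated, since the solver minimises).
score = rng.random((3, 3))  # assumed agreement scores from labelled items
rows, cols = linear_sum_assignment(-score)
assignment = dict(zip(rows, cols))  # cluster index -> class index
```

`linear_sum_assignment` solves the optimal one-to-one matching in polynomial time; the single, multi, and adaptive assignment variants mentioned in the abstract would differ in how this cluster-to-class mapping is formed and updated, which this sketch does not attempt to reproduce.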

Updated: 2021-05-24