Improving evolutionary constrained clustering using Active Learning
Knowledge-Based Systems (IF 7.2), Pub Date: 2020-09-21, DOI: 10.1016/j.knosys.2020.106452
Matheus Campos Fernandes, Thiago Ferreira Covões, André Luiz Vizine Pereira

The high cost of labeling data for analysis has increased interest in semi-supervised learning, especially constrained clustering, which usually involves a reduced labeling cost. At the same time, Active Learning (AL) aims to minimize the cost of creating labeled datasets by identifying which unlabeled data are most relevant to use during the learning process, given the labels that are already available. This paper proposes four AL strategies for an evolutionary constrained clustering algorithm (FIECE-EM) based on Gaussian Mixture Models (GMM), with corresponding theoretical asymptotic analyses. These strategies draw on information from several sources, such as the partition, the population, and even specific aspects of the algorithm. We perform an empirical evaluation on 14 well-known datasets to measure the impact of each strategy on both accuracy and labeling cost. The results are compared with baseline supervised algorithms as well as with COBRAS, a state-of-the-art Active Learning algorithm for constrained clustering. Three of the proposed strategies obtained significantly better results than COBRAS in our empirical evaluation. Thus, the combination of FIECE-EM with these strategies can be considered a viable alternative for AL in a constrained clustering setting.
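
As a rough illustration of the kind of partition-based signal an AL strategy could use, the sketch below scores each unlabeled point by the entropy of its posterior cluster memberships under a fixed GMM and selects the most uncertain point as the next one to label. This is not one of the paper's four strategies, which the abstract does not describe in detail. The 1-D data, the mixture parameters, and all function names are assumptions made for illustration only.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given the point x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def most_uncertain_index(points, weights, means, variances, labeled=frozenset()):
    """Index of the unlabeled point whose cluster membership is least certain."""
    best_index, best_score = None, -1.0
    for i, x in enumerate(points):
        if i in labeled:
            continue  # already labeled, so querying it again gains nothing
        score = entropy(responsibilities(x, weights, means, variances))
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Hypothetical toy mixture: two equally weighted unit-variance components.
weights = [0.5, 0.5]
means = [0.0, 4.0]
variances = [1.0, 1.0]
points = [-0.5, 0.3, 2.1, 3.8, 4.4]

query = most_uncertain_index(points, weights, means, variances)
next_query = most_uncertain_index(points, weights, means, variances, labeled={query})
```

In this toy setup the first query is the point nearest the boundary between the two components, since its memberships are closest to uniform. In a constrained clustering setting, the label obtained for a queried point would then be turned into constraints for the next clustering run; that step is omitted here.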




Updated: 2020-09-22