A Lagrangian-based score for assessing the quality of pairwise constraints in semi-supervised clustering
Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2021-09-16, DOI: 10.1007/s10618-021-00794-0
Rodrigo Randel, Daniel Aloise, Simon J. Blanchard, Alain Hertz

Clustering algorithms help identify homogeneous subgroups from data. In some cases, additional information about the relationship among some subsets of the data exists. When using a semi-supervised clustering algorithm, an expert may provide additional information to constrain the solution based on that knowledge and, in doing so, guide the algorithm to a more useful and meaningful solution. Such additional information often takes the form of a cannot-link constraint (i.e., two data points cannot be part of the same cluster) or a must-link constraint (i.e., two data points must be part of the same cluster). A key challenge for users of such constraints in semi-supervised learning algorithms, however, is that the addition of inaccurate or conflicting constraints can decrease accuracy and little is known about how to detect whether expert-imposed constraints are likely incorrect. In the present work, we introduce a method to score each must-link and cannot-link pairwise constraint as likely incorrect. Using synthetic experimental examples and real data, we show that the resulting impact score can successfully identify individual constraints that should be removed or revised.
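To make the two constraint types concrete, the following minimal sketch represents must-link and cannot-link constraints as index pairs and checks which of them a given cluster labeling violates. It is an illustration only and is not the Lagrangian-based impact score introduced in the paper; the function name `violated_constraints` and the toy data are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two data points


def violated_constraints(
    labels: Sequence[int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> Tuple[List[Pair], List[Pair]]:
    """Return the must-link and cannot-link pairs that `labels` violates.

    Hypothetical helper for illustration; it only checks feasibility and
    does not score how likely a constraint is to be incorrect.
    """
    # A must-link pair is violated when its two points are in different clusters.
    broken_ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is violated when its two points share a cluster.
    broken_cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_ml, broken_cl


# Toy example: five points assigned to three clusters.
labels = [0, 0, 1, 1, 2]
must_link = [(0, 1), (1, 2)]
cannot_link = [(2, 3), (0, 4)]

broken_ml, broken_cl = violated_constraints(labels, must_link, cannot_link)
print("violated must-link:", broken_ml)
print("violated cannot-link:", broken_cl)
```

A check like this only shows which constraints disagree with one particular clustering. It cannot tell whether the constraint or the clustering is at fault, which is the question the paper's impact score is meant to address.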




Updated: 2021-09-17