Active learning of constraints for weighted feature selection, Advances in Data Analysis and Classification

Active learning of constraints for weighted feature selection
Advances in Data Analysis and Classification (IF 1.6), Pub Date: 2020-07-10, DOI: 10.1007/s11634-020-00408-5
Samah Hijazi, Denis Hamad, Mariam Kalakech, Ali Kalakech

Pairwise constraints, a cheaper kind of supervision information that does not require revealing the class labels of data points, were initially suggested to enhance the performance of clustering algorithms. More recently, researchers have become interested in using them for feature selection. However, in most current methods, pairwise constraints are provided passively and generated randomly over multiple algorithmic runs whose results are averaged. This leads to the need for a large number of constraints that might be redundant, unnecessary, and under some circumstances even detrimental to the algorithm's performance. It also masks the individual effect of each constraint set and introduces a human labor-cost burden. Therefore, in this paper, we suggest a framework for actively selecting and then propagating constraints for feature selection. To do so, we rely on the graph Laplacian defined on the similarity matrix. We assume that when a small perturbation of the similarity value between a pair of data points leads to a better-separated cluster indicator based on the second eigenvector of the graph Laplacian, this pair is expected to be a pairwise query of higher and more significant impact. Constraint propagation, on the other hand, increases the supervision information while decreasing the cost of human labor. Finally, experimental results validate our proposal in comparison with other known feature selection methods and show it to be prominent among them.
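The perturbation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the Gaussian kernel, the unnormalized Laplacian, the `separation` score, the perturbation size `eps`, and all function names below are assumptions chosen only to make the idea concrete. The sketch nudges the similarity of each pair of points, recomputes the second eigenvector of the graph Laplacian, and ranks pairs by how much a hypothetical separation score changes.

```python
import itertools
import numpy as np


def rbf_similarity(X, sigma=2.0):
    """Gaussian similarity matrix with a zero diagonal (assumed kernel choice)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def fiedler_vector(W):
    """Second eigenvector of the unnormalized Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    return vecs[:, 1]


def separation(v):
    """Hypothetical score: gap between the smallest positive and the
    largest negative entry of the cluster indicator v."""
    pos, neg = v[v > 0], v[v < 0]
    if pos.size == 0 or neg.size == 0:
        return 0.0
    return float(pos.min() - neg.max())


def rank_pairs(W, eps=1e-2):
    """Rank pairs (i, j) by how much a small symmetric perturbation of
    W[i, j] changes the separation score (assumed impact measure)."""
    base = separation(fiedler_vector(W))
    scored = []
    for i, j in itertools.combinations(range(W.shape[0]), 2):
        Wp = W.copy()
        Wp[i, j] += eps
        Wp[j, i] += eps
        impact = abs(separation(fiedler_vector(Wp)) - base)
        scored.append(((i, j), impact))
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Under these assumptions, the pairs at the top of `rank_pairs(rbf_similarity(X))` would be the first candidates to send to a human annotator as must-link or cannot-link queries. The brute-force loop recomputes a full eigendecomposition per pair, so it is only practical for small datasets.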

Updated: 2020-07-10
