Unsupervised Feature Selection Based on Spectral Clustering with Maximum Relevancy and Minimum Redundancy Approach
International Journal of Pattern Recognition and Artificial Intelligence (IF 0.9), Pub Date: 2021-08-17, DOI: 10.1142/s0218001421500312
Bahareh Khozaei, Mahdi Eftekhari

In this paper, two novel approaches to unsupervised feature selection based on spectral clustering are proposed. In the first method, spectral clustering is applied to the features, and the cluster centers are selected together with their nearest neighbors. Because these features belong to different clusters, the similarity (redundancy) among them is minimal. Next, the samples of the data set are grouped by spectral clustering, and the samples of each cluster are assigned a specific pseudo-label. The information gain of each feature is then computed with respect to these pseudo-labels, which secures maximum relevancy. Finally, the intersection of the features selected in the two previous steps is taken, which simultaneously guarantees maximum relevancy and minimum redundancy. The second approach is very similar to the first; its only, but significant, difference is that it selects one feature from each cluster and sorts all the features by relevancy. The selected features are appended to a sorted list and ignored in the next step, and the algorithm continues with the remaining features until all features have been appended to the sorted list. Both methods are compared with state-of-the-art methods, and the results confirm the performance of the proposed approaches, especially the second one.
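The relevancy and ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the feature clusters and the sample pseudo-labels have already been produced by spectral clustering, that feature values are discrete, and that each round takes the most relevant remaining feature from every cluster. The abstract does not spell out these details, and the names `entropy`, `information_gain`, and `rank_features` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, pseudo_labels):
    """IG = H(Y) - H(Y | X) for a discrete feature X and pseudo-labels Y."""
    n = len(pseudo_labels)
    groups = {}
    for x, y in zip(feature_values, pseudo_labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(pseudo_labels) - conditional


def rank_features(feature_clusters, relevancy):
    """Assumed round-based ranking: each round takes the most relevant
    remaining feature from every cluster and appends the picks to the
    ranking, most relevant first, until no features are left."""
    remaining = {}
    for feature, cluster in feature_clusters.items():
        remaining.setdefault(cluster, []).append(feature)
    for members in remaining.values():
        # Ascending order, so that pop() returns the most relevant feature.
        members.sort(key=lambda f: relevancy[f])
    ranked = []
    while any(remaining.values()):
        picked = [members.pop() for members in remaining.values() if members]
        picked.sort(key=lambda f: relevancy[f], reverse=True)
        ranked.extend(picked)
    return ranked


# Toy data: four samples, pseudo-labels assumed to come from sample clustering.
pseudo_labels = [0, 0, 1, 1]
features = {
    "f0": [0, 0, 1, 1],
    "f1": [0, 0, 0, 1],
    "f2": [0, 1, 0, 1],
}
# Assumed output of clustering the features: f0 and f1 share a cluster.
feature_clusters = {"f0": 0, "f1": 0, "f2": 1}

relevancy = {name: information_gain(values, pseudo_labels)
             for name, values in features.items()}
ranking = rank_features(feature_clusters, relevancy)
print(ranking)
```

In this toy example `f1` carries more information about the pseudo-labels than `f2`, yet it is ranked after `f2` because it shares a cluster with the already selected `f0`. This is the redundancy-avoiding behavior the second approach aims for.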

Updated: 2021-08-17