Network‐based semisupervised clustering, Applied Stochastic Models in Business and Industry


Network‐based semisupervised clustering
Applied Stochastic Models in Business and Industry (IF 1.4) Pub Date: 2021-03-24, DOI: 10.1002/asmb.2618
Luca Frigau, Giulia Contu, Francesco Mola, Claudio Conversano


Semisupervised clustering extends standard clustering methods to the semisupervised setting. In some cases the clusters are associated with a given outcome variable that acts as a “noisy surrogate,” that is, a good proxy of the unknown clustering structure. This article introduces network‐based semisupervised clustering (NeSSC), a novel approach to semisupervised clustering associated with an outcome variable. It combines an initialization, a training, and an agglomeration phase. In the initialization and training phases, a classifier estimates a matrix of pairwise affinities between the instances. In the agglomeration phase, this matrix is transformed into a complex network, in which a community detection algorithm searches for the underlying community structure. The result is a partition of the instances into clusters that are highly homogeneous in terms of the outcome. We consider a particular specification of NeSSC that uses classification or regression trees as classifiers and Louvain, label propagation, and Walktrap as possible community detection algorithms. NeSSC's stopping criterion and the choice of the optimal partition of the original data are also discussed. Several applications to both real and simulated data demonstrate the effectiveness of the proposed method and the improved interpretability of its results compared with three alternative semisupervised clustering methods.
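The pipeline described in the abstract (classifier, pairwise affinity matrix, network, community detection) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the abstract does not say how the affinity is computed, so the sketch assumes that two instances gain affinity each time a one-split tree (a decision stump standing in for a classification tree), fitted on a bootstrap sample and a random feature, places them on the same side of the split. The community step is a simplified label propagation with deterministic tie-breaking. The names `fit_stump`, `pairwise_affinity`, `label_propagation`, and the parameters `n_rounds` and `min_affinity` are hypothetical.

```python
import random
from collections import defaultdict


def fit_stump(X, y, feature):
    """Pick the threshold on one feature that misclassifies the fewest outcomes."""
    values = sorted({row[feature] for row in X})
    best_threshold, best_errors = values[0], len(y) + 1
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [label for row, label in zip(X, y) if row[feature] <= threshold]
        right = [label for row, label in zip(X, y) if row[feature] > threshold]
        # Each non-empty side predicts its majority class; count what it gets wrong.
        errors = sum(len(side) - max(side.count(c) for c in set(side))
                     for side in (left, right) if side)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def pairwise_affinity(X, y, n_rounds=50, seed=0):
    """ASSUMPTION: affinity = share of rounds in which two instances fall on the same side."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    together = [[0] * n for _ in range(n)]
    for _ in range(n_rounds):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of instances
        feature = rng.randrange(p)                     # one random feature per round
        threshold = fit_stump([X[i] for i in sample], [y[i] for i in sample], feature)
        side = [row[feature] <= threshold for row in X]
        for i in range(n):
            for j in range(i + 1, n):
                if side[i] == side[j]:
                    together[i][j] += 1
                    together[j][i] += 1
    return [[count / n_rounds for count in row] for row in together]


def label_propagation(affinity, min_affinity=0.5, max_sweeps=100):
    """Simplified label propagation on the network whose edges have weight >= min_affinity."""
    n = len(affinity)
    labels = list(range(n))  # every instance starts in its own community
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            weight = defaultdict(float)
            for j in range(n):
                if j != i and affinity[i][j] >= min_affinity:
                    weight[labels[j]] += affinity[i][j]
            if not weight:
                continue  # isolated node keeps its own label
            best = max(sorted(weight), key=weight.get)  # ties go to the smallest label
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels


# Toy data: two well-separated groups whose outcome y matches the group.
X = [[0.1, 0.3], [0.4, 0.2], [0.2, 0.9], [0.8, 0.5],
     [10.2, 10.1], [10.5, 10.7], [10.9, 10.3], [10.4, 10.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
labels = label_propagation(pairwise_affinity(X, y))
print(labels)
```

In practice one would likely swap the stump for a full tree learner and the hand-written propagation for a library implementation of Louvain, label propagation, or Walktrap; the sketch only shows how the three phases fit together.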


Updated: 2021-04-13
