Data clustering via cooperative games: A novel approach and comparative study
Information Sciences, Pub Date: 2020-09-24, DOI: 10.1016/j.ins.2020.09.018
André L. V. Coelho, Nelson C. Sandes

Arguably, the main purpose of cluster analysis is to develop algorithms that reveal natural groupings (clusterings) in a set of data points based on their similarity. The focus of cooperative game theory (CGT), on the other hand, is to study how groups (coalitions) of decision makers (players) form and how the resulting income is split among them. Owing to the conceptual similarity between these fields, CGT-based algorithms have recently emerged for tackling the data clustering problem. In this work, we revisit two such algorithms, one based on cluster prototypes (Biobjective Game Clustering, BiGC) and the other based on dense regions of data points (Density-Restricted Agglomerative Clustering, DRAC). We also present a novel partitional clustering algorithm, referred to as HGC (Hedonic Game-based Clustering), which is grounded in theoretical results for the subclass of hedonic games. Two versions of HGC, which differ in the order of the players in the game, are investigated, and a detailed factorial simulation study is reported to analyze how sensitive they are to three relevant factors: number of clusters, number of features, and noise level. In addition, a heuristic is provided for calibrating HGC's single control parameter (the number of nearest neighbors of each point) so as to yield high-quality partitions. To compare the performance of the CGT algorithms, a series of experiments was conducted on UCI and gene-expression data sets, most of which are high-dimensional. Overall, the results, measured by 10 external validation indices, show that HGC is usually more stable and effective than DRAC and BiGC. They also show that HGC is very competitive with (and sometimes considerably better than) well-known clustering algorithms and variants, specifically k-means, k-means++, affinity propagation, two variants of hierarchical clustering, and the density peak clustering algorithm.
Remarkably, HGC was able to fully recover the true clustering structures of two gene-expression data sets.
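
To make the game-theoretic view of clustering more concrete, here is a minimal sketch of clustering posed as a hedonic game. It is not the authors' HGC algorithm, whose details are not given in this abstract. The symmetric k-nearest-neighbor weights, the additively separable utility, and the names `knn_weights`, `utility` and `hedonic_clustering` are all illustrative assumptions. Each data point is a player whose utility for its coalition (cluster) counts how many of its neighbors belong to it. Players repeatedly switch to a coalition that strictly raises their utility until no one wants to move, which yields a Nash-stable partition. Because the weights are symmetric, every such move raises the total number of within-cluster neighbor links by exactly the mover's gain, so the loop is guaranteed to terminate.

```python
import math
import random


def knn_weights(points, k):
    """Hypothetical symmetric 0/1 weights: w[i][j] = 1 if j is among the k
    nearest neighbors of i, or i is among the k nearest neighbors of j."""
    n = len(points)
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        neighbors.append(set(others[:k]))
    return [[1 if (j in neighbors[i] or i in neighbors[j]) else 0
             for j in range(n)] for i in range(n)]


def utility(i, coalition, w):
    """Additively separable utility of player i for a coalition (i itself excluded)."""
    return sum(w[i][j] for j in coalition if j != i)


def hedonic_clustering(points, k, order=None):
    """Illustrative best-response dynamics for a hedonic clustering game (not HGC).

    `order` fixes the sequence in which players are offered a move; different
    orders may reach different Nash-stable partitions.
    """
    n = len(points)
    w = knn_weights(points, k)
    label = list(range(n))                  # start from singletons
    members = {c: {c} for c in range(n)}    # coalition id -> set of players
    order = list(range(n)) if order is None else list(order)
    changed = True
    while changed:                          # sweep until nobody wants to move
        changed = False
        for i in order:
            current = utility(i, members[label[i]], w)
            best, best_gain = None, 0
            for c, coal in members.items():
                if c == label[i] or not coal:
                    continue                # skip own and empty coalitions
                gain = utility(i, coal, w) - current
                if gain > best_gain:        # only strictly improving moves
                    best, best_gain = c, gain
            if best is not None:
                members[label[i]].discard(i)
                members[best].add(i)
                label[i] = best
                changed = True
    # Relabel coalition ids to 0..m-1 in order of first appearance.
    ids = {c: r for r, c in enumerate(dict.fromkeys(label))}
    return [ids[c] for c in label]


if __name__ == "__main__":
    random.seed(0)
    blob_a = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20)]
    blob_b = [(random.gauss(10, 1), random.gauss(10, 1)) for _ in range(20)]
    print(hedonic_clustering(blob_a + blob_b, k=5))
```

Like the abstract's HGC, this sketch has a single control parameter, the number of nearest neighbors k, and passing a different `order` can change which stable partition is reached. How HGC actually defines utilities and orders its players is not described here, so both choices above should be read as assumptions.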




Updated: 2020-09-24