Is-ClusterMPP: clustering algorithm through point processes and influence space towards high-dimensional data
Advances in Data Analysis and Classification (IF 1.6), Pub Date: 2019-11-27, DOI: 10.1007/s11634-019-00379-2
Khadidja Henni, Pierre-Yves Louis, Brigitte Vannier, Ahmed Moussa

Clustering via marked point processes and influence space, Is-ClusterMPP, is a new unsupervised clustering algorithm based on adaptive MCMC sampling of a marked point process of interacting balls. The designed Gibbs energy cost function makes use of k-influence space information. The algorithm detects clusters of different shapes and sizes and with unbalanced local densities, and it also aims to handle high-dimensional datasets. By using the k-influence space, Is-ClusterMPP addresses local heterogeneity in densities and limits the impact of the global density on the detection of unbalanced classes. This concept also reduces the number of required input values. The curse of dimensionality is handled by a local subspace clustering principle embedded in a weighted similarity metric. The balls covering the data points constitute a configuration sampled from a marked point process (MPP). Owing to the choice of the energy function, they tend to cover neighboring data points that share the same cluster. The statistical model of random balls is sampled through Markov chain Monte Carlo dynamics. The energy balances two goals: (1) a data-driven term, defined according to the k-influence space, favors covering data in high-density regions with a ball; (2) an interaction term prevents balls from fully overlapping and favors connected groups of balls. Through the Markov dynamics, the algorithm converges towards configurations sampled from the MPP model. The algorithm has been applied to real benchmarks consisting of gene expression data sets of various sizes. Experiments comparing Is-ClusterMPP with the most well-known clustering algorithms are reported, and the authors claim its efficiency.
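The abstract does not define the k-influence space, so the following is only a minimal sketch under a stated assumption: the k-influence space of a point is taken to be the set of its k nearest neighbours that also count the point among their own k nearest neighbours (mutual neighbours). Other works define it as the union of the k-nearest and reverse k-nearest neighbours instead, and the paper's exact definition, its weighted similarity metric, and its energy function are not reproduced here. The function names `k_nearest_neighbors` and `k_influence_space` are hypothetical, and plain Euclidean distance stands in for the paper's metric.

```python
import numpy as np


def k_nearest_neighbors(points, k):
    """Return an (n, k) array holding the indices of each point's k nearest neighbours.

    Assumption: plain Euclidean distance, not the weighted metric of the paper.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbour
    return np.argsort(dists, axis=1)[:, :k]


def k_influence_space(points, k):
    """Return, for each point, the set of its mutual k-nearest neighbours.

    Assumption: influence space = kNN(p) intersected with reverse-kNN(p).
    """
    knn_sets = [set(row.tolist()) for row in k_nearest_neighbors(points, k)]
    return [
        {j for j in knn_sets[i] if i in knn_sets[j]}
        for i in range(len(points))
    ]


# Two tight groups of three points each, plus one isolated point (index 6).
points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
    [5.0, 5.2],
])
spaces = k_influence_space(points, k=2)
for index, space in enumerate(spaces):
    print(index, sorted(space))
```

Under this assumed definition, points inside a tight group keep both of their neighbours, while the isolated point ends up with an empty influence space because none of its nearest neighbours reciprocate. This illustrates why such a neighbourhood can act as a local density cue that does not depend on a single global density threshold.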

Updated: 2019-11-27