Proximity Curves for Potential-Based Clustering
Journal of Classification ( IF 2 ) Pub Date : 2019-12-18 , DOI: 10.1007/s00357-019-09348-y
Attila Csenki , Daniel Neagu , Denis Torgunov , Natasha Micic

The concept of the proximity curve and a new algorithm are proposed for obtaining clusters in a finite set of data points in finite-dimensional Euclidean space. Each point is endowed with a potential constructed by means of a multi-dimensional Cauchy density, contributing to an overall anisotropic potential function. Guided by the steepest descent algorithm, the data points are successively visited and removed one by one, and at each stage the overall potential is updated and the magnitude of its local gradient is calculated. The result is a finite sequence of tuples, the proximity curve, whose pattern is analysed to give rise to a deterministic clustering. The finite set of all such proximity curves, in conjunction with a simulation study of their distribution, results in a probabilistic clustering represented by a distribution on the set of dendrograms. A two-dimensional synthetic data set is used to illustrate the proposed potential-based clustering idea. It is shown that the results achieved are plausible, since both the ‘geographic distribution’ of data points and the ‘topographic features’ imposed by the potential function are well reflected in the suggested clustering. Experiments using the Iris data set are conducted for validation purposes on classification and clustering benchmark data. The results are consistent with the proposed theoretical framework and data properties, and open up new approaches and applications that consider data processing from different perspectives and interpret the contribution of data attributes to patterns.
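
The visit-and-remove loop described in the abstract can be illustrated with a rough Python sketch. This is not the authors' algorithm: it assumes an isotropic Cauchy-type kernel with a single hypothetical `scale` parameter (the paper's potential is anisotropic), uses fixed-length normalised descent steps, and picks the next point by snapping to the nearest remaining data point after the descent. The function names `potential_gradient` and `proximity_curve` and the parameters `step` and `max_steps` are illustrative assumptions.

```python
import numpy as np

def potential_gradient(x, points, scale=1.0):
    """Gradient at x of U(x) = -sum_i (1 + |x - p_i|^2 / scale^2)^(-(d+1)/2).

    Assumption: an isotropic Cauchy-type kernel; the paper's potential is anisotropic.
    """
    d = points.shape[1]
    diff = x - points                                # shape (n, d)
    q = 1.0 + np.sum(diff ** 2, axis=1) / scale ** 2
    coef = (d + 1) / scale ** 2 * q ** (-(d + 3) / 2)
    return (coef[:, None] * diff).sum(axis=0)

def proximity_curve(points, start=0, scale=1.0, step=0.05, max_steps=200):
    """Return a list of (point index, gradient magnitude) tuples in visiting order."""
    points = np.asarray(points, dtype=float)
    remaining = list(range(len(points)))
    current = start
    curve = []
    while True:
        # Remove the visited point; the potential is now built from the rest.
        remaining.remove(current)
        x = points[current].copy()
        if not remaining:
            curve.append((current, 0.0))             # no potential left
            break
        rest = points[remaining]
        grad = potential_gradient(x, rest, scale)
        curve.append((current, float(np.linalg.norm(grad))))
        # Simplified steepest descent: fixed-length steps along -gradient.
        for _ in range(max_steps):
            g = potential_gradient(x, rest, scale)
            norm = np.linalg.norm(g)
            if norm < 1e-12:
                break
            x = x - step * g / norm
            if np.min(np.linalg.norm(rest - x, axis=1)) < step:
                break
        # Hypothetical selection rule: snap to the nearest remaining point.
        dists = np.linalg.norm(rest - x, axis=1)
        current = remaining[int(np.argmin(dists))]
    return curve

if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    for idx, magnitude in proximity_curve(data):
        print(idx, magnitude)
```

Under these assumptions, a small gradient magnitude recorded at a visited point means the remaining points are far away, so plotting the magnitudes in visiting order is one plausible way to look for the pattern that separates clusters.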

Updated: 2019-12-18