Data clustering based on principal curves, Advances in Data Analysis and Classification

Data clustering based on principal curves
Advances in Data Analysis and Classification (IF 1.6), Pub Date: 2019-06-11, DOI: 10.1007/s11634-019-00363-w
Elson Claudio Correa Moraes, Danton Diego Ferreira, Giovani Bernardes Vitor, Bruno Henrique Groenner Barbosa

In this contribution we present a new method for data clustering based on principal curves. Principal curves are a nonlinear generalization of principal component analysis and may also be regarded as continuous versions of 1-D self-organizing maps. The proposed method uses the k-segment algorithm to extract the principal curves. It then divides the principal curves into two or more curves, according to the number of clusters defined by the user. Next, the distance between each data point and each of the generated curves is calculated, and each point is assigned to the cluster whose curve is closest. The method was applied to nine databases with different dimensionality and number of classes. The results were compared with three clustering algorithms: the k-means algorithm and the 1-D and 2-D self-organizing map algorithms. Experiments show that the method suits both elongated and spherical clusters and that, on some data sets, it achieved significantly better results than the other clustering algorithms used in this work.
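The final assignment step described in the abstract (distance from each point to each curve, then labelling by the smallest distance) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each cluster's curve is already available as a polyline (an array of vertices), it does not implement the k-segment extraction or the curve-splitting step, and the function names and toy data are hypothetical.

```python
import numpy as np


def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment from a to b."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    if denom == 0.0:
        # Degenerate segment: both endpoints coincide.
        return float(np.linalg.norm(p - a))
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))


def point_to_curve_distance(p, curve):
    """Smallest distance from p to any segment of a polyline (assumed curve form)."""
    return min(
        point_to_segment_distance(p, curve[i], curve[i + 1])
        for i in range(len(curve) - 1)
    )


def assign_to_nearest_curve(points, curves):
    """Label each point with the index of its closest curve."""
    distances = np.array(
        [[point_to_curve_distance(p, c) for c in curves] for p in points]
    )
    return distances.argmin(axis=1)


# Toy data (hypothetical): two hand-written polylines standing in for the
# curves that the method would obtain after splitting the principal curve.
curves = [
    np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.5]]),
    np.array([[0.0, 3.0], [1.0, 3.0], [2.0, 3.5]]),
]
points = np.array([[0.5, 0.2], [1.5, 2.9], [2.0, 1.0], [-1.0, 3.0]])
labels = assign_to_nearest_curve(points, curves)
print(labels)
```

Because distances are measured to a curve rather than to a single centroid, points spread along an elongated structure can share a label even when they are far apart, which is consistent with the abstract's claim about elongated clusters.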

Updated: 2019-06-11
