MDCUT2: a multi-density clustering algorithm with automatic detection of density variation in data with noise
Distributed and Parallel Databases (IF 1.5), Pub Date: 2018-10-16, DOI: 10.1007/s10619-018-7253-1
Soumaya Louhichi, Mariem Gzara, Hanêne Ben-Abdallah

Despite their adoption in many applications, density-based clustering algorithms perform poorly on data with varied density or with imbricated and/or adjacent clusters. Clusters of lower density may be classified as outliers, and adjacent or imbricated clusters of different densities may be merged. To address this weakness, the MDCUT algorithm (Multiple Density ClUsTering) (Louhichi et al. in Pattern Recogn Lett 93:48–57, 2017) detects multiple local density parameters to handle density variation in the data. MDCUT extracts local density levels by mathematically analyzing the interpolated k-nearest neighbors function. A clustering subroutine is launched for each density level to form the clusters of that level. Compared to well-known density-based clustering algorithms, MDCUT recorded good results on artificial datasets. The main drawback of MDCUT is its sensitivity to the parameter p of the interpolation technique used and to the parameter k, the number of nearest neighbors. In this paper, we propose a new extension of the MDCUT algorithm, MDCUT2, that automatically detects pairs of values (k_i, ε_i) characterizing the density levels in the data, where k_i and ε_i denote, respectively, the number of neighbors and the radius threshold for the i-th density level. We study the performance of the MDCUT2 algorithm on well-known datasets by comparison with reference density-based clustering algorithms. This extension improves on the previous classification results.
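
The general idea of clustering one density level at a time can be illustrated with a minimal Python sketch. It is not the authors' MDCUT or MDCUT2 procedure: the abstract does not give the interpolation step or the rule for detecting (k_i, ε_i) pairs, so the sketch assumes a single fixed k and swaps in a simple hypothetical heuristic, splitting the sorted k-distance curve wherever consecutive values jump by more than `jump_ratio`. All function names and the `jump_ratio` parameter are assumptions made for illustration.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return out

def density_levels(points, k, jump_ratio=2.0):
    """Return one radius per density level, in ascending order.

    Assumed heuristic (not from the paper): a level ends wherever the
    sorted k-distance curve grows by more than jump_ratio in one step.
    """
    curve = sorted(k_distances(points, k))
    levels = [prev for prev, cur in zip(curve, curve[1:])
              if prev > 0 and cur / prev > jump_ratio]
    levels.append(curve[-1])  # the last segment ends at the curve's maximum
    return levels

def cluster_level(points, labels, eps, k, next_label):
    """DBSCAN-style expansion restricted to points not labelled by an earlier level."""
    free = [i for i, lab in enumerate(labels) if lab is None]

    def neighbours(i):
        return [j for j in free
                if j != i and math.dist(points[i], points[j]) <= eps]

    for i in free:
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < k:
            continue  # fewer than k neighbours within eps: not a core point here
        labels[i] = next_label
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] is not None:
                continue
            labels[j] = next_label
            reach = neighbours(j)
            if len(reach) >= k:  # only core points keep expanding the cluster
                queue.extend(reach)
        next_label += 1
    return next_label

def multi_density_cluster(points, k, jump_ratio=2.0):
    """Cluster the densest level first, then progressively sparser ones."""
    labels = [None] * len(points)  # None marks noise / unassigned
    next_label = 0
    for eps in density_levels(points, k, jump_ratio):
        next_label = cluster_level(points, labels, eps, k, next_label)
    return labels

# A dense cluster, a sparse cluster, and one isolated point.
points = ([(x, 0) for x in range(5)]
          + [(x, 0) for x in range(100, 150, 10)]
          + [(1000, 0)])
print(density_levels(points, 2))         # [2.0, 20.0, 870.0]
print(multi_density_cluster(points, 2))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, None]
```

Because each level only considers points left unlabelled by denser levels, the sparse cluster is recovered at its own radius rather than being discarded as outliers, and the isolated point stays unassigned. The result still depends on the fixed k, which is the kind of sensitivity the abstract says MDCUT2 is meant to remove.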

Updated: 2018-10-16