QuickDSC: Clustering by Quick Density Subgraph Estimation

Information Sciences (IF 8.1), Pub Date: 2021-09-17, DOI: 10.1016/j.ins.2021.09.048
Xichen Zheng₁, Chengsen Ren₁, Yiyang Yang₁, Zhiguo Gong₂, Xiang Chen₃, Zhifeng Hao₄


Density-based clustering is a long-standing research topic, valued for its ability to find clusters of arbitrary shape. Through a Density Estimator (DE), density-based methods such as MeanShift and QuickShift can also find the local density maxima, called modes, which are excellent representatives of the clusters. However, concentrating on the modes alone can lead to over-segmentation. Moreover, most density-based methods cannot handle scenarios that require partitioning the data samples into exactly $K$ clusters. To overcome these issues, this work proposes QuickDSC, a novel and efficient clustering algorithm that groups samples through Quick Density Subgraph Estimation. It first identifies high-density-connected samples as Density Subgraphs (DSs). The importance of each DS is then estimated from two aspects: density and geometric weight. The top-$K$ most important DSs are selected as the cluster centers, and the cluster memberships of the remaining samples are determined based on them. QuickDSC combines three crucial clustering attributes: (1) its cluster centroids are modes (as in density-based methods); (2) it returns results efficiently by exploiting the underlying density structure (as in hierarchical clustering methods); and (3) it explicitly returns $K$ clusters (as in $K$-Means and $K$-Modes). In addition, QuickDSC is theoretically and empirically efficient: it is only slightly slower than classical clustering methods such as $K$-Means and DBSCAN. Experiments on artificial and real-world datasets demonstrate the advantages of the proposed method, whose clustering quality outperforms the state-of-the-art approaches.
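
The abstract does not define how Density Subgraphs or the geometric weight are computed, so the following Python sketch is not QuickDSC itself. It illustrates the general pipeline the abstract outlines (estimate density, score candidate centers by density and a geometric term, keep the top $K$, then assign the remaining samples) using a simpler, well-known stand-in: a density-peaks style heuristic on single points. The names `knn_density` and `top_k_mode_clustering`, the k-nearest-neighbour density proxy, and the product `density * gap` used as the importance score are all assumptions made for illustration.

```python
import math


def knn_density(points, k):
    """Density proxy (assumption): inverse distance to the k-th nearest neighbour."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        densities.append(1.0 / (dists[k - 1] + 1e-12))  # epsilon guards duplicates
    return densities


def top_k_mode_clustering(points, n_clusters, k=3):
    """Pick n_clusters high-density, well-separated points as centers,
    then let every other point inherit the label of its nearest denser point.

    Illustrative stand-in only; brute-force O(n^2) distance computations.
    """
    n = len(points)
    dens = knn_density(points, k)
    order = sorted(range(n), key=lambda i: -dens[i])  # densest first

    parent = [-1] * n   # nearest point that comes earlier in `order`
    gap = [0.0] * n     # distance to that parent (stand-in for a geometric weight)
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda c: math.dist(points[i], points[c]))
        parent[i] = j
        gap[i] = math.dist(points[i], points[j])

    # Hypothetical importance score: dense points far from any denser point.
    importance = [dens[i] * gap[i] for i in range(n)]
    importance[order[0]] = math.inf  # the global mode is always a center

    centers = sorted(range(n), key=lambda i: -importance[i])[:n_clusters]
    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label

    # Visiting points densest-first guarantees each parent is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Two small, well-separated groups of 2-D samples (made-up data).
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
labels, centers = top_k_mode_clustering(points, n_clusters=2)
```

Because the number of centers is fixed up front, the sketch always returns exactly `n_clusters` groups, which is the property the abstract contrasts with plain mode-seeking methods. It makes no attempt to reproduce the efficiency claimed for QuickDSC.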


Updated: 2021-09-17
