High-Dimensional Text Clustering by Dimensionality Reduction and Improved Density Peak, Wireless Communications and Mobile Computing



Wireless Communications and Mobile Computing Pub Date : 2020-10-28 , DOI: 10.1155/2020/8881112
Yujia Sun¹,², Jan Platoš¹


This study addresses the clustering of high-dimensional text data. K-means cannot process high-dimensional data well, requires the number of clusters to be specified in advance, and selects its initial centers at random. We propose a Stacked-Random Projection dimensionality reduction framework and an enhanced K-means algorithm, DPC-K-means, based on an improved density peaks algorithm. The improved density peaks algorithm determines both the number of clusters and the initial cluster centers for K-means. The proposed algorithm is validated on seven text datasets. Experimental results show that it is suitable for clustering text data because it corrects these defects of K-means.
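
The idea of letting a density peaks step choose the number of clusters and the initial centers for K-means can be illustrated with a small sketch. This is not the authors' DPC-K-means: it uses the plain density peaks heuristic (local density rho, distance delta to the nearest denser point, score gamma = rho * delta) rather than the improved variant, and it omits the Stacked-Random Projection stage entirely. The cutoff `dc`, the "largest gap in sorted gamma" rule for picking k, and the function names `density_peak_centers` and `kmeans` are all assumptions made for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_peak_centers(points, dc=1.0):
    """Return indices of candidate centers using a plain density-peaks heuristic.

    rho   = Gaussian-kernel local density with cutoff dc (assumed parameter)
    delta = distance to the nearest point of higher density
    gamma = rho * delta; the largest gap in sorted gamma sets k (assumed rule)
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    rho = [sum(math.exp(-(d[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Densest first; the stable sort keeps index order among ties.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    delta[order[0]] = max(d[order[0]])  # densest point: use its largest distance
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(d[i][j] for j in order[:rank])
    gamma = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: -gamma[i])
    gaps = [gamma[ranked[r]] - gamma[ranked[r + 1]] for r in range(n - 1)]
    k = gaps.index(max(gaps)) + 1  # points above the largest gap become centers
    return ranked[:k]

def kmeans(points, init_idx, iters=20):
    """Lloyd iterations started from the given point indices instead of random centers."""
    centers = [list(points[i]) for i in init_idx]
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy data: three well-separated copies of the same five-point blob.
blob = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
points = (blob
          + [(x + 10, y + 10) for x, y in blob]
          + [(x, y + 10) for x, y in blob])

seeds = density_peak_centers(points, dc=1.0)
labels, centers = kmeans(points, seeds)
print(len(seeds), "clusters; seed indices:", sorted(seeds))
```

On real text data the input points would be document vectors after dimensionality reduction, and both `dc` and the gap rule would need tuning; the toy blobs here only show how the density peaks output replaces the user-supplied k and the random initialization.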


Updated: 2020-10-30
