Innovative study on clustering center and distance measurement of K-means algorithm: mapreduce efficient parallel algorithm based on user data of JD mall
Electronic Commerce Research (IF 3.7), Pub Date: 2021-03-31, DOI: 10.1007/s10660-021-09458-z
Yang Liu, Xinxin Du, Shuaifeng Ma

The traditional K-means algorithm is very sensitive to the selection of clustering centers and to the way distances are calculated, so it easily converges to a locally optimal solution. It also converges slowly, clusters with low accuracy, and runs into memory bottlenecks when processing massive data. This paper therefore proposes an improved K-means algorithm. First, the selection of the initial points used by the traditional clustering algorithm is improved. Second, a new global measure, the effective distance measure, is proposed; its main idea is to calculate the effective distance between two data samples by sparse reconstruction. Finally, the algorithm is implemented on the MapReduce framework, and its efficiency is further improved by adjusting the Hadoop cluster. Using real customer data from the JD Mall dataset, the paper evaluates the clustering results of the various algorithms with the DBI, the Rand index, and other indicators. The results show that the proposed algorithm not only has good convergence and accuracy but also outperforms the compared algorithms.
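
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how one K-means iteration might be split into a map phase and a reduce phase. It runs in a single Python process rather than on Hadoop, uses plain squared Euclidean distance as a stand-in for the paper's effective distance measure, and takes the initial centers as given instead of applying the improved initialization. All function names (`map_phase`, `reduce_phase`, `kmeans`) are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumption: NOT the paper's effective distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points: List[Point], centers: List[Point]) -> List[Tuple[int, Point]]:
    """Mapper: emit (index of nearest center, point) for every input point."""
    emitted = []
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_euclidean(p, centers[i]))
        emitted.append((nearest, p))
    return emitted


def reduce_phase(emitted: List[Tuple[int, Point]],
                 centers: List[Point]) -> List[Point]:
    """Reducer: group points by center index and average each group."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for idx, p in emitted:
        if idx not in sums:
            sums[idx] = [0.0] * len(p)
            counts[idx] = 0
        for d, value in enumerate(p):
            sums[idx][d] += value
        counts[idx] += 1
    # A center that received no points keeps its previous position.
    new_centers = list(centers)
    for idx, total in sums.items():
        new_centers[idx] = tuple(v / counts[idx] for v in total)
    return new_centers


def kmeans(points: List[Point], centers: List[Point],
           max_iter: int = 100, tol: float = 1e-9) -> List[Point]:
    """Repeat map and reduce until no center moves by more than tol (squared)."""
    for _ in range(max_iter):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        shift = max(squared_euclidean(a, b)
                    for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real MapReduce job the mapper output would be shuffled by key so that each reducer sees only the points of one cluster; the grouping inside `reduce_phase` imitates that shuffle in memory.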




Updated: 2021-04-01