Designing an efficient parallel spectral clustering algorithm on multi-core processors in Julia
Journal of Parallel and Distributed Computing ( IF 3.8 ) Pub Date : 2020-01-20 , DOI: 10.1016/j.jpdc.2020.01.003
Zenan Huo , Gang Mei , Giampaolo Casolla , Fabio Giampaolo

Spectral clustering is widely used in data mining, machine learning, and other fields. It can identify clusters of arbitrary shape in the sample space and converge to the globally optimal solution. Compared with the traditional k-means algorithm, spectral clustering adapts better to the data and produces better clustering results. However, the algorithm is computationally expensive. In this paper, we propose an efficient parallel spectral clustering algorithm for multi-core processors, implemented in the Julia language, which we refer to as juPSC. Julia is a high-performance, open-source programming language. juPSC is composed of three procedures: (1) calculating the affinity matrix, (2) calculating the eigenvectors, and (3) conducting k-means clustering. Procedures (1) and (3) are computed in parallel, and the COO (coordinate) format is used to compress the affinity matrix. Two groups of experiments are conducted to verify the accuracy and efficiency of juPSC. Experimental results indicate that (1) juPSC achieves speedups of approximately 14× to 18× on a 24-core CPU and that (2) the serial version of juPSC is faster than the Python implementation in scikit-learn. Moreover, the structure and functions of juPSC are designed with modularity in mind, which makes it convenient to combine with other parallel computing platforms and to optimize further.
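To illustrate procedure (1) together with the COO compression mentioned in the abstract, here is a minimal sketch in Python (not the authors' Julia code). It assumes a Gaussian similarity kernel and drops entries below a cutoff; the kernel choice, the function name `affinity_coo`, and the parameters `sigma` and `threshold` are illustrative assumptions, not details taken from the paper.

```python
import math

def affinity_coo(points, sigma=1.0, threshold=1e-3):
    """Build a Gaussian affinity matrix and return it in COO form.

    COO stores only the nonzero entries as three parallel lists:
    row indices, column indices, and values.
    sigma and threshold are hypothetical tuning parameters.
    """
    rows, cols, vals = [], [], []
    n = len(points)
    for i in range(n):          # each row is independent of the others
        for j in range(n):
            if i == j:
                continue        # skip self-affinity
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w = math.exp(-d2 / (2.0 * sigma ** 2))
            if w > threshold:   # keep only significant affinities
                rows.append(i)
                cols.append(j)
                vals.append(w)
    return rows, cols, vals

# Two well-separated pairs of points: only within-pair affinities survive.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
rows, cols, vals = affinity_coo(points)
```

In this toy input, 4 of the 16 matrix entries are stored, which is the kind of saving a COO layout is meant to provide. Because each row is computed independently, the outer loop is a natural candidate for distribution across cores; whether juPSC partitions the work this way is not stated in the abstract.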




Updated: 2020-01-21