Designing a parallel Feel-the-Way clustering algorithm on HPC systems, The International Journal of High Performance Computing Applications


The International Journal of High Performance Computing Applications (IF 3.5), Pub Date: 2020-11-28, DOI: 10.1177/1094342020975194
Weijian Zheng, Dali Wang, Fengguang Song


This paper introduces a new parallel clustering algorithm, named the Feel-the-Way clustering algorithm, that provides a convergence rate better than or equivalent to that of traditional clustering methods by optimizing synchronization and communication costs. Our algorithm design centers on optimizing three factors simultaneously: reducing synchronizations, improving the convergence rate, and retaining the same or a comparable optimization cost. To compare optimization costs, we use the Sum of Squared Errors (SSE) as the metric, which is the sum of the squared distances between each data point and the centroid of its assigned cluster. Compared with the traditional MPI k-means algorithm, the new Feel-the-Way algorithm requires fewer communications among participating processes. As for the convergence rate, the new algorithm requires fewer iterations to converge. As for the optimization cost, it obtains SSE costs that are close to those of the k-means algorithm. In the paper, we first design the full-step Feel-the-Way k-means clustering algorithm, which can significantly reduce the number of iterations required by the original k-means clustering method. Next, we improve the performance of the full-step algorithm by adopting an optimized sampling-based approach, named reassignment-history-aware sampling. Our experimental results show that the optimized sampling-based Feel-the-Way method is significantly faster than the widely used k-means clustering method and provides comparable optimization costs. More extensive experiments with several synthetic and real-world datasets (e.g., MNIST, CIFAR-10, ENRON, and PLACES-2) show that the new parallel algorithm can outperform the open-source MPI k-means library by up to 110% on a high-performance computing system using 4,096 CPU cores. In addition, the new algorithm can take up to 51% fewer iterations to converge than the k-means clustering algorithm.
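
The SSE metric described in the abstract can be illustrated with a minimal serial Python sketch. This is not the authors' implementation and does not reproduce the Feel-the-Way algorithm or its MPI communication; the function names (`squared_distance`, `assign_to_nearest`, `sse`) and the toy data are assumptions made only for illustration. It shows the standard k-means assignment step followed by the SSE computation, taking each point's cost as its squared Euclidean distance to the centroid of its assigned cluster.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_nearest(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Standard k-means assignment step: index of the nearest centroid per point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def sse(points: Sequence[Point], centroids: Sequence[Point], labels: Sequence[int]) -> float:
    """Sum of squared distances between each point and its assigned centroid."""
    return sum(squared_distance(p, centroids[c]) for p, c in zip(points, labels))


# Toy data (hypothetical): two well-separated pairs of 2-D points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

labels = assign_to_nearest(points, centroids)
total_cost = sse(points, centroids, labels)
print(labels, total_cost)
```

A lower SSE for the same number of clusters indicates a tighter clustering, which is why the abstract uses it to check that the faster method keeps a cost comparable to that of k-means.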


Updated: 2020-11-28
