Performance enhancement of a dynamic K-means algorithm through a parallel adaptive strategy on multicore CPUs, Journal of Parallel and Distributed Computing

Journal of Parallel and Distributed Computing (IF 3.8), Pub Date: 2020-06-25, DOI: 10.1016/j.jpdc.2020.06.010
Giuliano Laccetti, Marco Lapegna, Valeria Mele, Diego Romano, Lukasz Szustak

The K-means algorithm is one of the most popular algorithms in data science. It aims to discover similarities among the elements of large datasets by partitioning them into $K$ distinct groups called clusters. Its main weakness is that, in real problems, the value of $K$ is often impossible to specify in advance as input. Furthermore, the large amount of data needed for useful simulations makes running the algorithm on traditional architectures impractical. In this paper, we address both issues. On the one hand, we propose a method that defines the value of $K$ dynamically by optimizing a suitable quality index, with special attention to computational cost. On the other hand, to improve the performance and effectiveness of the algorithm, we propose a strategy for its parallel implementation on modern multicore CPUs.
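
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the quality index, the update rule for $K$, or the parallel strategy, so everything below is an assumption made for illustration. The stand-in quality index is the within-cluster sum of squared errors (SSE); $K$ grows until the SSE gain, measured against the $K = 1$ scatter, falls below a hypothetical tolerance `tol`. The assignment step is split into chunks handled by a thread pool to show one possible data decomposition. All function names (`init_centers`, `assign`, `update`, `kmeans`, `sse`, `choose_k`) are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def init_centers(points, k):
    """Deterministic farthest-point seeding (an assumption, chosen for reproducibility)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points, centers, workers=4):
    """Assignment step: label each point with its nearest center, chunk by chunk."""
    def label_chunk(chunk):
        return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                for p in chunk]

    size = max(1, math.ceil(len(points) / workers))
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(label_chunk, chunks))  # map preserves chunk order
    return [label for part in parts for label in part]


def update(points, labels, centers):
    """Update step: move each center to the mean of the points assigned to it."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)  # an empty cluster keeps its previous center
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations for a fixed number of centers."""
    for _ in range(max_iter):
        new_centers = update(points, assign(points, centers), centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def sse(points, centers, labels):
    """Stand-in quality index: within-cluster sum of squared distances."""
    return sum(math.dist(p, centers[label]) ** 2 for p, label in zip(points, labels))


def choose_k(points, k_max=10, tol=0.05):
    """Grow K until the SSE gain, relative to the K=1 scatter, drops below tol."""
    centers, labels = kmeans(points, init_centers(points, 1))
    base = prev = sse(points, centers, labels)
    best = (1, centers, labels)
    if base == 0:  # all points coincide, so one cluster is enough
        return best
    for k in range(2, min(k_max, len(points)) + 1):
        centers, labels = kmeans(points, init_centers(points, k))
        cur = sse(points, centers, labels)
        if (prev - cur) / base < tol:
            break
        best, prev = (k, centers, labels), cur
    return best
```

Calling `choose_k(points)` on a list of coordinate tuples returns a `(k, centers, labels)` triple. Two caveats apply. The sketch reruns K-means from scratch for every candidate $K$, which ignores the computational-cost concern the abstract raises. In CPython the thread pool only shows how the assignment step can be decomposed, because the interpreter lock prevents real speedup for pure-Python arithmetic; an actual multicore implementation would use processes or native code.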

Updated: 2020-06-29
