A Survey of Parallel Clustering Algorithms Based on Spark, Scientific Programming

Scientific Programming Pub Date : 2020-09-01 , DOI: 10.1155/2020/8884926
Wen Xiao, Juan Hu

Clustering is one of the most important unsupervised machine learning tasks and is widely used in information retrieval, social network analysis, image processing, and other fields. With the explosive growth of data, classical clustering algorithms can no longer meet the requirements of clustering big data. Spark is one of the most popular parallel processing platforms for big data, and researchers have proposed many parallel clustering algorithms based on it. This paper classifies and summarizes the existing Spark-based parallel clustering algorithms, discusses the parallel design framework of each class of algorithm, and, after comparing the different classes, outlines directions for future research.

Updated: 2020-09-01
