Density-based Algorithms for Big Data Clustering Using MapReduce Framework, ACM Computing Surveys

Density-based Algorithms for Big Data Clustering Using MapReduce Framework
ACM Computing Surveys (IF 23.8), Pub Date: 2020-09-28, DOI: 10.1145/3403951
Mariam Khader, Ghazi Al-Naymat

Clustering is used to extract hidden patterns and similar groups from data. Therefore, clustering as a method of unsupervised learning is a crucial technique for big data analysis owing to the massive number of unlabeled objects involved. Density-based algorithms have attracted research interest, because they help to better understand complex patterns in spatial datasets that contain information about data related to co-located objects. Big data clustering is a challenging task, because the volume of data increases exponentially. However, clustering using MapReduce can help answer this challenge. In this context, density-based algorithms in MapReduce have been largely investigated in the past decade to eliminate the problem of big data clustering. Despite the diversity of the algorithms proposed, the field lacks a structured review of the available algorithms and techniques for desirable partitioning, local clustering, and merging. This study formalizes the problem of density-based clustering using MapReduce, proposes a taxonomy to categorize the proposed algorithms, and provides a systematic and comprehensive comparison of these algorithms according to the partitioning technique, type of local clustering, merging technique, and exactness of their implementations. Finally, the study highlights outstanding challenges and opportunities to contribute to the field of density-based clustering using MapReduce.
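
The three stages named in the abstract (partitioning, local clustering, merging) can be illustrated with a small single-process sketch. This is not taken from the survey or from any algorithm it reviews. The function names (`partition`, `local_dbscan`, `mapreduce_dbscan`), the strip-based partitioning with an eps-wide overlap, and the merge rule (union local clusters that share a core point) are all assumptions chosen for brevity, and the merge rule is not claimed to be exact in every edge case.

```python
from collections import defaultdict
from math import dist


def partition(points, width, eps):
    """Assign each 2-D point to every vertical strip within eps of it.

    Hypothetical scheme: strips are [s*width, (s+1)*width) along x, and the
    eps overlap gives each strip the neighbours of the points it owns.
    """
    parts = defaultdict(list)  # strip id -> list of global point ids
    for gid, (x, _) in enumerate(points):
        lo = int((x - eps) // width)
        hi = int((x + eps) // width)
        for s in range(lo, hi + 1):
            parts[s].append(gid)
    return parts


def local_dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns (labels, core); label -1 means noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if i not in core or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster through core points only
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if q in core:
                        stack.append(q)
        cluster += 1
    return labels, core


def mapreduce_dbscan(points, eps, min_pts, width):
    """Partition, cluster each partition, then merge (simplified sketch)."""
    parts = partition(points, width, eps)

    # "Map" stage: cluster every partition independently.
    seen = defaultdict(list)  # global id -> [(strip, local label), ...]
    core_gids = set()
    for s, gids in parts.items():
        labels, core = local_dbscan([points[g] for g in gids], eps, min_pts)
        for i, g in enumerate(gids):
            if labels[i] != -1:
                seen[g].append((s, labels[i]))
            if i in core:
                core_gids.add(g)

    # "Reduce" stage: union local clusters that share a core point.
    parent = {}

    def find(k):
        parent.setdefault(k, k)
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for g in core_gids:
        keys = seen[g]
        for k in keys[1:]:
            parent[find(k)] = find(keys[0])

    # Final label: merged cluster key, or None for noise.
    return [find(seen[g][0]) if seen[g] else None for g in range(len(points))]
```

Calling `mapreduce_dbscan(points, eps, min_pts, width)` on a list of `(x, y)` tuples returns one merged cluster key per point, or `None` for noise. In a real MapReduce job the per-partition loop would run in separate tasks and the merge would run in a reduce step; here both run in one process to keep the sketch self-contained.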

Updated: 2020-09-28
