Decreasing the execution time of reducers by revising clustering based on the futuristic greedy approach, Journal of Big Data



Journal of Big Data (IF 8.6), Pub Date: 2020-01-09, DOI: 10.1186/s40537-019-0279-z
Ali Bakhthemmat, Mohammad Izadi

MapReduce, the programming model used within the Hadoop framework, handles two main tasks: mapping and reducing. Clustering data in the mappers and reducers can decrease the execution time, because similar data can be assigned to the same reducer under one key. Our proposed method decreases the overall execution time by clustering the data and lowering the number of reducers. The proposed algorithm is composed of five phases. In the first phase, data are stored in the Hadoop structure. In the second phase, we cluster the data using the MR-DBSCAN-KD method to determine all of the outliers and clusters. The outliers are then assigned to the existing clusters using the futuristic greedy method, and at the end of the second phase similar clusters are merged. In the third phase, clusters are assigned to the reducers; fewer reducers are required for this task because approximate load balancing is applied between the reducers. In the fourth phase, the reducers execute their jobs on each cluster. In the final phase, the reducers return the output. Decreasing the number of reducers and revising the clustering helped the reducers to perform their jobs almost simultaneously. Our results indicate that the proposed algorithm reduces the execution time by about 3.9% compared with the fastest algorithm in our experiments.
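
The second and third phases described above (attaching outliers to existing clusters, then spreading clusters over a small number of reducers) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the futuristic greedy rule or the balancing scheme, so the sketch substitutes a plain nearest-centroid assignment for the outliers and the standard longest-processing-time heuristic for the load balancing. The function names, the dictionary layout, and the use of cluster size as a proxy for reducer load are all assumptions made for illustration.

```python
import heapq
import math


def assign_outliers(clusters, outliers):
    """Attach each outlier to the cluster with the nearest centroid.

    Assumption: a plain greedy stand-in, not the paper's futuristic greedy rule.
    `clusters` maps a cluster id to a list of equal-length coordinate tuples.
    """
    # Centroids are computed once from the original members and never updated.
    centroids = {
        cid: tuple(sum(axis) / len(points) for axis in zip(*points))
        for cid, points in clusters.items()
    }
    merged = {cid: list(points) for cid, points in clusters.items()}
    for point in outliers:
        nearest = min(centroids, key=lambda cid: math.dist(point, centroids[cid]))
        merged[nearest].append(point)
    return merged


def assign_clusters_to_reducers(cluster_sizes, num_reducers):
    """Approximate load balancing with the longest-processing-time heuristic.

    Assumption: cluster size stands in for the work a reducer must do.
    The largest remaining cluster always goes to the least-loaded reducer.
    """
    heap = [(0, reducer) for reducer in range(num_reducers)]  # (load, reducer id)
    heapq.heapify(heap)
    plan = {reducer: [] for reducer in range(num_reducers)}
    by_size = sorted(cluster_sizes.items(), key=lambda item: item[1], reverse=True)
    for cid, size in by_size:
        load, reducer = heapq.heappop(heap)
        plan[reducer].append(cid)
        heapq.heappush(heap, (load + size, reducer))
    return plan
```

With cluster sizes {"a": 5, "b": 4, "c": 3, "d": 2} and two reducers, the second function places "a" and "d" on one reducer and "b" and "c" on the other, so both carry a load of 7. Reducers with similar loads finish at similar times, which is the effect the abstract attributes to balancing.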


Updated: 2020-01-09
