

Clustering-based data placement in cloud computing: a predictive approach
Cluster Computing (IF 4.4), Pub Date: 2021-06-16, DOI: 10.1007/s10586-021-03332-1
Mokhtar Sellami, Haithem Mezni, Mohand Said Hacid, Mohamed Moshen Gammoudi

Cloud computing environments have become a natural choice for hosting and processing huge volumes of data, and combining them with big data frameworks is an effective way to run data-intensive applications and tasks. An optimal arrangement of data partitions can improve task execution, but most big data frameworks do not provide one. For example, the default distribution of data partitions in Hadoop-based clouds causes several problems, mainly related to load balancing and resource usage. In addition, most existing data placement solutions are static and lack precision in placing data partitions. To overcome these issues, we propose a data placement approach based on predicting future resource usage. We use Kernel Density Estimation (KDE) and fuzzy Formal Concept Analysis (Fuzzy FCA) to, first, forecast the future resource consumption of workers and tasks and, second, cluster data partitions and intensive jobs according to the estimated resource usage. Fuzzy FCA is also used to exclude partitions and jobs that require fewer resources, which reduces needless migrations. To monitor and predict the workers' states and the data partitions' consumption, we modeled the big data cluster as an autonomic service-based system. The results show that our solution outperformed existing approaches in terms of migration rate and resource consumption.
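
The prediction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample usage histories, and the 0.5 threshold are all hypothetical, and a crisp threshold stands in for the Fuzzy FCA filtering, which the abstract does not detail. The sketch fits a one-dimensional Gaussian KDE to past usage fractions of each partition, takes the most likely value as the forecast, and keeps only partitions whose forecast is high enough to justify a migration.

```python
import math


def gaussian_kde(samples, bandwidth=None):
    """Return a density function estimated from 1-D samples with a Gaussian kernel."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if bandwidth is None:
        mean = sum(samples) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
        # Normal-reference rule of thumb for the bandwidth (an assumption,
        # not taken from the paper); fall back to a small constant if std is 0.
        bandwidth = 1.06 * std * n ** (-1 / 5)
        if bandwidth == 0:
            bandwidth = 1e-3
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def predicted_usage(samples, grid_steps=100):
    """Forecast usage as the grid point in [0, 1] where the KDE is highest."""
    density = gaussian_kde(samples)
    grid = [i / grid_steps for i in range(grid_steps + 1)]
    return max(grid, key=density)


def select_candidates(history, threshold=0.5):
    """Keep partitions whose forecast usage reaches the threshold.

    Hypothetical stand-in for the paper's Fuzzy FCA step, which excludes
    partitions and jobs that need few resources.
    """
    return sorted(
        name
        for name, samples in history.items()
        if predicted_usage(samples) >= threshold
    )


# Made-up CPU usage fractions observed for two data partitions.
history = {
    "p1": [0.80, 0.85, 0.82, 0.90, 0.84],
    "p2": [0.10, 0.12, 0.08, 0.11, 0.09],
}
candidates = select_candidates(history, threshold=0.5)
```

With these made-up histories, only the heavily used partition remains a migration candidate. Using the mode of the estimated density as the forecast is one simple choice; other summaries of the density, such as the probability of exceeding a load limit, would fit the same structure.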


Updated: 2021-06-17
