Cost Optimization for Big Data Workloads Based on Dynamic Scheduling and Cluster-Size Tuning
Big Data Research (IF 3.5), Pub Date: 2021-02-02, DOI: 10.1016/j.bdr.2021.100203
Marek Grzegorowski, Eftim Zdravevski, Andrzej Janusz, Petre Lameski, Cas Apanowicz, Dominik Ślęzak

Analytical data processing has become the cornerstone of business success today, and it is facilitated by Big Data platforms that offer virtually limitless scalability. However, minimizing the total cost of ownership (TCO) of the underlying infrastructure can be challenging. We propose a novel method for building resilient clusters on cloud resources that are fine-tuned to the particular data processing task. The presented architecture follows the infrastructure-as-code paradigm, so the cluster can be configured and managed dynamically. It first identifies the optimal cluster size for completing a job within the required time. Then, by analyzing spot instance price history with ARIMA models, it optimizes the job execution schedule to take advantage of discounted prices on the cloud spot market. In particular, we evaluated the savings achievable with Amazon EC2 spot instances compared to on-demand resources. The experiments confirmed that the prediction module significantly improved the cost-effectiveness of the solution: savings of up to 80% compared to on-demand prices and, in the worst case, a cost only 1% above the absolute minimum. Production deployments of the architecture show that it is invaluable for minimizing the total cost of ownership of analytical data processing solutions.
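
The first stage, choosing a cluster size that still finishes the job within the required time, can be sketched as a small search over candidate sizes. This is a minimal illustration under stated assumptions, not the authors' method: it assumes a hypothetical Amdahl's-law runtime model (a fixed serial part plus a parallel part split evenly across nodes) and a flat per-node hourly price. The names `Plan`, `amdahl_runtime` and `choose_cluster_size` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Plan:
    nodes: int
    runtime_h: float
    cost: float


def amdahl_runtime(nodes: int, serial_h: float, parallel_h: float) -> float:
    """Hypothetical runtime model (assumption): a fixed serial part plus a
    perfectly parallel part divided evenly across the nodes."""
    return serial_h + parallel_h / nodes


def choose_cluster_size(
    candidates: Iterable[int],
    serial_h: float,
    parallel_h: float,
    deadline_h: float,
    price_per_node_h: float,
) -> Optional[Plan]:
    """Return the cheapest candidate size that meets the deadline,
    or None if no candidate is fast enough."""
    best: Optional[Plan] = None
    for n in candidates:
        t = amdahl_runtime(n, serial_h, parallel_h)
        if t > deadline_h:
            continue  # too slow: violates the required completion time
        cost = n * t * price_per_node_h  # node-hours times hourly price
        if best is None or cost < best.cost:
            best = Plan(n, t, cost)
    return best
```

For example, with 0.5 h of serial work, 40 h of parallel work and a 2 h deadline, a search over 1 to 64 nodes returns 27 nodes (about 1.98 h, 53.5 node-hours, i.e. 5.35 at a hypothetical price of 0.10 per node-hour). Under this particular model cost always grows with node count, so the cheapest feasible size is the smallest one meeting the deadline; a measured runtime curve need not behave that way, which is why the sketch compares costs explicitly.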

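The second stage, forecasting spot prices and shifting the job into the cheapest upcoming window, can be sketched as follows. The abstract does not state model orders, estimation method or price resolution, so this sketch assumes hourly prices and fits an ARIMA(1,1,0) model with drift by conditional least squares in pure Python; the authors' ARIMA models may differ. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_arima_110(prices: Sequence[float]) -> Tuple[float, float]:
    """Fit ARIMA(1,1,0) with drift by conditional least squares.

    Model on first differences d_t = p_t - p_{t-1}:
        d_t = c + phi * d_{t-1} + e_t
    Returns (c, phi).
    """
    d = [b - a for a, b in zip(prices, prices[1:])]
    x, y = d[:-1], d[1:]  # regress each difference on the previous one
    n = len(y)
    if n < 2:
        raise ValueError("need at least 4 price observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx > 0 else 0.0  # constant differences: no AR signal
    c = my - phi * mx
    return c, phi


def forecast(prices: Sequence[float], steps: int) -> List[float]:
    """Iterate the fitted difference equation and integrate back to prices."""
    c, phi = fit_arima_110(prices)
    level, diff = prices[-1], prices[-1] - prices[-2]
    out: List[float] = []
    for _ in range(steps):
        diff = c + phi * diff
        level += diff
        out.append(level)
    return out


def window_cost(prices: Sequence[float], start: int, duration_h: int) -> float:
    """Total price of running for duration_h hourly slots from start."""
    return sum(prices[start:start + duration_h])


def best_start(prices: Sequence[float], duration_h: int) -> int:
    """Start slot of the cheapest window of duration_h consecutive slots."""
    if not 0 < duration_h <= len(prices):
        raise ValueError("duration must fit inside the price horizon")
    starts = range(len(prices) - duration_h + 1)
    return min(starts, key=lambda s: window_cost(prices, s, duration_h))


def savings(spot_cost: float, on_demand_cost: float) -> float:
    """Fraction saved relative to paying on-demand prices."""
    return 1.0 - spot_cost / on_demand_cost


def overhead_vs_oracle(actual: Sequence[float], start: int, duration_h: int) -> float:
    """Extra cost of the chosen window over the cheapest window in
    hindsight (0.0 means the schedule hit the absolute minimum)."""
    oracle = window_cost(actual, best_start(actual, duration_h), duration_h)
    return window_cost(actual, start, duration_h) / oracle - 1.0
```

Typical use: `start = best_start(forecast(history, 24), duration_h=3)` picks the cheapest predicted 3-hour window in the next 24 slots. Once actual prices are known, `savings` and `overhead_vs_oracle` express the two kinds of figures the abstract reports (savings versus on-demand, and cost above the absolute minimum), though the paper's exact evaluation protocol may differ.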

Updated: 2021-02-19