Predicting the performance of big data applications on the cloud
The Journal of Supercomputing ( IF 3.3 ) Pub Date : 2020-05-15 , DOI: 10.1007/s11227-020-03307-w
D. Ardagna , E. Barbierato , E. Gianniti , M. Gribaudo , T. B. M. Pinto , A. P. C. da Silva , J. M. Almeida

Data science applications have become widespread as a means of extracting knowledge from large datasets. Such applications are often characterized by highly heterogeneous and irregular data access patterns and are therefore commonly referred to as big data applications. These characteristics make it challenging for existing software and hardware infrastructures to meet the applications' resource demands. The cloud computing paradigm, in turn, offers a natural hosting solution for such applications, since its on-demand pricing model allows computing resources to be allocated effectively according to an application's needs. However, these properties pose an additional challenge to accurate performance prediction for cloud-based applications, which is a key step toward adequate capacity planning and management of the hosting infrastructure. In this article, we tackle this challenge by exploring three modeling approaches for predicting the performance of big data applications running on the cloud. We evaluate two queuing-based analytical models and dagSim, a fast ad hoc simulator, in various scenarios based on different applications and infrastructure setups. The approaches are compared in terms of prediction accuracy and execution time. Our results indicate that our two best approaches, one analytical model and dagSim, can predict average application execution times with a relative error of at most 7%, on average. Moreover, a comparison with the widely used event-based simulator available with the Java Modeling Tool (JMT) suite shows that both the analytical model and dagSim run very fast, requiring execution times at least two orders of magnitude lower than JMT while providing slightly better accuracy, which makes them practical for online prediction.
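As a reading aid for the accuracy figure quoted above, the following is a minimal sketch of how an average relative error between predicted and measured execution times could be computed. The abstract does not give the exact error definition used in the article, so the formula here (absolute difference divided by the measured time, averaged over runs) is an assumption. The function names and sample values are hypothetical and are not results from the paper.

```python
from statistics import mean
from typing import Iterable, Tuple


def relative_error(predicted: float, measured: float) -> float:
    """Relative error of one prediction against the measured execution time.

    Assumed definition: |predicted - measured| / measured.
    """
    if measured <= 0:
        raise ValueError("measured execution time must be positive")
    return abs(predicted - measured) / measured


def mean_relative_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the relative error over (predicted, measured) pairs."""
    return mean(relative_error(p, m) for p, m in pairs)


if __name__ == "__main__":
    # Hypothetical execution times in seconds, for illustration only.
    runs = [(107.0, 100.0), (186.0, 200.0)]
    print(f"mean relative error: {mean_relative_error(runs):.2%}")
```

Under this assumed definition, over- and under-predictions contribute equally, since only the magnitude of the difference is kept.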

Updated: 2020-05-15