A Survey on Data-driven Performance Tuning for Big Data Analytics Platforms
Big Data Research (IF 3.5), Pub Date: 2021-01-27, DOI: 10.1016/j.bdr.2021.100206
Rogério Luís de C. Costa, José Moreira, Paulo Pintor, Veronica dos Santos, Sérgio Lifschitz

Many research works address big data platforms built for data science and analytics. These are complex, usually distributed environments composed of several systems and tools. As expected, their performance issues deserve a closer look.

In this work, we review performance tuning strategies for big data environments. We focus on data-driven tuning techniques and discuss the use of database-inspired approaches. For big data and NoSQL stores, performance tuning issues differ considerably from those of so-called conventional systems. Many existing solutions are ad hoc and do not fit multiple situations. However, some categories of data-driven solutions can serve as guidelines and be incorporated into general-purpose auto-tuning modules for big data systems.

We examine typical performance tuning actions and discuss available solutions that support some of the primary activities of the tuning process. We also discuss recent implementations of data-driven performance tuning solutions for big data platforms. We propose an initial classification based on the state of the art in the domain and present selected tuning actions for large-scale data processing systems. Finally, we organize existing works towards self-tuning big data systems according to this classification and present general and system-specific tuning recommendations. We find that most of the literature evaluates tuning actions from the physical design perspective, and that self-tuning, machine-learning-based solutions for big data systems are lacking.




Updated: 2021-02-10