A Survey on Automatic Parameter Tuning for Big Data Processing Systems, ACM Computing Surveys

A Survey on Automatic Parameter Tuning for Big Data Processing Systems
ACM Computing Surveys (IF 16.6), Pub Date: 2020-05-04, DOI: 10.1145/3381027
Herodotos Herodotou, Yuxing Chen, Jiaheng Lu

Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune these parameters to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.

Updated: 2020-05-04
