A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization
Sensors (IF 3.9), Pub Date: 2022-08-08, DOI: 10.3390/s22155930
Xu Huang, Hong Zhang, Xiaomeng Zhai

Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It exposes more than 180 configuration parameters, whose values users typically choose by hand based on their own experience. However, because the parameters are numerous and inherently correlated, manual tuning is very tedious. To replace experience-based tuning, we designed and implemented a reinforcement-learning-based Spark configuration parameter optimizer. First, we trained a deep-neural-network model to predict Spark application performance and verified its accuracy and effectiveness from multiple perspectives. Second, to search for better configuration parameters more efficiently, we improved the Q-learning algorithm so that start and end states are set automatically in each training iteration, which mitigates the agent's poor performance when exploring for better configuration parameters. Lastly, with the default configuration as the baseline, experimental results show that the optimized configuration achieved average performance improvements of 47%, 43%, 31%, and 45% for four different types of Spark applications, indicating that the optimizer can efficiently find better configuration parameters and improve the performance of various Spark applications.
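
The general idea of searching a configuration space with Q-learning against a performance predictor can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' optimizer: `predicted_runtime` is a hypothetical toy function standing in for the trained neural-network model, the two-parameter grid and its candidate values are invented for the example, and the update rule is plain tabular Q-learning with a random start state per episode rather than the improved variant the abstract describes.

```python
import random

# Assumed candidate values for two real Spark settings (illustrative only).
CORES = [1, 2, 4, 8]        # spark.executor.cores
MEMORY_GB = [2, 4, 8, 16]   # spark.executor.memory, in GiB

# Each action moves one step along one parameter axis.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def predicted_runtime(state):
    """Hypothetical stand-in for a learned performance model (seconds)."""
    i, j = state
    return 60.0 + 15.0 * (i - 2) ** 2 + 10.0 * (j - 2) ** 2


def step(state, action):
    """Apply an action, clamping indices to the edges of the grid."""
    i = min(max(state[0] + action[0], 0), len(CORES) - 1)
    j = min(max(state[1] + action[1], 0), len(MEMORY_GB) - 1)
    return (i, j)


def train(episodes=500, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; the reward is the drop in predicted runtime."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        # Assumption: each episode starts from a uniformly random configuration.
        state = (rng.randrange(len(CORES)), rng.randrange(len(MEMORY_GB)))
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda k: q.get((state, k), 0.0))
            nxt = step(state, ACTIONS[a])
            reward = predicted_runtime(state) - predicted_runtime(nxt)
            best_next = max(q.get((nxt, k), 0.0) for k in range(len(ACTIONS)))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def greedy_rollout(q, start, steps=20):
    """Follow the learned greedy policy and keep the best state visited."""
    state = best = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda k: q.get((state, k), 0.0))
        state = step(state, ACTIONS[a])
        if predicted_runtime(state) < predicted_runtime(best):
            best = state
    return best


def to_config(state):
    """Translate grid indices into Spark property values."""
    return {
        "spark.executor.cores": CORES[state[0]],
        "spark.executor.memory": f"{MEMORY_GB[state[1]]}g",
    }


if __name__ == "__main__":
    q_table = train()
    print(to_config(greedy_rollout(q_table, (0, 0))))
```

Because the reward is computed from the predictor rather than from real job runs, the agent can evaluate many candidate configurations cheaply; the quality of the result then depends on how well the predictor tracks actual Spark performance.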

Updated: 2022-08-09