OneStopTuner: An End to End Architecture for JVM Tuning of Spark Applications,arXiv - CS - Distributed, Parallel, and Cluster Computing

arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2020-09-07, DOI: arxiv-2009.06374
Venktesh V, Pooja B Bindal, Devesh Singhal, A V Subramanyam, Vivek Kumar

Java is the backbone of widely used big data frameworks, such as Apache Spark, due to its productivity, the portability afforded by JVM-based execution, and its rich set of libraries. However, the performance of these applications can vary widely depending on which runtime flags are chosen from the full set of JVM flags. Manually tuning these flags is both cumbersome and error-prone. Automated tuning approaches can ease the task, but current solutions either require considerable processing time or restrict themselves to a subset of flags to limit time and space requirements. In this paper, we present OneStopTuner, a novel machine-learning-based framework for autotuning JVM flags. OneStopTuner controls the amount of data generation by using batch-mode active learning to characterize the user application. Based on the user-selected optimization metric, OneStopTuner then discards irrelevant JVM flags by applying feature selection algorithms to the generated data. Finally, it applies sample-efficient methods, such as Bayesian optimization and regression-guided Bayesian optimization, to the shortlisted JVM flags to find optimal values for them. We evaluated OneStopTuner on widely used Spark benchmarks and compared its performance with the traditional simulated-annealing-based autotuning approach. We demonstrate that, when optimizing execution time, the flags chosen by OneStopTuner provide a speedup of up to 1.35x over default Spark execution, compared with a 1.15x speedup from the flag configurations proposed by simulated annealing. OneStopTuner reduced the number of executions needed for data generation by 70% and suggested the optimal flag configuration 2.4x faster than the standard simulated-annealing-based approach, excluding the time for data generation.
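The abstract does not name the feature selection algorithms OneStopTuner uses. As a minimal sketch of the idea behind that step (discard flags that barely affect the chosen metric before running the expensive optimizer), the Python below ranks flags by the absolute Pearson correlation between each flag's value and a measured runtime, then keeps the top k. Everything here is an illustrative assumption, not the paper's method: the flag ranges in `FLAG_RANGES`, the hypothetical helpers `synthetic_runtime`, `generate_samples`, `select_flags`, and `to_java_options`, and the correlation filter itself. A real pipeline would time actual Spark executions and would choose configurations with batch-mode active learning rather than uniform random sampling.

```python
import math
import random

# Hypothetical search space (flag name -> inclusive integer range).
# Real JVM tuning spaces contain hundreds of flags, including booleans.
FLAG_RANGES = {
    "ParallelGCThreads": (1, 16),
    "NewRatio": (1, 8),
    "MaxTenuringThreshold": (1, 15),
    "SurvivorRatio": (2, 16),
    "CICompilerCount": (2, 8),
}


def synthetic_runtime(cfg, rng):
    """Hypothetical stand-in for a measured Spark job runtime in seconds.

    Only ParallelGCThreads and NewRatio matter; the other flags are
    deliberately irrelevant so the selector has something to discard.
    """
    return 120 - 4 * cfg["ParallelGCThreads"] + 5 * cfg["NewRatio"] + rng.gauss(0, 2)


def generate_samples(n, rng):
    """Draw n random flag configurations and 'measure' each one."""
    samples, runtimes = [], []
    for _ in range(n):
        cfg = {name: rng.randint(lo, hi) for name, (lo, hi) in FLAG_RANGES.items()}
        samples.append(cfg)
        runtimes.append(synthetic_runtime(cfg, rng))
    return samples, runtimes


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_flags(samples, runtimes, k):
    """Filter-style feature selection: keep the k flags most correlated with runtime."""
    scores = {
        name: abs(pearson([s[name] for s in samples], runtimes))
        for name in samples[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def to_java_options(cfg):
    """Render integer-valued HotSpot flags as a JVM option string."""
    return " ".join(f"-XX:{name}={value}" for name, value in cfg.items())


if __name__ == "__main__":
    rng = random.Random(42)
    samples, runtimes = generate_samples(200, rng)
    shortlist = select_flags(samples, runtimes, k=2)
    print("shortlisted flags:", shortlist)
    print(to_java_options({"ParallelGCThreads": 8, "NewRatio": 2}))
```

A linear-correlation filter misses flags whose effect is non-monotonic (for example, a generation-size ratio with an interior optimum), which is one reason model-based importance scores are often preferred. Once values for the shortlisted flags are found, they can be passed to Spark executors through the `spark.executor.extraJavaOptions` configuration property.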

Updated: 2020-09-15