Predicting Hadoop misconfigurations using machine learning, Software: Practice and Experience


Predicting Hadoop misconfigurations using machine learning
Software: Practice and Experience (IF 2.6), Pub Date: 2020-01-24, DOI: 10.1002/spe.2790
Andrew Robert, Apaar Gupta, Vinayak Shenoy, Dinkar Sitaram, Subramaniam Kalambur


Distributed applications are popular for heavy workloads that exceed the resources of a single machine. These applications expose many tunable parameters so that cluster resources can be used effectively. However, misconfiguring any of these parameters may result in suboptimal performance of one or more machines in the cluster. Such events may go unnoticed or can result in crashes. The problem has no straightforward solution because of the variety of parameters and the vastly different workloads being processed. In this article, we propose a methodology for machine learning‐based detection of misconfigurations. We collect data mined from system resource utilization, Hadoop logs, and job‐level metrics to train decision tree and support vector machine models. The models are used to identify whether a set of configuration parameters could result in a crash or a slowdown for a specific workload. The approach explained in this article can be extended to other distributed big data applications, such as Spark, Hive, and Pig.
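
To make the classification idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each Hadoop run has already been reduced to a small numeric feature vector and a label, then fits a one-level decision tree (a decision stump), which is far simpler than the decision tree and support vector machine models the article describes. The feature names (`peak_mem_util`, `log_error_count`, `map_task_seconds`), the label names, and every data row are hypothetical and invented purely for illustration.

```python
from collections import Counter

# Hypothetical feature names standing in for the three data sources named in
# the abstract: system resource utilization, Hadoop logs, job-level metrics.
FEATURES = ["peak_mem_util", "log_error_count", "map_task_seconds"]

# Hand-made illustrative rows (NOT data from the article): (features, label).
TRAIN = [
    ((0.55, 0, 40.0), "ok"),
    ((0.60, 1, 42.0), "ok"),
    ((0.62, 0, 45.0), "ok"),
    ((0.70, 2, 50.0), "ok"),
    ((0.92, 6, 95.0), "misconfigured"),
    ((0.95, 7, 110.0), "misconfigured"),
    ((0.97, 9, 120.0), "misconfigured"),
    ((0.99, 12, 150.0), "misconfigured"),
]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows):
    """Pick the single feature/threshold split with the lowest weighted Gini."""
    best = None
    n = len(rows)
    for j in range(len(rows[0][0])):
        values = sorted({x[j] for x, _ in rows})
        # Candidate thresholds are midpoints between adjacent distinct values,
        # so both sides of every candidate split are non-empty.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for x, y in rows if x[j] <= threshold]
            right = [y for x, y in rows if x[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, threshold, majority(left), majority(right))
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    _, j, threshold, left_label, right_label = best
    return {"feature": j, "threshold": threshold,
            "left": left_label, "right": right_label}


def predict(stump, x):
    """Classify one feature vector with a fitted stump."""
    if x[stump["feature"]] <= stump["threshold"]:
        return stump["left"]
    return stump["right"]


if __name__ == "__main__":
    stump = fit_stump(TRAIN)
    print("split on:", FEATURES[stump["feature"]], "at", stump["threshold"])
    print(predict(stump, (0.98, 10, 130.0)))
```

A real pipeline along the lines of the abstract would presumably use a full decision tree or SVM from an established library and features extracted from actual cluster runs; the stump is used here only because it shows the split-selection idea in a few lines of standard-library Python.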


Updated: 2020-01-24
