Predicting Performance Anomalies in Software Systems at Run-time
ACM Transactions on Software Engineering and Methodology (IF 6.6), Pub Date: 2021-04-23, DOI: 10.1145/3440757
Guoliang Zhao, Safwat Hassan, Ying Zou, Derek Truong, Toby Corbin

High performance is critical to achieving and maintaining the success of a software system. Performance anomalies are performance degradation issues (e.g., slower system response times) that arise in software systems at run-time, and they can severely harm user satisfaction. Prior studies propose approaches that detect anomalies by analyzing execution logs and resource utilization metrics after the anomalies have occurred. However, such detection approaches cannot predict anomalies ahead of time, which inevitably delays the corrective actions that could prevent the anomalies from happening. We propose an approach that predicts performance anomalies in software systems and raises anomaly warnings in advance. Our approach uses a Long Short-Term Memory (LSTM) neural network to capture the normal behavior of a software system, and then predicts performance anomalies by identifying early deviations from that captured normal behavior. We conduct extensive experiments to evaluate our approach on two real-world software systems (i.e., Elasticsearch and Hadoop) and compare it with two baselines. The first baseline is a state-of-the-art approach called Unsupervised Behavior Learning. The second baseline predicts performance anomalies by checking whether resource utilization exceeds pre-defined thresholds. Our results show that our approach predicts various performance anomalies with high precision (97–100%) and recall (80–100%), while the baselines achieve 25–97% precision and 93–100% recall. For a range of performance anomalies, our approach achieves sufficient lead times, ranging from 20 s to 1,403 s (i.e., 23.4 min). We also demonstrate that our approach can predict performance anomalies caused by real-world performance bugs.
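The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of warning on early deviations from learned normal behavior, under several assumptions. A moving-average forecaster (`predict_next`) stands in for the trained LSTM. The first 20 samples are assumed anomaly-free and define the normal forecast error. A warning fires when the error exceeds that normal mean plus three standard deviations. `threshold_warnings` mimics the second (threshold) baseline. All function names and the synthetic CPU trace are hypothetical, not taken from the paper.

```python
from statistics import mean, pstdev


def predict_next(history):
    # Stand-in for the trained LSTM (assumption): forecast the next value as
    # the mean of the recent window. The paper's model is an LSTM, not this.
    return mean(history)


def deviation_warnings(series, window=5, k=3.0, calibrate=20):
    """Return the sample indices at which a deviation warning is raised.

    The first `calibrate` samples are assumed anomaly-free; their forecast
    errors define what a "normal" deviation looks like.
    """
    normal_errors = [
        abs(series[t] - predict_next(series[t - window:t]))
        for t in range(window, calibrate)
    ]
    limit = mean(normal_errors) + k * pstdev(normal_errors)
    return [
        t
        for t in range(calibrate, len(series))
        if abs(series[t] - predict_next(series[t - window:t])) > limit
    ]


def threshold_warnings(series, threshold):
    # Baseline: warn whenever the raw metric exceeds a fixed threshold.
    return [t for t, value in enumerate(series) if value > threshold]


def lead_time(warnings, anomaly_start, step_seconds=1):
    # Time between the first warning raised before the anomaly and its onset.
    early = [t for t in warnings if t < anomaly_start]
    return (anomaly_start - early[0]) * step_seconds if early else 0


def synthetic_cpu_series(n=70, ramp_start=40):
    # Hypothetical CPU-utilization trace: a small periodic pattern, then a
    # steady climb (e.g., a resource leak) beginning at `ramp_start`.
    return [
        50 + t % 3 + (2 * (t - ramp_start + 1) if t >= ramp_start else 0)
        for t in range(n)
    ]


if __name__ == "__main__":
    cpu = synthetic_cpu_series()
    onset = threshold_warnings(cpu, 90)[0]  # anomaly onset: first sample above 90%
    print(lead_time(deviation_warnings(cpu), onset))      # 18
    print(lead_time(threshold_warnings(cpu, 80), onset))  # 4
```

On this toy trace the deviation detector first warns at sample 41, 18 samples before utilization crosses 90%, whereas an 80% threshold warns only 4 samples ahead. This illustrates the lead-time idea qualitatively; it does not reproduce the paper's model or results.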
For anomalies caused by real-world performance bugs, our approach achieves 95–100% precision and 87–100% recall, while the baselines achieve 49–83% precision and 100% recall. These results show that our approach outperforms existing anomaly prediction approaches and can predict performance anomalies in real-world systems.
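Precision and recall follow their standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below assumes each experiment run is labeled as anomalous or not, and that a run counts as "predicted" if any warning was raised in it. The abstract does not state the paper's exact counting protocol. The labels are invented for illustration and are not the paper's data. They show how a baseline can reach 100% recall with low precision: an alarm that fires on every run catches every anomaly but also flags every normal run.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall from per-run boolean labels.

    predicted[i] is True if a warning was raised in run i; actual[i] is True
    if run i really contained a performance anomaly (assumed labeling scheme).
    """
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Ten hypothetical runs: the first five contain an anomaly.
    actual = [True] * 5 + [False] * 5
    always_alarm = [True] * 10               # fires on every run
    detector = [True] * 4 + [False] * 6      # misses one anomaly, no false alarms
    print(precision_recall(always_alarm, actual))  # (0.5, 1.0)
    print(precision_recall(detector, actual))      # (1.0, 0.8)
```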

Updated: 2021-04-23