Imbalanced metric learning for crashing fault residence prediction
Journal of Systems and Software (IF 3.7), Pub Date: 2020-12-01, DOI: 10.1016/j.jss.2020.110763
Zhou Xu , Kunsong Zhao , Meng Yan , Peipei Yuan , Ling Xu , Yan Lei , Xiaohong Zhang

Abstract Because software crashes usually cause great harm, locating the fault that causes a crash (i.e., the crashing fault) has long been an active research topic. The stack trace in a crash report usually contains abundant information related to the crash, so it is helpful for finding the crash's root cause. Recently, researchers have extracted features from crashes and then built classification models on those features to predict whether the crashing fault resides in the stack trace. Such prediction can accelerate debugging and save debugging effort. In this work, we apply a state-of-the-art metric learning method called IML to crash data for crashing fault residence prediction. IML uses Mahalanobis distance based metric learning to learn a high-quality feature representation by reducing the distance between crash instances with the same label and increasing the distance between crash instances with different labels. In addition, IML uses a new loss function that combines four types of losses with different weights to cope with the class imbalance of crash data. Experiments on seven open-source software projects show that our IML method performs significantly better than nine sampling-based and five ensemble-based imbalanced learning methods in terms of three performance indicators.
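The abstract describes IML only at a high level. Below is a minimal Python sketch, not the authors' implementation, of the two ideas it names: a squared Mahalanobis distance parameterised as M = LᵀL, and a loss that splits training pairs into four weighted groups. The grouping (similar/dissimilar pairs crossed with a minority/majority-class anchor, with pairs chosen by k nearest neighbours), the hinge form, the margins, the weights, and the names `build_pairs`, `weighted_pair_loss`, `margin_sim` and `margin_dis` are all illustrative assumptions. The sketch only evaluates the loss for a fixed L; learning L would mean minimising this loss (for example by gradient descent), which is not shown.

```python
# Sketch only: the pair grouping, hinge form, margins and weights below are
# illustrative assumptions, not the exact IML formulation from the paper.


def mahalanobis_sq(x, y, L):
    """Squared Mahalanobis distance (x - y)^T M (x - y) with M = L^T L.

    Computing ||L (x - y)||^2 keeps M positive semidefinite by construction."""
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(w * d for w, d in zip(row, diff)) for row in L]
    return sum(p * p for p in projected)


def euclidean_sq(x, y):
    """Squared Euclidean distance in the original feature space."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def build_pairs(X, y, k, minority=1):
    """Hypothetical pair construction: pair each anchor with its k nearest
    same-label and k nearest different-label instances, then group the pairs
    by (similar/dissimilar) x (anchor in minority/majority class)."""
    groups = {"sim_min": [], "sim_maj": [], "dis_min": [], "dis_maj": []}
    for i, xi in enumerate(X):
        anchor = "min" if y[i] == minority else "maj"
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        other = [j for j in range(len(X)) if y[j] != y[i]]
        for kind, candidates in (("sim", same), ("dis", other)):
            nearest = sorted(candidates, key=lambda j: euclidean_sq(xi, X[j]))[:k]
            groups[f"{kind}_{anchor}"].extend((i, j) for j in nearest)
    return groups


def weighted_pair_loss(X, groups, L, weights, margin_sim=1.0, margin_dis=2.0):
    """Weighted sum of four per-group average hinge losses.

    Averaging inside each group stops the (usually far more numerous)
    majority-class pairs from dominating; the weights then rebalance groups."""
    total = 0.0
    for name, pairs in groups.items():
        if not pairs:
            continue
        dists = [mahalanobis_sq(X[i], X[j], L) for i, j in pairs]
        if name.startswith("sim"):
            # Pull same-label pairs inside margin_sim.
            terms = [max(0.0, d - margin_sim) for d in dists]
        else:
            # Push different-label pairs beyond margin_dis.
            terms = [max(0.0, margin_dis - d) for d in dists]
        total += weights[name] * sum(terms) / len(terms)
    return total


if __name__ == "__main__":
    # Toy data: label 1 is the (hypothetical) minority class.
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
    y = [0, 0, 0, 1, 1]
    groups = build_pairs(X, y, k=1)
    weights = {"sim_min": 2.0, "sim_maj": 1.0, "dis_min": 2.0, "dis_maj": 1.0}
    identity = [[1.0, 0.0], [0.0, 1.0]]
    shrink = [[0.1, 0.0], [0.0, 0.1]]
    print(weighted_pair_loss(X, groups, identity, weights))
    print(weighted_pair_loss(X, groups, shrink, weights))
```

With the identity matrix the toy classes are already well separated, so the loss is zero. Shrinking L pulls the different-label pairs inside `margin_dis` and the loss rises, with the minority-anchored group counting double under these assumed weights; that increase is the signal an optimiser would use to spread the classes apart again.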

Updated: 2020-12-01