Feature selection and embedding based cross project framework for identifying crashing fault residence, Information and Software Technology



Feature selection and embedding based cross project framework for identifying crashing fault residence
Information and Software Technology (IF 3.8), Pub Date: 2020-10-15, DOI: 10.1016/j.infsof.2020.106452
Zhou Xu, Tao Zhang, Jacky Keung, Meng Yan, Xiapu Luo, Xiaohong Zhang, Ling Xu, Yutian Tang

Context: Analyzing automatically generated crash reports to find the root of the fault that causes a crash (the crashing fault, for short) is a critical activity for software quality assurance.

Objective: Correctly predicting whether the crashing fault resides in the stack trace of a crash report can speed up debugging and focus debugging effort. Existing work on Identification of Crashing Fault Residence (ICFR) for newly submitted crashes relies on label information collected from bug-fixing logs and on features of crash instances extracted from stack traces and source code. This work develops a novel cross project ICFR framework that addresses the data scarcity problem by using labeled crash data from other projects for the ICFR task of the project at hand. Because irrelevant features, distribution differences, and class imbalance in cross project data may degrade ICFR performance, the framework removes irrelevant features, reduces distribution differences, and eases the class imbalance issue.

Method: The proposed framework, called FSE, combines Feature Selection and feature Embedding techniques. FSE first uses an information gain ratio based feature ranking method to select a relevant feature subset of the cross project data, and then applies Weighted Balanced Distribution Adaptation (WBDA), a state-of-the-art method, to map the features of the cross project data into a common space. WBDA accounts for both the marginal and the conditional distributions, together with their relative weights, to reduce distribution discrepancies. In addition, WBDA balances the class proportions of each project's data to alleviate the class imbalance issue.
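
The feature ranking step can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width discretization, the number of bins, and the names `entropy`, `gain_ratio`, and `select_top_k` are assumptions made here for illustration, and the WBDA embedding step is not covered. The sketch scores each feature by its information gain ratio against the crash labels and keeps the top-ranked features.

```python
import numpy as np


def entropy(values):
    """Shannon entropy (in bits) of a 1-D array of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def gain_ratio(feature, y, n_bins=5):
    """Information gain ratio of one feature with respect to labels y.

    Assumption of this sketch: the feature is discretized into
    equal-width bins; the paper may discretize differently.
    """
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    # Interior edges only, so bin indices run from 0 to n_bins - 1.
    binned = np.digitize(feature, edges[1:-1])

    # Conditional entropy H(y | binned feature).
    h_cond = 0.0
    for b in np.unique(binned):
        mask = binned == b
        h_cond += mask.mean() * entropy(y[mask])

    split_info = entropy(binned)
    if split_info == 0.0:
        # A constant feature cannot separate the classes.
        return 0.0
    return (entropy(y) - h_cond) / split_info


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features and all scores."""
    scores = np.array([gain_ratio(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-scores, kind="stable")
    return order[:k], scores
```

In a cross project setting, one plausible use is to compute the ranking on the labeled data of the source project and then keep the same columns in the target project, so that both projects share one feature subset before any embedding is applied.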

Results: We evaluate the FSE framework in experiments on 7 projects. The results show that FSE outperforms the 25 methods it is compared against.

Conclusion: This work proposes a cross project learning framework for ICFR that uses feature selection to remove irrelevant features and feature embedding to reduce distribution differences. The results demonstrate the superior performance of the FSE framework.


Updated: 2020-11-19
