A Data Transfer and Relevant Metrics Matching Based Approach for Heterogeneous Defect Prediction
IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2022-05-10, DOI: 10.1109/tse.2022.3173678
Pravas Ranjan Bal, Sandeep Kumar

Heterogeneous defect prediction (HDP) is a promising research area in software defect prediction that addresses the unavailability of past homogeneous data. In HDP, prediction is performed using a source dataset whose independent features (metrics) are entirely different from those of the target dataset. One important assumption in machine learning is that the independent features of the source and target datasets should be relevant to each other for better prediction accuracy. However, this assumption does not generally hold in HDP. Furthermore, the source dataset selected for a given target dataset may be small, causing insufficient training. To resolve these issues, we propose a novel heterogeneous data preprocessing method, Transfer of Data from Target dataset to Source dataset selected using Relevance score (TDTSR), for heterogeneous defect prediction. In the proposed approach, we use the chi-square test to select the relevant metrics between the source and target datasets, and we perform experiments using the proposed approach with various machine learning algorithms. The proposed method shows an improvement of at least 14% in AUC score in the HDP scenario compared to existing state-of-the-art models.
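
The abstract does not spell out how the chi-square test is applied to match metrics. One plausible reading, sketched below purely as an assumption and not as the authors' TDTSR procedure, is a two-sample chi-square test of homogeneity: each metric is min-max scaled, binned into a shared set of equal-width bins, and the source and target histograms are compared. The function names, the bin count, and the rule "smallest statistic is the best match" are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def histogram(scaled: Sequence[float], bins: int) -> List[int]:
    """Count scaled values in `bins` equal-width bins over [0, 1]."""
    counts = [0] * bins
    for v in scaled:
        idx = min(int(v * bins), bins - 1)  # v == 1.0 falls in the last bin
        counts[idx] += 1
    return counts


def chi_square_statistic(source: Sequence[float],
                         target: Sequence[float],
                         bins: int = 4) -> float:
    """Chi-square homogeneity statistic for the 2 x bins table of counts."""
    rows = [histogram(min_max_scale(source), bins),
            histogram(min_max_scale(target), bins)]
    row_totals = [sum(r) for r in rows]
    col_totals = [rows[0][j] + rows[1][j] for j in range(bins)]
    grand = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(bins):
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:  # skip bins that are empty in both samples
                stat += (rows[i][j] - expected) ** 2 / expected
    return stat


def best_match(source_metrics: Dict[str, Sequence[float]],
               target_metric: Sequence[float],
               bins: int = 4) -> str:
    """Name of the source metric whose distribution is closest to the target's.

    Assumption: a smaller statistic means a more relevant metric.
    """
    return min(
        source_metrics,
        key=lambda name: chi_square_statistic(source_metrics[name], target_metric, bins),
    )
```

For example, `best_match({"loc": [1, 2, 3, 4], "cc": [1, 1, 1, 10]}, [5, 10, 15, 20])` picks `"loc"`, because after scaling its histogram is identical to the target's while the skewed `"cc"` column is not. The relevance score defined in the paper may differ from this statistic.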

Updated: 2022-05-10