Better Data Labelling with EMBLEM (and how that Impacts Defect Prediction)
arXiv - CS - Software Engineering Pub Date : 2019-05-05 , DOI: arxiv-1905.01719
Huy Tu, Zhe Yu, Tim Menzies

Standard automatic methods for recognizing problematic development commits can be greatly improved via the incremental application of human+artificial expertise. In this approach, called EMBLEM, an AI tool first explores the software development process to label the commits that are most problematic. Humans then apply their expertise to check those labels (perhaps resulting in the AI updating the support vectors within its SVM learner). We recommend this human+AI partnership for several reasons. When a new domain is encountered, EMBLEM can learn better ways to label which comments refer to real problems. Also, in studies with 9 open source software projects, labelling via EMBLEM's incremental application of human+AI was substantially cheaper than existing methods ($\approx$ eight times cheaper). Further, EMBLEM is very effective: for the data sets explored here, EMBLEM's better labelling significantly improved $P_{opt}20$ and G-score performance in nearly all the projects studied.
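The human+AI loop the abstract describes can be sketched as uncertainty-sampling active learning: the learner labels what it can, and a human oracle is asked about the example the learner is least sure of before retraining. The sketch below is hypothetical, not the paper's implementation: a toy perceptron-style linear scorer stands in for EMBLEM's SVM learner, and the `oracle` callback stands in for the human expert checking labels.

```python
def train(X, y, epochs=50, lr=0.1):
    """Fit a toy linear classifier (perceptron-style) on labelled rows.
    Stands in for the SVM learner used by EMBLEM."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            pred = 1 if score >= 0 else 0
            if pred != yi:
                sign = 1 if yi == 1 else -1
                w = [wj + lr * sign * xj for wj, xj in zip(w, xi)]
                b += lr * sign
    return w, b

def margin(w, b, xi):
    """Magnitude of the decision score; small values mean the
    learner is uncertain about this commit."""
    return abs(sum(wj * xj for wj, xj in zip(w, xi)) + b)

def emblem_style_loop(pool, oracle, seed_idx, rounds=5):
    """Incremental human+AI labelling (hypothetical sketch):
    each round, retrain on all human-checked labels, then ask the
    'human' oracle about the unlabelled commit nearest the boundary."""
    labelled = list(seed_idx)
    for _ in range(rounds):
        X = [pool[i] for i in labelled]
        y = [oracle(pool[i]) for i in labelled]
        w, b = train(X, y)
        unlabelled = [i for i in range(len(pool)) if i not in labelled]
        if not unlabelled:
            break
        ask = min(unlabelled, key=lambda i: margin(w, b, pool[i]))
        labelled.append(ask)  # human verdict joins the training set
    return labelled
```

A usage example: with a pool of ten two-feature commit vectors and an oracle that flags a commit as buggy when its first feature dominates, `emblem_style_loop(pool, oracle, [0, 1])` grows the two seed labels by one human query per round. The design point is that human effort is spent only on the commits the learner finds hardest, which is how such a loop can cut labelling cost relative to labelling everything by hand.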

Last updated: 2020-04-08