Conclusion Stability for Natural Language Based Mining of Design Discussions, arXiv - CS - Software Engineering


arXiv - CS - Software Engineering, Pub Date: 2021-06-17, DOI: arxiv-2106.09844
Alvi Mahadi, Neil A. Ernst, Karan Tongay

Developer discussions range from in-person hallway chats to comment chains on bug reports. Being able to identify discussions that touch on software design would help with documenting and refactoring software. Design mining is the application of machine learning techniques to correctly label a given discussion artifact, such as a pull request, as pertaining (or not) to design. In this paper we demonstrate a simple example of how design mining works. We then show that conclusion stability is poor across different artifact types and different projects. We present two techniques, augmentation and context specificity, that greatly improve the conclusion stability and cross-project relevance of design mining. Our new approach achieves an AUC of 0.88 on the within-dataset classification task and 0.80 on the cross-dataset classification task.
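To make the within-dataset versus cross-dataset comparison concrete, here is a minimal sketch of a design-mining evaluation. It is not the authors' pipeline: it swaps in a plain multinomial naive Bayes classifier over whitespace tokens, and every function name and example sentence is invented for illustration. The data are toy inputs, so the printed AUC values say nothing about the paper's results.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately crude tokenizer)."""
    return text.lower().split()


def train_nb(texts, labels):
    """Fit a multinomial naive Bayes model; label 1 = design, 0 = not design."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter(labels)
    for text, label in zip(texts, labels):
        counts[label].update(tokenize(text))
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab


def design_score(model, text):
    """Log-odds that `text` is design-related, with add-one smoothing."""
    counts, docs, vocab = model
    score = math.log(docs[1] / docs[0])
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    for tok in tokenize(text):
        if tok in vocab:  # tokens unseen in training are ignored
            score += math.log((counts[1][tok] + 1) / totals[1])
            score -= math.log((counts[0][tok] + 1) / totals[0])
    return score


def auc(scores, labels):
    """Chance that a random positive outranks a random negative; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: dataset A is used for training and within-dataset
# testing; dataset B stands in for a different artifact type or project.
train_a = [
    ("refactor the module to decouple the interface from the implementation", 1),
    ("this architecture needs an abstraction layer between components", 1),
    ("fixed typo in readme", 0),
    ("bump version number for release", 0),
]
test_a = [
    ("decouple the abstraction layer from the module", 1),
    ("fixed typo in release notes", 0),
]
test_b = [
    ("should we split services", 1),
    ("thanks for the release", 0),
]

model = train_nb([t for t, _ in train_a], [y for _, y in train_a])


def evaluate(dataset):
    """Score every item in `dataset` with the trained model and return the AUC."""
    scores = [design_score(model, t) for t, _ in dataset]
    return auc(scores, [y for _, y in dataset])


within_auc = evaluate(test_a)  # train and test drawn from the same dataset
cross_auc = evaluate(test_b)   # test drawn from a different dataset
print("within-dataset AUC:", within_auc)
print("cross-dataset AUC:", cross_auc)
```

In this framing, conclusion stability is about how far the second number falls from the first when the test artifacts come from a source the classifier never saw during training.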


Updated: 2021-06-25
