Cross-Dataset Design Discussion Mining
arXiv - CS - Software Engineering. Pub Date: 2020-01-06, DOI: arxiv-2001.01424
Alvi Mahadi, Karan Tongay, and Neil A. Ernst

Being able to identify software discussions that are primarily about design, which we call design mining, can improve documentation and maintenance of software systems. Existing design mining approaches have good classification performance using natural language processing (NLP) techniques, but the conclusion stability of these approaches is generally poor. A classifier trained on a given dataset of software projects has so far not worked well on different artifacts or different datasets. In this study, we replicate and synthesize these earlier results in a meta-analysis. We then apply recent work in transfer learning for NLP to the problem of design mining. However, for our datasets, these deep transfer learning classifiers perform no better than less complex classifiers. We conclude by discussing some reasons behind the transfer learning approach to design mining.

Updated: 2020-05-27