On the Replicability and Reproducibility of Deep Learning in Software Engineering,arXiv - CS - Software Engineering


On the Replicability and Reproducibility of Deep Learning in Software Engineering
arXiv - CS - Software Engineering. Pub Date: 2020-06-25, DOI: arxiv-2006.14244
Chao Liu, Cuiyun Gao, Xin Xia, David Lo, John Grundy, Xiaohu Yang

Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years, because they can often solve SE challenges without enormous manual feature engineering effort or complex domain knowledge. Although many DL studies report substantial effectiveness advantages over other state-of-the-art models, they often ignore two factors: (1) replicability, i.e., whether the reported experimental results can be approximately reproduced with high probability using the same DL model and the same data; and (2) reproducibility, i.e., whether the reported experimental findings can be reproduced by new experiments with the same experimental protocol and DL model, but different sampled real-world data. Unlike studies of traditional machine learning (ML) models, DL studies commonly overlook these two factors, declaring them minor threats or leaving them for future work. This is mainly due to high model complexity, with many manually set parameters, and a time-consuming optimization process. In this study, we conducted a literature review of 93 DL studies recently published in twenty SE journals or conferences. Our statistics show the urgency of investigating these two factors in SE. Moreover, we re-ran four representative DL models in SE. Experimental results show the importance of replicability and reproducibility: the reported performance of a DL model could not be replicated when the optimization process was unstable, and reproducibility could be substantially compromised if model training does not converge or if performance is sensitive to the size of the vocabulary and testing data. It is therefore urgent for the SE community to provide a long-lasting link to a replication package, enhance the stability and convergence of DL-based solutions, and avoid performance sensitivity on different sampled data.
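As a rough illustration of the replicability notion described above (a reported result being approximately reproduced with high probability using the same model and data), the following Python sketch scores a set of re-run results against a reported value. The function names, the 5% relative tolerance, and the 90% acceptance threshold are illustrative assumptions, not values or procedures taken from the paper.

```python
from typing import Sequence


def replicability_rate(
    reported: float,
    rerun_scores: Sequence[float],
    rel_tol: float = 0.05,  # assumed tolerance; choose per metric and task
) -> float:
    """Return the fraction of re-runs whose score lies within
    rel_tol (relative) of the reported score."""
    if not rerun_scores:
        raise ValueError("need at least one re-run score")
    allowed = rel_tol * abs(reported)
    hits = sum(1 for score in rerun_scores if abs(score - reported) <= allowed)
    return hits / len(rerun_scores)


def is_replicable(
    reported: float,
    rerun_scores: Sequence[float],
    rel_tol: float = 0.05,
    min_rate: float = 0.9,  # assumed reading of "high probability"
) -> bool:
    """Hypothetical decision rule: enough re-runs land near the reported value."""
    return replicability_rate(reported, rerun_scores, rel_tol) >= min_rate


if __name__ == "__main__":
    # Made-up scores from four re-runs with different random seeds.
    reruns = [0.59, 0.61, 0.60, 0.45]
    print(replicability_rate(0.60, reruns))
    print(is_replicable(0.60, reruns))
```

In practice each entry in `rerun_scores` would come from retraining the same model on the same data with a different random seed; an analogous check for reproducibility would instead vary the sampled data while keeping the protocol and model fixed.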


Updated: 2020-06-26
