On the Variability of Software Engineering Needs for Deep Learning: Stages, Trends, and Application Types, IEEE Transactions on Software Engineering

On the Variability of Software Engineering Needs for Deep Learning: Stages, Trends, and Application Types
IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2022-03-30, DOI: 10.1109/tse.2022.3163576
Kai Gao, Zhixing Wang, Audris Mockus, Minghui Zhou

The wide use of deep learning (DL) has not been matched by corresponding advances in software engineering (SE) for DL. Research shows that developers writing DL software follow specific development stages (i.e., SE4DL stages) and face new DL-specific problems. Despite substantial research, it is unclear how DL developers’ SE needs vary across stages and application types, or whether they change over time. To help focus research and development efforts on DL-development challenges, we analyze 92,830 Stack Overflow (SO) questions and 227,756 READMEs of public repositories related to DL. Latent Dirichlet Allocation (LDA) reveals 27 topics for the SO questions, of which 19 (70.4%) mainly relate to a single SE4DL stage and eight span multiple stages. Most questions concern the Data Preparation and Model Setup stages. The relative rates of questions have increased over time for 11 topics and decreased for eight. Questions on the 11 growing topics were less likely to have an accepted answer than the remaining questions. LDA on README files reveals 16 distinct application types for the 227k repositories. We apply the LDA model fitted on READMEs to the 92,830 SO questions and find that 27% of the questions are related to the 16 DL application types. The most asked question topic varies across application types, with half primarily relating to the second and third stages. Specifically, developers ask the most questions about topics primarily relating to the Data Preparation (2nd) stage for four mature application types such as Image Segmentation, and about topics primarily relating to the Model Setup (3rd) stage for four application types concerning emerging methods such as Transfer Learning. Based on our findings, we distill several actionable insights for SE4DL research, practice, and education, such as better support for using trained models, application-type specific tools, and teaching materials.
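The abstract describes fitting an LDA model on one corpus (READMEs) and then applying it to another (SO questions). A minimal sketch of that general pattern follows, assuming scikit-learn is available. The toy texts, the helper name `fit_and_apply`, and the choice of two topics are illustrative assumptions; the abstract does not say which tooling, preprocessing, or hyperparameters the authors used.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for README texts (hypothetical, not data from the study).
readmes = [
    "image segmentation model with unet for medical images",
    "semantic segmentation of images using masks and pixels",
    "transfer learning with pretrained model and fine tuning",
    "fine tuning a pretrained network for transfer learning",
]

# Toy stand-ins for Stack Overflow question texts (also hypothetical).
questions = [
    "how to load masks for image segmentation training",
    "how to freeze layers when fine tuning a pretrained model",
]


def fit_and_apply(train_docs, new_docs, n_topics=2, seed=0):
    """Fit LDA on train_docs, then infer topic mixtures for new_docs."""
    # Build the vocabulary from the training corpus only.
    vectorizer = CountVectorizer(stop_words="english")
    train_counts = vectorizer.fit_transform(train_docs)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(train_counts)

    # Reuse the same vocabulary for the new corpus; unseen words are dropped.
    new_counts = vectorizer.transform(new_docs)

    # Each row is a document-topic distribution that sums to 1.
    return lda.transform(new_counts)


doc_topics = fit_and_apply(readmes, questions)
for text, dist in zip(questions, doc_topics):
    print(text, "->", dist.round(2))
```

Each output row gives the inferred topic mixture of one question under the model fitted on the README texts. Deciding whether a question "relates to" an application type would need an extra rule, such as a threshold on the dominant topic's weight. That rule is an assumption here, not something the abstract specifies.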


Updated: 2022-03-30
