Understanding Software-2.0, ACM Transactions on Software Engineering and Methodology


Understanding Software-2.0
ACM Transactions on Software Engineering and Methodology (IF 4.4), Pub Date: 2021-07-23, DOI: 10.1145/3453478
Malinda Dilhara, Ameya Ketkar, Danny Dig

Enabled by a rich ecosystem of Machine Learning (ML) libraries, programming using learned models, i.e., Software-2.0, has gained substantial adoption. However, we do not know what challenges developers encounter when they use ML libraries. With this knowledge gap, researchers miss opportunities to contribute to new research directions, tool builders do not invest resources where automation is most needed, library designers cannot make informed decisions when releasing ML library versions, and developers fail to use common practices when using ML libraries. We present the first large-scale quantitative and qualitative empirical study to shed light on how developers in Software-2.0 use ML libraries, and how this evolution affects their code. Particularly, using static analysis we perform a longitudinal study of 3,340 top-rated open-source projects with 46,110 contributors. To further understand the challenges of ML library evolution, we survey 109 developers who introduce and evolve ML libraries. Using this rich dataset we reveal several novel findings. Among others, we found an increasing trend of using ML libraries: the ratio of new Python projects that use ML libraries increased from 2% in 2013 to 50% in 2018. We identify several usage patterns including the following: (i) 36% of the projects use multiple ML libraries to implement various stages of the ML workflows, (ii) developers update ML libraries more often than the traditional libraries, (iii) strict upgrades are the most popular for ML libraries among other update kinds, (iv) ML library updates often result in cascading library updates, and (v) ML libraries are often downgraded (22.04% of cases). We also observed unique challenges when evolving and maintaining Software-2.0 such as (i) binary incompatibility of trained ML models and (ii) benchmarking ML models. Finally, we present actionable implications of our findings for researchers, tool builders, developers, educators, library vendors, and hardware vendors.
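The abstract mentions static analysis of Python projects but does not describe the tooling. As a rough illustration only, and not the authors' method, the sketch below uses Python's standard `ast` module to report which ML libraries a source file imports. The `ML_MODULES` set and the function name `ml_libraries_used` are assumptions made for this example; the study's actual library list is not given here.

```python
import ast

# Hypothetical set of top-level module names treated as ML libraries.
# The list used in the study is not stated in the abstract.
ML_MODULES = {"sklearn", "tensorflow", "torch", "keras"}


def ml_libraries_used(source: str) -> set[str]:
    """Return the ML top-level modules imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import torch.nn as nn" -> "torch.nn"
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from sklearn.svm import SVC" -> "sklearn.svm"; relative imports are skipped
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in ML_MODULES:
                found.add(top)
    return found


example = "import numpy as np\nfrom sklearn.svm import SVC\nimport torch.nn as nn\n"
print(sorted(ml_libraries_used(example)))  # ['sklearn', 'torch']
```

Running a check like this over each file of a project, at successive commits, is one plausible way to observe when a project starts using more than one ML library.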


Updated: 2021-07-23
