CAPS: a supervised technique for classifying Stack Overflow posts concerning API issues



Empirical Software Engineering (IF 4.1), Pub Date: 2019-07-19, DOI: 10.1007/s10664-019-09743-4
Md Ahasanuzzaman, Muhammad Asaduzzaman, Chanchal K. Roy, Kevin A. Schneider

Designing and maintaining APIs (Application Programming Interfaces) is complex because the requirements of their users change constantly. Despite the efforts of their designers, APIs may suffer from a number of issues, such as incomplete or erroneous documentation, poor performance, and backward incompatibility. To maintain a healthy client base, API designers must learn about these issues in order to fix them. Question answering sites such as Stack Overflow (SO) have become a popular place for discussing API issues. Posts about API issues are invaluable to API designers, not only because they help designers learn more about a problem but also because they reveal the requirements of API users. However, the unstructured nature of posts and the abundance of non-issue posts make detecting SO posts concerning API issues challenging. In this paper, we first develop a supervised learning approach using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences. We use this sentence-level information together with features collected from posts, the experience of users, readability metrics, and centrality measures of the collaboration network to build a technique, called CAPS, that classifies SO posts concerning API issues. In total, we consider 34 features along eight different dimensions. An evaluation of CAPS using carefully curated SO posts on three popular API types reveals that the technique outperforms all three baseline approaches considered in this study. We then conduct studies to find important features and evaluate the performance of the CRF-based technique for classifying issue sentences. A comparison with two other baseline approaches shows that the technique has high potential. We also test the generalizability of CAPS results, evaluate the effectiveness of different classifiers, and identify the impact of different feature sets.
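As a rough illustration of how post-level features of the kinds named in the abstract might be assembled, the following Python sketch builds a small feature dictionary for one post. It is not the authors' implementation: the abstract does not list the 34 features, so the four shown here are assumptions. In particular, the keyword set `ISSUE_CUES` is a hypothetical stand-in for the CRF sentence labeler, the Automated Readability Index is only one possible readability metric, and degree centrality is only one possible centrality measure. The function names and the example data are likewise invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical cue words; a crude stand-in for the paper's CRF sentence labeler.
ISSUE_CUES = {"bug", "crash", "deprecated", "slow", "undocumented", "broken"}


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def automated_readability_index(text):
    """ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43."""
    sentences = split_sentences(text)
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        return 0.0
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def degree_centrality(edges):
    """Normalized degree centrality for an undirected user-interaction graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {user: 0.0 for user in neighbors}
    return {user: len(others) / (n - 1) for user, others in neighbors.items()}


def post_features(body, author, reputation, edges):
    """Combine sentence-level, user, readability and network signals for one post."""
    sentences = split_sentences(body)
    issue_like = [
        s for s in sentences
        if ISSUE_CUES & set(re.findall(r"[a-z]+", s.lower()))
    ]
    centrality = degree_centrality(edges)
    return {
        "issue_sentence_ratio": len(issue_like) / len(sentences) if sentences else 0.0,
        "author_reputation": reputation,
        "readability_ari": automated_readability_index(body),
        "author_degree_centrality": centrality.get(author, 0.0),
    }


if __name__ == "__main__":
    # Invented example: who answered whose question.
    edges = [("alice", "bob"), ("alice", "carol")]
    print(post_features("The parser is slow. How do I configure it?", "alice", 120, edges))
```

A feature dictionary like this could then be passed to any standard supervised classifier; which classifier and which features work best is exactly what the paper reports evaluating.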


Updated: 2019-07-19
