Studying the Relationship Between the Usage of APIs Discussed in the Crowd and Post-Release Defects
Introduction
Today, software development relies heavily on libraries, frameworks, and the APIs they offer. However, these APIs may introduce various common defects into software systems. The causes of these defects include the lack of proper API documentation (Souza et al., 2019), complexity and poor structure of the API, which lead to misunderstandings (Robillard and Deline, 2011, Campos et al., 2016a), backward compatibility issues (Wang et al., 2015), API correctness (e.g., unexpected behavior of the API) (Wang et al., 2015), and the change-proneness of APIs (Linares-Vásquez et al., 2014). Such defects usually occur rapidly, independent of the application domain (Campos et al., 2016a).
Upon encountering such errors, defects, and even conceptual questions, developers may ask for help by explaining the issue, sometimes attaching their code, on Q&A websites such as Stack Overflow (Uddin and Khomh, 2019). Usually, they find their questions answered very quickly, with a median answer time of 11 minutes (Mamykina et al., 2011). Both questions and answers may be validated and rated by other developers through voting and comments. Users who post up-voted questions or answers receive reputation scores, which motivates individuals to contribute (Wang et al., 2015). Thus, Q&A websites such as Stack Overflow have become an indispensable resource for developers seeking solutions to their questions and issues (Wang et al., 2015).
At the time of writing this paper, more than 17.8 million questions, 27.2 million answers, and 70 million comments have been submitted to Stack Overflow. According to the Stack Overflow developer survey, about 50 million people visit the website each month to learn, share, and build their careers (Anon, 2018). Consequently, the information available on this website constitutes an enormous body of crowd knowledge about common errors, defects, and concepts, trusted by millions of developers (de Souza et al., 2014, Mao et al., 2017).
The knowledge obtainable from Stack Overflow covers a wide range of aspects, such as security and performance issues (Mao et al., 2017), programming styles (Barua et al., 2014), and API usage obstacles (Wang and Godfrey, 2013). Souza et al. (2019) state that many of the posts on Stack Overflow are primarily about API usage challenges. Moreover, API-related issues inferred from mining Q&A websites hold particular promise, as they contain discussions of the real-world issues encountered by millions of developers (Wang et al., 2015). For example, the method cos(double angle) (in the Java programming language), offered by the class java.lang.Math, has confused a large number of developers. This method returns the trigonometric cosine of an angle, and its only argument, angle, must be given in radians. However, many developers do not comply with this requirement. This misunderstanding has yielded plenty of highly up-voted questions related to this method.2 Additionally, struggling with more challenging APIs may prevent developers from concentrating on the application itself, so new errors related to the application logic may arise.
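The Math.cos pitfall described above can be illustrated with a minimal sketch (our own example, not taken from the Stack Overflow questions in question):

```java
public class CosineExample {
    public static void main(String[] args) {
        double angleInDegrees = 60.0;

        // Incorrect: Math.cos() expects radians, so passing degrees
        // directly evaluates cos(60 rad), roughly -0.9524.
        double wrong = Math.cos(angleInDegrees);

        // Correct: convert degrees to radians first, giving cos(60°) = 0.5.
        double right = Math.cos(Math.toRadians(angleInDegrees));

        System.out.printf("wrong = %.4f, right = %.4f%n", wrong, right);
    }
}
```

The silent part of the bug is what makes it common: both calls compile and return a value in [-1, 1], so nothing flags the unit mismatch at build or run time.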
Prior studies have shown the relation between API changes and the quantity of questions submitted to Stack Overflow (Linares-Vásquez et al., 2014), and have mined developers’ obstacles (Wang and Godfrey, 2013) and opinions (Uddin and Khomh, 2019) on APIs from this website. Nevertheless, no prior research has focused on analyzing the effect of this knowledge on defect prediction models. We conjecture that if we extract the knowledge about APIs from Stack Overflow, we can use this knowledge to better explain and predict software defects.
In this paper, we investigate the relationship between the usage of APIs discussed in the crowd and software quality. To this end, we define the concept of the challenge of an API, i.e., how much an API is discussed in high-quality discussions on Stack Overflow. To better study the quality of Stack Overflow discussions, we statistically investigate the Stack Overflow quality descriptors mentioned in prior studies (e.g., up votes, view count, favorite count, questioner reputation, etc.), and employ Exploratory Factor Analysis (EFA) (Fabrigar et al., 1999) to identify the underlying relationships between these quality descriptors. Using EFA, we found that three underlying factors can explain the interrelationships among all quality descriptors. Furthermore, using the concept of the challenge of an API, we propose a set of crowd-knowledge-based metrics to study software quality. We investigate how our proposed metrics can help in explaining software defects, that is, whether adding our metrics to models built with traditional metrics increases the proportion of variation that the prediction model accounts for. We also investigate whether our metrics can improve the predictive power of the baseline models.
To measure code quality, we employ post-release defects, since they are widely used by prior studies in the software quality area (Shang et al., 2015, Shihab et al., 2012). Post-release defects are the defects found up to six months after the release of a given version (de Pádua and Shang, 2018). We perform our detailed case study over 17 million Stack Overflow discussions and five open source projects, namely Spring, Elastic Search, Jenkins, the K-9 Mail client, and the OwnCloud client, with a focus on the following research questions:
RQ1: Are source code files using more challenging APIs more likely to be defect-prone?
We find positive correlations between crowd-related metrics and post-release defects. Our results show that in 4 out of 10 releases, there exists at least one crowd-related metric with a higher correlation with post-release defects than pre-release defects, which have been shown to have the highest correlation with post-release defects among traditional metrics (Moser et al., 2008). Given a version, pre-release defects are the defects found up to six months before the release of that version (Chen et al., 2017).
RQ2: Can crowd knowledge help in explaining post-release defects?
We find that our crowd-related metrics provide additional, statistically significant explanatory power for software quality over traditional baseline metrics. More specifically, the models improve when we add our metrics to those based on traditional metrics. Further, we found that our metrics have a positive effect on the prediction models.
RQ3: Can crowd knowledge help in predicting post-release defects?
When our crowd-related metrics are added to the model based on traditional metrics, the predictive power of the model increases in terms of the F1 measure. More specifically, we find that our metrics provide a greater improvement in projects that use more external APIs.
Our findings could be leveraged by software developers to allocate more reviewing and testing effort to source code files that use more challenging APIs, to prevent further defects. However, this does not imply that developers should avoid using more challenging APIs. Instead, our work complements prior research on identifying high-risk source code files to optimize the process of testing and reviewing.
To the best of our knowledge, this paper is the first attempt to establish an empirical link between the crowd knowledge obtained from Q&A websites and post-release defects.
In summary, the contributions of this paper include:
- We propose new metrics based on crowd knowledge which can be used to better explain and predict software defects.
- We perform an empirical study and quantify the statistical relation between our metrics and software quality in terms of post-release defects.
The rest of the paper is organized as follows. Section 2 presents a few motivating examples. Section 3 describes how we model the crowd knowledge. Section 4 covers how our study is set up. Section 5 presents the results of our case study. Section 6 discusses overall points about our study. Section 7 mentions the threats to the validity of our findings. Section 8 discusses prior research related to this work. Finally, Section 9 concludes this paper and provides future research directions.
Motivating examples
This section presents a few examples to motivate investigating the relation between using more challenging APIs discussed in the crowd and source code quality.
In revision 1e7a75c042 of the OwnCloud Android client,3 the developer uses the WeakReference class, which provides a reference that does not protect a referenced object from collection by the Java garbage collector. This revision introduces a bug, i.e., misusing WeakReference yielded some build time
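The behavior that makes WeakReference easy to misuse can be sketched as follows (a minimal illustration of the class semantics, not the OwnCloud code itself):

```java
import java.lang.ref.WeakReference;

public class WeakReferenceExample {
    public static void main(String[] args) {
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent);

        // While a strong reference exists, get() returns the object.
        System.out.println(ref.get() != null);  // true

        // Drop the strong reference; the object becomes weakly reachable
        // and may be reclaimed at any subsequent garbage collection.
        referent = null;
        System.gc();

        // After a collection, get() may return null. Code that assumes
        // the referent is still alive introduces exactly this kind of bug.
        System.out.println(ref.get());  // likely null after GC, not guaranteed
    }
}
```

Note that System.gc() is only a hint to the JVM, so whether the second get() returns null is not deterministic — which is part of why such misuses surface intermittently rather than failing consistently.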
Modeling the crowd knowledge
In this section, we describe how we model the crowd knowledge available on Stack Overflow in order to propose crowd-related metrics.
The high-level process of calculating crowd-related metrics is depicted in Fig. 1. Our approach is based on APIs discussed in the crowd. Thus, as the first step, we parse the heterogeneous data of Stack Overflow (step 1) to extract the code elements, and then the APIs, from discussions. Next, we identify the APIs that are the main concerns of the discussions (step 2) by
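The first step above — pulling code elements out of post bodies — can be sketched roughly as follows. This is a simplified illustration under the assumption that Stack Overflow post bodies are HTML with snippets in `<code>` elements; the paper's actual pipeline uses an island-grammar-based parser (H-AST), and the class and regex here are our own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodeExtractor {
    // Stack Overflow stores post bodies as HTML; code snippets appear
    // inside <code>...</code> elements. DOTALL lets snippets span lines.
    private static final Pattern CODE =
            Pattern.compile("<code>(.*?)</code>", Pattern.DOTALL);

    static List<String> extractCode(String postBody) {
        List<String> snippets = new ArrayList<>();
        Matcher m = CODE.matcher(postBody);
        while (m.find()) {
            snippets.add(m.group(1));
        }
        return snippets;
    }

    public static void main(String[] args) {
        String body = "<p>Use <code>Math.toRadians(deg)</code> before calling "
                    + "<code>Math.cos(angle)</code>.</p>";
        System.out.println(extractCode(body));
    }
}
```

A regex suffices for locating snippet boundaries, but resolving which API a snippet refers to is harder — hence the island parser and the dedicated API-identification step (step 2) in the actual pipeline.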
Case study design
In this section, we introduce the systems that we employ as our case study and other data processing steps.
Case study results
In this section, we present and discuss the results of our case study. First, we statistically analyze the quality descriptors listed in Table 1. Then, for each research question provided in Section 1, we discuss the underlying motivation, our approach toward answering that question, and the obtained experimental results. Finally, we conduct a qualitative analysis of the challenge of APIs.
Discussion
In this section, we discuss the overall points about our findings. Today, software development is heavily based on external packages and libraries. Although our results show a relation between using more challenging APIs and defects, developers cannot and should not avoid using APIs with high challenge, because leveraging libraries keeps the code base smaller, which improves maintainability. Further, developers do not need to worry about further development and improvement of the external libraries
Threats to validity
In this section, we discuss the threats to the validity of our study.
External Validity. Our study is based on five popular open source Java projects publicly available on GitHub. However, the results of our study may not necessarily generalize to all software systems and programming languages. Further, the H-AST we used for parsing discussions, offered by Ponzanelli et al. (2015), is only available for discussions in the Java language. For other languages, we need to implement the island
Related work
In this section, we describe related work with respect to the use of crowd knowledge in software engineering and defect prediction.
Conclusions and future work
Q&A websites such as Stack Overflow have become an indispensable tool for developers to ask and find solutions to their questions, issues, and errors. However, the effect of the crowd knowledge obtainable from this huge source of information on explaining defects has never been empirically studied before. In this paper, we modeled this crowd knowledge by proposing a set of metrics and statistically investigated the relation between these crowd-related metrics and software quality.
CRediT authorship contribution statement
Hamed Tahmooresi: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing - original draft. Abbas Heydarnoori: Project administration, Supervision, Methodology, Writing - review & editing. Reza Nadri: Software, Validation, Investigation, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (86)
- et al., Topic-based software defect explanation, J. Syst. Softw. (2017)
- et al., Predictive habitat distribution models in ecology, Ecol. Model. (2000)
- et al., A survey of the use of crowdsourcing in software engineering, J. Syst. Softw. (2017)
- et al., Bootstrapping cookbooks for APIs from crowd knowledge on Stack Overflow, Inf. Softw. Technol. (2019)
- et al., Detecting API usage obstacles: A study of iOS and Android developer questions
- et al., Island grammar-based parsing using GLL and Tom
- et al., Classifying Stack Overflow posts on API issues
- An example of software system debugging
- et al., Why, when, and what: Analyzing Stack Overflow questions by topic, type, and code
- Stack Overflow developer survey (2018)
- Stack exchange data explorer
- Mining questions asked by web developers
- What are developers talking about? An analysis of topics and trends in Stack Overflow, Empir. Softw. Eng.
- Putting it all together: Using socio-technical networks to predict failures
- Don’t touch my code!: Examining the effects of ownership on software quality
- Pattern Recognition and Machine Learning
- Searching Stack Overflow for API-usage-related bug fixes using snippet-based queries
- Searching crowd knowledge to recommend solutions for API usage tasks, J. Softw.: Evol. Process
- The scree test for the number of factors, Multivariate Behav. Res.
- A study of a measure of sampling adequacy for factor-analytic correlation matrices, Multivariate Behav. Res.
- Exploratory study of Slack Q&A chats as a mining source for software engineering tools
- Crowd debugging
- A metrics suite for object oriented design, IEEE Trans. Softw. Eng.
- Context-based recommendation to support problem solving in software development
- Recovering traceability links between an API and its learning resources
- An omnibus test of normality for moderate and large size samples, Biometrika
- An extensive comparison of bug prediction approaches
- Evaluating the use of exploratory factor analysis in psychological research, Psychol. Methods
- The impact of changes mislabeled by SZZ on just-in-time defect prediction, IEEE Trans. Softw. Eng.
- Evaluating answer quality across knowledge domains: Using textual and non-textual features in social Q&A
- An empirical study of just-in-time defect prediction using cross-project models
- Empirical validation of object-oriented metrics on open source software for fault prediction, IEEE Trans. Softw. Eng.
- Predicting faults using the complexity of code changes
- Beyond lines of code: Do we need more complexity metrics
- It’s not a bug, it’s a feature: How misclassification impacts bug prediction
- The impact of correlated metrics on the interpretation of defect models, IEEE Trans. Softw. Eng.
- Understanding and detecting real-world performance bugs, ACM SIGPLAN Not.
- A large-scale empirical study of just-in-time quality assurance, IEEE Trans. Softw. Eng.
- Predicting post-release defects using pre-release field testing results
- API change and fault proneness: A threat to the success of Android apps
- How do API changes trigger Stack Overflow discussions? A study on the Android SDK
Hamed Tahmooresi is a Ph.D. student at the Sharif University of Technology. His research interests include software engineering, software architecture and design, and mining software repositories. Contact him at [email protected]
Abbas Heydarnoori is an assistant professor at the Sharif University of Technology. Before, he was a post-doctoral fellow at the University of Lugano, Switzerland. Abbas holds a Ph.D. from the University of Waterloo, Canada. His research interests focus on software evolution, mining software repositories, and recommendation systems in software engineering. Contact him at [email protected]
Reza Nadri is currently a Master’s student and research assistant at the University of Waterloo, Canada. Before, he got his Bachelor’s degree from the Sharif University of Technology. His research interests include mining software repositories, software analytics, recommendation systems in software engineering, and social aspects of software engineering. Contact him at [email protected]
1 Present Address: School of Computer Science, University of Waterloo, 200 University Ave. W., Waterloo, ON, Canada, N2L 3G1.