Elsevier

Applied Soft Computing

Volume 95, October 2020, 106667

Learning to rank developers for bug report assignment

https://doi.org/10.1016/j.asoc.2020.106667

Highlights

  • We tackle the problem of recommending the most suitable developer to fix an open bug report.

  • We apply a learning-to-rank algorithm to rank the most suitable developers for a given bug report.

  • Our model is trained on previously fixed bug reports and on developers' coding histories.

  • We consider 22 characteristics that we transform into features for our ranking algorithm.

  • Experiments indicate that our solution offers higher accuracy than existing approaches.

Abstract

Bug assignment is a burden for projects receiving many bug reports. To automate the process of assigning bug reports to the appropriate developers, several studies have relied on combining natural language processing and information retrieval techniques to extract two categories of features. One of these categories targets developers who have fixed similar bugs before, and the other determines developers working on source files similar to the description of the bug. Commit messages represent another rich source for profiling developer expertise as the language used in commit messages is closer to that used in bug reports.

In this work, we propose an enhanced profiling of developers through their commits, captured in a new set of features that we combine with features used in previous studies. More precisely, we propose an adaptive ranking approach that takes a given bug report as input and ranks the developers who are most suitable to fix it. This approach learns from the history of previously fixed bugs to profile developers in terms of their expertise. For a given bug report, the ranking score of each developer is computed as a weighted combination of an array of features encoding domain knowledge, where the weights are trained automatically on previously solved bug reports using a learning-to-rank technique. Our model was evaluated on around 22,000 bug reports exported from four large-scale open-source Java projects. Results show that our model significantly outperformed two recent state-of-the-art methods in recommending a suitable developer to handle a given bug report. Specifically, the percentage of bug reports for which the correct developer appeared among the top 5 ranked developers exceeded 80% for both the Eclipse UI Platform and Birt projects.

Introduction

A bug is an issue linked to coding that potentially triggers abnormal software behavior. When a bug is discovered by software testers, they open a bug report to initiate the process of correcting it. Bug report assignment is a critical step towards localizing and fixing a bug, as it is the task of matching the open bug report to the developer most likely to process it. Bug assignees typically perform a variety of code reviews and changes to replicate and verify the reported issue and to localize it. With the rise in the number of open bug reports, and the natural increase in the number of teams and developers, matching bug reports to suitable developers becomes challenging, especially because a wrong assignment not only wastes a developer's time but also adds the overhead of re-assigning the bug.

To tackle this challenge, several studies have investigated the design of automated recommendation techniques in which relevant bug report information and the history of code changes are mined before assigning the bug report to the appropriate developer [1], [2], [3]. Studies analyzing a developer's activities and experience are considered activity-based, while studies linking bug reports to a specific location in the code, and thus to a potential developer, are considered location-based. A recent study by Tian et al. [4] presented a unified model that merges activity-based and location-based features to benefit from their advantages, at the expense of the increased complexity of their combination.

Furthermore, activity-based features heavily rely on profiling developers using their contributions to the project, such as code changes and previously handled bug reports. As the majority of developers' contributions are represented by their code changes, and as code changes represent the largest source of bugs, linking developers' code changes to open bug reports has proven to be a critical feature for improving the bug assignment process. However, bug report descriptions are written in natural language, while source code is written in a programming language. As these two languages differ in context, representation, and expression, this creates a lexical mismatch that hinders the efficiency of existing bug assignment approaches. Therefore, we propose the use of developers' commit messages as a new set of features for the bug assignment problem. As shown later in Section 4, commit messages are able to bridge the lexical gap, as they are written in the same natural language as bug reports, in addition to containing valuable information that may not be captured by the code changes themselves.
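To make the lexical-gap argument concrete, the following sketch scores developers by the textual overlap between a bug report and their accumulated commit messages, using a plain bag-of-words cosine similarity. The developer names and messages are invented for illustration, and the paper's actual feature set is far richer than this single score:

```python
import math
import re
from collections import Counter


def tokenize(text):
    # lowercase word tokens; a real pipeline would also drop stop words and stem
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    # cosine between two bag-of-words vectors represented as Counters
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


bug_report = "NullPointerException when opening the editor view"
commit_messages = {  # hypothetical per-developer concatenated commit messages
    "alice": "fix null pointer crash in editor view on open",
    "bob": "refactor build scripts and update dependencies",
}
scores = {dev: cosine_similarity(bug_report, msgs)
          for dev, msgs in commit_messages.items()}
```

Because commit messages and bug reports share a natural-language vocabulary, the developer whose commits discuss the editor crash scores higher than one who worked on unrelated build scripts.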

In this study, we design a model that tackles the problem of assigning the appropriate developers to a given bug report. We first augment our activity-based feature space with a new set of measurements, representing the similarity between the bug report and commit messages, to better profile developers. Then we leverage both the augmented activity-based features and the location-based features to better profile each candidate developer. Since our study includes 22 features, we utilize the ranking variant of the Support Vector Machine (SVM) to learn how to rank developers according to their relevance to a given bug report. To investigate the efficiency of our learning-to-rank model, we evaluate it using a total of 22,416 bug reports, extracted from four popular open-source software systems, namely, Birt, Eclipse UI, JDT, and SWT. Also, to show the impact of the proposed new features, we compared our model with and without them. Our SVM-rank model was also compared with other ranking methods, such as naive aggregation, ordinal regression, and random ranking as a baseline. Results show the ability of our model to outperform existing ranking models, achieving an average accuracy of 93% when recommending the top-10 developers to fix a given bug report.
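The ranking variant of SVM learns its weights from pairwise preferences: for each training bug report, the feature vector of the developer who actually fixed it should score higher than those of the other candidates. A minimal sketch of this pairwise idea follows, using a perceptron-style update on difference vectors in place of the actual SVM optimization; all feature values and developer names are toy data:

```python
# Pairwise trick behind SVM-rank: a developer ranked above another for the
# same bug report yields a difference vector (better - worse) that the linear
# model w should score positively; ranking at test time is just dot(w, x).

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(pairs, dims, epochs=50, lr=0.1):
    # pairs: list of (better_features, worse_features) for the same bug report
    w = [0.0] * dims
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) <= 0:  # ordering violated: nudge w toward diff
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


# toy data: feature[0] = similarity to past fixed bugs, feature[1] = noise
pairs = [([0.9, 0.2], [0.1, 0.8]), ([0.7, 0.5], [0.3, 0.5])]
w = train_pairwise(pairs, dims=2)

# rank hypothetical candidates for a new bug report by their learned score
rank = sorted([("dev_a", [0.8, 0.1]), ("dev_b", [0.2, 0.9])],
              key=lambda d: dot(w, d[1]), reverse=True)
```

The learned weights end up favoring the informative first feature over the noisy second one, which is exactly the behavior the paper relies on when combining its 22 features.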

In summary, our key contributions are as follows:

  • 1.

We propose enhanced activity-based features using information extracted from commit messages. This information can be treated as domain knowledge that better profiles developers and improves the automated assignment of open bug reports. To the best of our knowledge, no existing study has used our enriched developer-profiling features for the bug assignment problem.

  • 2.

We perform a comparative study between our approach and two ranking algorithms that have been shown to outperform state-of-the-art algorithms for the problem of bug localization [4], [5]. Our key findings show that our approach outperforms existing studies, mainly when the dimensionality of the problem increases, i.e., when the number of candidate developers for potential assignment is high.

  • 3.

    As we believe in the importance of improving the automation of bug localization and assignment, we encourage the reproducibility and extendibility of our study by providing the dataset and source code of our experiments.1

Section snippets

Background

The bug report life cycle involves various activities, which are outlined in this section. The process begins when the bug report is received by the development team. The team member most likely familiar with the faulty source code is then assigned the task. The next step is to localize the bug: the assigned team member(s) try to find the root cause and resolve the misbehavior. Fig. 1 illustrates the life cycle of a bug report.

Related work

We discuss in this section the two main threads of related work: (1) bug assignment approaches, and (2) other studies related to bug management.

Feature extraction

This section presents how we extract features from the combination of source files, bug reports, and commit messages.

We cluster these features into the following main categories:

  • Activity-based features. This category contains all features related to developers’ previous experience with resolving bug reports and modified files. For a given bug report, we look for developers who have experience resolving similar bug reports.

  • Location-based features. This category contains all features related to

Feature combination

This section reviews the three methods we use to combine the features we have extracted from the bug-fixing data. Choosing the appropriate method of feature combination can be as important as choosing the appropriate features themselves. More sophisticated methods of feature combination can be trained to identify the importance of certain features. Accordingly, a weighting factor can be applied to emphasize the features that have been identified as important to the accuracy of the model.
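As a sketch of the weighted-combination idea, with hand-picked weights standing in for the learned ones (the feature names, values, and developer names are all hypothetical):

```python
# Ranking score as a weighted combination of per-developer feature scores.
# In the paper the weights are learned by the ranking model; here they are
# fixed by hand purely for illustration.

def combined_score(features, weights):
    # features and weights are parallel lists; the score is their dot product
    return sum(w * f for w, f in zip(weights, features))


weights = [0.6, 0.3, 0.1]  # hypothetical learned importance of three features
candidates = {
    "alice": [0.9, 0.4, 0.2],  # e.g., bug similarity, file similarity, recency
    "bob": [0.2, 0.8, 0.9],
}
ranking = sorted(candidates,
                 key=lambda d: combined_score(candidates[d], weights),
                 reverse=True)
```

With these weights, a strong score on the heavily weighted first feature outweighs good scores on the lightly weighted ones, which is why learning the weights (rather than aggregating features naively) matters.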

Validation

In this section, we first present our research questions, and then explain our study approach. We then introduce the dataset used and review several corrections and filters applied to the data. Finally, we review the experimental setup and evaluation metrics.

Results & evaluation

In this section, we review the results from our experiment and discuss them to answer each of our research questions.

Future improvements

There are two main areas of improvements to explore for future work: (1) increasing the usefulness of our textual data in feature extraction and (2) exploring the suitability of different algorithms and approaches to aggregating feature scores. To increase the usefulness of our textual data, we identified several dimensions to explore. First, there may be words that, while not present in the standard list of stop words, are used so frequently in bug reports that they lose any value in text

Threats to validity

In this section, we identify several threats to the validity of our study.

Internal Validity. This is the level of confidence we have in the cause-and-effect relationship and in the factors that might impact our evaluation. Since we are handling informal textual data, we cannot assume that users follow appropriate grammatical rules. Therefore, we pre-processed the data by removing stop words and stemming to mitigate this issue. We used NLTK for text processing, because it was widely used in
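The pre-processing steps described above can be sketched as follows. This pure-Python version substitutes a tiny stop-word list and a crude suffix stripper for NLTK's stop-word corpus and Porter stemmer, so its output only approximates the paper's actual pipeline:

```python
import re

# minimal illustrative stop-word list (NLTK's English list is much larger)
STOP_WORDS = {"the", "a", "an", "is", "it", "when", "to", "in", "on", "and"}


def crude_stem(token):
    # naive suffix stripping, standing in for a real Porter stemmer
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    # lowercase, tokenize, drop stop words, then stem what remains
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The editor is crashing when opening large files")
```

Note how the crude stemmer over-stems "files" to "fil"; a real Porter stemmer is more careful, which is one reason the paper delegates this step to NLTK.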

Conclusion

This work presents a new set of features to be included in a ranking model to better assign bug reports to the most appropriate developer based on historical bug-fixing data. Our novel contributions include four commit-message-related features, resulting in higher accuracy achieved by the model, as well as an empirical evaluation across the two models. Our findings indicate that the inclusion of our features contributed to a more accurate model. We

CRediT authorship contribution statement

Bader Alkhazi: Revision, Conceptualization, Methodology, Software. Andrew DiStasi: Implementation, Testing, Model Tuning, Writing - original draft. Wajdi Aljedaani: Data curation. Hussein Alrubaye: Writing - review & editing. Xin Ye: Writing - review & editing. Mohamed Wiem Mkaouer: Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (53)

  • Manning, C.D., et al. Introduction to information retrieval, vol. 1 (1) (2008)
  • Kochhar, P.S., et al. Automatic fine-grained issue report reclassification
  • Runeson, P., et al. Detection of duplicate defect reports using natural language processing
  • Anvik, J., et al. Who should fix this bug?
  • Wen, M., et al. Locus: Locating bugs from software changes
  • AlOmar, E., et al. Can refactoring be self-affirmed? An exploratory study on how developers document their refactoring activities in commit messages
  • AlOmar, E.A., et al. On the impact of refactoring on the relationship between quality attributes and design metrics
  • Wu, W., et al. DREX: Developer recommendation with k-nearest-neighbor search and expertise ranking
  • Safdari, N., et al. Learning to rank faulty source files for dependent bug reports
  • Murphy, G., et al. Automatic bug triage using text categorization
  • Alenezi, M., et al. Efficient bug triaging using text mining, JSW (2013)
  • Bhattacharya, P., et al. Fine-grained incremental learning and multi-feature tossing graphs to improve bug triaging
  • Tamrawi, A., et al. Fuzzy set and cache-based approach for bug triaging
  • Sharma, M., et al. Bug assignee prediction using association rule mining
  • Anvik, J., et al. Reducing the effort of bug report triage: Recommenders for development-oriented decisions, ACM Trans. Softw. Eng. Methodol. (2011)
  • Shokripour, R., et al. Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation
