1 Introduction

Current software applications are valuable strategic assets to companies: they play a central role for the business and require continued maintenance. In recent years, research has focused on utilising concept location techniques in bug localisation to address the challenges of performing maintenance tasks on such applications.

In general, software applications consist of multiple components. When a component does not perform according to its predefined functionality, it is said to be in error. These unexpected and unintended behaviours, also referred to as bugs, are often the product of coding mistakes. Upon discovering such abnormal behaviour of the software, a developer or a user reports it in a document referred to as a bug report (BR). A BR may provide information that helps in fixing the bug by changing the relevant program elements of the application. Identifying where to make changes in response to a BR is called bug localisation: the change request is expressed as a BR, and the end goal is to change the existing program elements (e.g. source code files) to correct the undesired behaviour of the software.

Li et al. (2006) classified bugs according to three categories: root cause, impact and the involved software components.

  • The root cause category includes memory-related bugs, like improper handling of memory objects, and semantic bugs, which are inconsistent with the original design requirements or the programmers’ intention.

  • The impact category includes performance- and functionality-related bugs, e.g. the program keeps running but does not respond, halts abnormally, mistakenly changes user data, or functions correctly but runs or responds slowly.

  • The software components category includes bugs related to the components implementing core functionality, graphical user interfaces, runtime environment and communication, as well as database handling.

The study in Li et al. (2006), performed with two open source software (OSS) projects, showed that 81% of all bugs in Mozilla and 87% of those in Apache are semantics related. These percentages increase as the applications mature, and such bugs have a direct impact on system availability, contributing to 43–44% of crashes. Since semantic bugs take longer to locate and fix, more effort needs to be put into helping developers locate them.

Generally, program comprehension tasks during software maintenance require additional effort from those developers who have little domain knowledge (Bennett and Rajlich 2000; Starke et al. 2009; Antoniol et al. 2002; Abebe et al. 2011). Therefore, a programmer can easily introduce semantic bugs due to inconsistent understanding of the requirements or intentions of the original developers. Early attempts to aid developers in recovering traceability links between source code files and system documentation used Information Retrieval (IR) methods, such as the Vector Space Model (VSM) (Salton and Buckley 1988) and Latent Semantic Indexing (LSI), and managed to achieve high precision (Marcus and Maletic 2003) or high recall (Antoniol et al. 2002). The idea behind VSM is that the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query. Vectors represent queries (bug reports in the case of bug localisation) and documents (source code files). Each element of the vector corresponds to a word or term extracted from the query’s or document’s vocabulary. The relevance of a document to a query can be directly evaluated by calculating the similarity of their word vectors.
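To make the VSM intuition concrete, the following minimal sketch (in Java, with hypothetical names; it is not the exact weighting scheme of any tool discussed later) builds tf-idf vectors from term lists and scores a document against a query by cosine similarity:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal tf-idf VSM sketch: illustrative only, not the exact weighting of any tool.
class VsmSketch {

    // Build a term -> tf-idf weight map for one document (or query).
    static Map<String, Double> tfidf(List<String> terms, Map<String, Integer> docFreq, int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (String t : terms) weights.merge(t, 1.0, Double::sum);          // raw term frequency
        weights.replaceAll((t, tf) ->
                tf * Math.log((double) numDocs / (1 + docFreq.getOrDefault(t, 0))));  // tf * idf
        return weights;
    }

    // Cosine similarity between two weighted term vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A bug report and each source file would be turned into such vectors; the files are then ranked by decreasing cosine similarity to the report.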

These IR approaches do not consider terms that are strongly related via structural information and thus still perform poorly in some cases (Petrenko and Rajlich 2013). Pure textual similarity may not be able to distinguish the actual buggy file from other files that are similar but unrelated (Wang et al. 2015). For example, Moreno et al. (2014) noted that when two files are structurally related and the irrelevant one has a higher textual similarity to a particular query, the irrelevant file may be ranked above the relevant one.

Further research recognised the need for combining multiple analysis approaches on top of IR to support program comprehension (Gethers et al. 2011). To determine the starting points, like class and method names, for investigating relevant source code files during maintenance work, techniques combining dynamic (Wilde and Scully 1995) and static (Marcus et al. 2005) analysis have been exploited (Poshyvanyk et al. 2007; Eisenbarth et al. 2001; Le et al. 2015). However, requests for new, not yet existing features are unsuitable for dynamic analysis (Poshyvanyk et al. 2007), and a large software project or one with a long history may require time-consuming analysis, making static approaches impracticable (Rao and Kak 2011).

1.1 Vocabulary of bug reports in source code files

In general, a typical BR document provides multiple fields where information pertaining to the reported issue may be described, such as a brief summary of the problem, a detailed description of the conditions observed, the date of the observed behaviour, and the names of the files changed to resolve the reported condition. Recent empirical studies provide evidence that many terms used in BRs are also present in the source code files (Saha et al. 2013; Moreno et al. 2013). Such BR terms are an exact or partial match of program elements (i.e. class, method or variable names and comments) in at least one of the files affected by the BR, i.e. those files actually changed to address the BR.

Moreno et al. (2013) showed that (1) BR documents share more terms with their affected files than with other files and (2) the shared terms were present in source file names. The authors evaluated 6 Open Source Software (OSS) projects, containing over 35K source files and 114 BRs, which were solved by changing 116 files. Over the more than 1 million BR and source file combinations, they discovered that on average 75% share between 1 and 13 terms, 22% share nothing and only 3% share more than 13 terms. Additionally, the study revealed that certain locations of a source code file, e.g. a file name instead of a method signature, may have only a few terms, but all of them may contribute to the number of shared terms between a BR and its affected files. The authors concluded that BRs have more terms in common with their affected files and that the common terms are present in the names of those affected files.

Saha et al. (2013) claimed that although class names are typically a combination of 2–4 terms, they are present in more than 35% of the BR summary fields and 85% of the BR description fields of the OSS project AspectJ. Furthermore, the exact file name is present in more than 50% of the bug descriptions. They concluded that when the terms from these locations are compared during a search, the noise is reduced automatically due to the reduction in search space. For example, in 27 AspectJ BRs at least one of the file names of the fixed files was present as-is in the BR summary, whereas in 101 BRs at least one of the file name terms was present.

1.2 Our aim and contributions

Motivated by these insights, we aim to check if the occurrence of file names in BRs can be leveraged for IR-based bug localisation in Java programs. We restrict the scope to Java programs, where each file is a class or interface, in order to directly match the class and interface names mentioned in the BRs to the files retrieved by IR-based bug localisation.

Current state-of-the-art approaches for Java programs [BugLocator (Zhou et al. 2012), BRTracer (Wong et al. 2014), BLUiR (Saha et al. 2013), AmaLgam (Wang and Lo 2014), LearnToRank (Ye et al. 2014), BLIA (Youm et al. 2015) and Rahman et al. (2015)] rely on project history to improve the suggestion of relevant source files. In particular they use similar BRs and recently modified files. The rationale for the former is that if a new BR x is similar to a previously closed BR y, the files affected by y may also be relevant to x. The rationale for the latter is that recent changes to a file may have led to the reported bug. However, the observed improvements using the history information have been small.

We thus wonder whether file names mentioned in the BR descriptions can replace the contribution of historical information in achieving comparable performance and ask our first research question (RQ) as follows.

RQ1 Can the occurrence of file names in BRs be leveraged to replace project history in achieving state-of-the-art IR-based bug localisation?

If file name occurrence can’t be leveraged, we need to look more closely at the contribution of past history, in particular of considering similar bug reports, an approach introduced by BugLocator and adopted by others. So we ask in our second RQ:

RQ2 What is the contribution of using similar bug reports?

Furthermore, IR-based approaches to locating bugs use a base IR technique that is applied in a context-specific way or combined with bespoke heuristics. However, Saha et al. (2013) note that the exact variant of the underlying tf/idf (term frequency/inverse document frequency) model used may affect results. In particular they find that the off-the-shelf model they use in BLUiR already outperforms BugLocator, which introduced rVSM, a bespoke VSM variant. In our approach we also use an off-the-shelf VSM tool, different from the one used by BLUiR. In comparing our results to theirs we must therefore distinguish what is the contribution of file names, and what is the contribution of the IR model used. Thus we ask in our third RQ:

RQ3 What is the overall contribution of the VSM variant adopted in our approach, and how does it perform compared to rVSM?

Previous studies performed by Starke et al. (2009) and Sillito et al. (2008) reveal that text-based searches available in current IDEs are inadequate because they require search terms to be precisely specified, otherwise irrelevant or no results are returned. They highlighted that the long result lists returned by the IDE tools cause developers to analyse several files before performing bug-fixing tasks. Thus we wanted to know how developers perceive the search results of our tool, which presents a ranked list of candidate source code files that may be relevant for a BR at hand during software maintenance. Hence we ask our fourth RQ:

RQ4 How does our approach perform with industrial applications and does it benefit developers?

To address RQ1 we propose a novel approach and then evaluate it against existing approaches (Zhou et al. 2012; Saha et al. 2013; Moreno et al. 2014; Wong et al. 2014; Wang and Lo 2014; Ye et al. 2014) on the same datasets and with the same performance metrics. Like other approaches, ours scores each file against a given BR and then ranks the files in descending order of score, aiming for at least one of the files affected by the BR to be among the top-ranked ones, so that it can serve as an entry point to navigate the code and find the other affected files. As we shall see, our approach outperforms the existing ones in the majority of cases. In particular it succeeds in placing an affected file among the top-1, top-5 and top-10 files for 44, 69 and 76% of BRs, on average.

Our scoring scheme does not consider any historical information in the repository, which contributes to an ab-initio applicability of our approach, i.e. from the very first bug report submitted for the very first version. Moreover, our approach is efficient, because of the straightforward scoring, which doesn’t require any further processing like dynamic code analysis to trace executed classes by re-running the scenarios described in the BRs.

To address RQ2, we compare the results of BugLocator and BRTracer with and without SimiScore (the similar bug score), and the results of BLUiR as reported in the literature, showing that SimiScore’s contribution is not as high as suggested. From our experiments, we conclude that our approach localises, without using similar bug fix information, many bugs that BugLocator, BRTracer or BLUiR only localised by using such information.

As for RQ3, our experiments show that VSM is a crucial component for achieving the best performance on projects with a larger number of files, which make the use of term and document frequency more meaningful, but that in smaller projects its contribution is rather small. The Lucene VSM we chose performs in general better than the bespoke VSM of BugLocator and BRTracer.

We address RQ4 by conducting a user case study in 3 different companies with 4 developers. On average, our tool placed at least one affected file into the top-10 for 9 out of 10 BRs. Developers stated that since most of the relevant files were positioned in the top-5, they were able to avoid the error-prone tasks of browsing long result lists and performing repetitive search queries.

The rest of this paper is organised as follows. Section 2 describes the IR-based approaches against which we compare ours, and the datasets used to evaluate all of them. Section 3 describes our approach, detailing the scoring algorithm. Section 4 presents the results, addressing the first three research questions above. The user study is presented in Sect. 5. We discuss why the approach works and the threats to validity in Sect. 6. Finally, Sect. 7 presents concluding remarks.

2 Previous approaches

In bug localisation a query is a bug report, which is substantially different in structure and content from other text documents, thus requiring special techniques. Zhou et al. (2012) proposed an approach consisting of the four traditional IR steps (corpus creation, indexing, query construction, retrieval & ranking) but using a revised Vector Space Model (rVSM) to score each source code file against the given BR. In addition, each file gets a similarity score (SimiScore) based on whether the file was affected by one or more closed BRs similar to the given BR. Similarity between BRs is computed using VSM. The rVSM and similarity scores are each normalised to a value from 0 to 1 and combined linearly into a final score: \((1-w)\cdot \textit{normalrVSM} + w\cdot \textit{normalSimiScore}\), where w is an empirically set weight.
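The combination itself is straightforward; a one-method sketch of the final score as described above (variable names are ours):

```java
// BugLocator's final score (Zhou et al. 2012): both component scores are assumed to be
// already normalised to [0, 1]; w is an empirically chosen weight (0.2 or 0.3 in their study).
class BugLocatorCombination {
    static double finalScore(double normalRVSM, double normalSimiScore, double w) {
        return (1 - w) * normalRVSM + w * normalSimiScore;
    }
}
```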

Table 1 Project artefacts

The final score is then used to rank files from the highest to the lowest. The approach, implemented in a tool called BugLocator, was evaluated using over 3400 reports of closed bugs (see Table 1) and their known affected files from four OSS projects: the IDE tool Eclipse, the aspect-oriented programming library AspectJ, the GUI library SWT and the bar-code tool ZXing. Eclipse and AspectJ are well-known large scale applications used in many empirical research studies for evaluating various IR models (Manning et al. 2008). SWT is a subproject of Eclipse and ZXing is an Android project maintained by Google.

For two of these projects w was set to 0.2 and for the other two \(w=0.3\). The performance of BugLocator was evaluated with 5 metrics: Top-1, Top-5, Top-10, MAP (mean average precision) and MRR (mean reciprocal rank). Top-N is the quantity (given as an absolute number or as a percentage) of bugs that are located within threshold N, meaning that at least one affected file was placed among the first N ranked files. Across the four projects, BugLocator achieved a Top-10 of 60–80%, i.e. for each project at least 60% of its bugs had at least one affected file among the first 10 suggested files.
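For reference, Top-N can be computed directly from the best rank obtained by any affected file of each bug report; a minimal sketch (data structures hypothetical):

```java
import java.util.List;

class TopNMetric {
    // Top-N: percentage of bug reports whose best-ranked affected file is within the
    // first n positions. bestRankPerBug holds, for each BR, the best (lowest, 1-based)
    // rank achieved by any of its affected files.
    static double topN(List<Integer> bestRankPerBug, int n) {
        long hits = bestRankPerBug.stream().filter(rank -> rank <= n).count();
        return 100.0 * hits / bestRankPerBug.size();
    }
}
```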

Wang et al. (2014) proposed a composite VSM approach that combines 5 variants of term frequency (tf) and 3 variants of inverse document frequency (idf) to obtain 15 VSM variants. A final composite VSM is the weighted sum of all variants. A genetic algorithm (GA) was used with three-fold cross validation to find the weights that maximised the MAP and MRR. The approach was evaluated with 3 projects of the BugLocator dataset: AspectJ, Eclipse, and SWT. The composite approach, compared to the standard tf-idf, improved the MAP and MRR by at most 0.03 for AspectJ and SWT, but achieved no improvement for Eclipse.

2.1 Using stack trace and structure

It has been argued that existing approaches treat each source code file as one whole unit and each BR as plain text (Poshyvanyk et al. 2007; Saha et al. 2013; Ye et al. 2014; Kevic and Fritz 2014; Abebe et al. 2011). However, BRs may contain useful information like stack traces, code fragments, patches and recreation steps (Bettenburg et al. 2008). A study by Schröter et al. (2010) explored the usefulness of stack trace (ST) information found in BRs. They investigated 161,500 BRs in the Eclipse project bug repository and found that around 8% (12947/161500) of the BRs contained ST information. After manually linking 30% (3940/12947) of those closed BRs to their affected files, they found that for 60% (2321/3940) the file modified to solve the bug was one of the files listed in the ST.

Schröter et al. (2010) note that STs identified in the study contained up to 1024 file names and a median of 25 files. Out of 2321 BRs with a ST, in 40% the affected file was in the very first position of the ST and in 80% it was within the first 6 positions. It is not clear if non-project specific file names were ignored during the evaluation of the stack frames. Also, the authors discovered multiple stack traces in 32% (4206/12947) of the BRs with an ST. Investigating the affected files revealed that 70% of the time the affected file is found in the first trace and in 20% it is found on the second or third trace. Finally, the authors investigated whether including ST in BRs speeds up the development process and conclude that fixing bugs with ST information requires 2–26 days and without ST 4–32 days.

Moreno et al. (2014) presented LOBSTER, which leverages the ST information available in BRs to suggest relevant source code files. The authors argue that if in 60% of the cases the ST contains at least one of the source code files changed to resolve the bug (Schröter et al. 2010), then in the remaining 40% of cases the changed source file should be a neighbour of those present in the ST. Based on this claim, the approach first calculates a textual similarity score, using Lucene’s VSM, between the words extracted from bug reports and those extracted from code files. Second, a structural similarity score is calculated between each file and the class names extracted from the stack trace information found in the bug report. If the file is not in the list of stack trace files, then the application’s call-graph information is used to check whether a neighbouring class’s name occurs in the stack trace. Finally, both textual and structural similarity scores are combined to rank the files.

The approach was evaluated with only 155 bug reports containing a stack trace; the results reveal that 65% of the bugs were fixed in a file mentioned in the ST submitted in the BR. Out of 314 files fixed to resolve the reported bugs, 35% (109/314) were at distance 1 from the files listed in the ST, 11% (36/314) at distance 2 and 8% (25/314) at a distance between 3 and 7. They observed that when the parts of the code that need to be changed are not structurally related to the files listed in the ST, the effectiveness of the approach degrades. However, they conclude that overall 50% of the files to be changed were in the top-4 ranked files. Compared to standard VSM (Lucene), the approach achieved better results in 53% of cases, equal in 29% and worse in 18%.

Moreno et al. (2014) conclude that considering stack traces does improve the performance with respect to only using Lucene’s VSM. Since our approach also leverages stack traces, we compare the performance of both tools using their dataset for ArgoUML, a UML diagramming tool (Table 1).

Wong et al. (2014) proposed BRTracer, which also leverages the ST information and performs segmentation of source files to reduce the noise due to varying file lengths (Bettenburg et al. 2008). The method involves dividing each source file into sections, so-called segments, and matching each segment to a BR to calculate a segment score and a length score. In addition, the ST is evaluated to calculate an ST boost score. Subsequently, for each source file, the segment with the highest score is taken as the relevant one for the BR and multiplied by the length score to derive a similarity score. Finally, the ST score is added on top of the similarity score to get the overall score for a file.

The length score is calculated by revising the logistic function used in BugLocator (Zhou et al. 2012) with a beta factor to regulate how much favour is given to larger files. The ST boost score is calculated by evaluating the files listed in the stack trace as well as the files referenced via import statements, defined as the closely referred ones. The approach considers only the first 10 files listed in an ST to be relevant, a heuristic first introduced by Schröter et al. (2010). Additionally, BugLocator’s SimiScore function is utilised to calculate the similarity of the BR to previously closed bugs and to favour the files affected by those similar reports.
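A sketch of the per-file score just described (a simplified reading of BRTracer’s combination, not its actual code; how each component is computed internally is omitted):

```java
import java.util.List;

class BrTracerScoreSketch {
    // Best segment score scaled by the length score, with the stack-trace boost added on top
    // (Wong et al. 2014). The SimiScore contribution, when used, is combined separately.
    static double fileScore(List<Double> segmentScores, double lengthScore, double stBoost) {
        double bestSegment = segmentScores.stream()
                .mapToDouble(Double::doubleValue).max().orElse(0.0);
        return bestSegment * lengthScore + stBoost;
    }
}
```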

BRTracer was evaluated by measuring its performance against BugLocator with 3 of the same datasets used in that study, to see whether the segmentation and the ST make a difference. Wong et al. (2014) concluded that the approach is able to significantly outperform the one introduced in BugLocator regardless of whether similar BRs are considered or not. The authors claim that segmentation or ST analysis is an effective technique to boost bug localisation and that both are compatible with the use of similar BRs. Since we also use the same datasets as BugLocator and leverage stack traces, we compare the performance of our tool against BRTracer to demonstrate the improvements gained and to confirm that ST analysis aids bug localisation.

2.2 Version history and other data sources

Nichols (2010) argues that utilising additional sources of information would greatly aid IR models in identifying relations between BRs and source code files. One of the information sources available in well-maintained projects is past bug details; to take advantage of this, the author proposes an approach that mines past bug information automatically. The approach extracts semantic information from source code files (e.g. terms from identifiers and method names) to create searchable repositories. Also, one repository is augmented with information from previous BRs (e.g. title, description and affected files).

Experimenting with augmented and non-augmented search repositories revealed that search results from the repository augmented with up to 14 previous BRs were the same as those from the non-augmented one. The best performance was obtained when 26–30 previous bugs were used in the augmented repository. Nichols (2010) observed that there is no guarantee that considering previous bug information contributes towards improving the rank of a source file in the result list, and concludes that quality BRs, e.g. those with good descriptions, are more valuable than the number of past BRs included in the repository.

In their study about diversifying data sources to improve concept location, Ratanotayanon et al. (2010) ask whether having more diverse data in the repository where project artefacts, e.g. source files, are indexed always produces better results. They investigated the effects of combining information from (1) change set comments added by developers when interacting with source version control systems, (2) BR details from issue tracking systems and (3) source code file dependency relations, i.e. the call graph.

They observed that (1) using change sets together with BRs produces the best results and (2) including the referencing files, i.e. the caller/callee information available in the call-graph, increased recall but deteriorated precision.

Ratanotayanon et al. (2010) claim that change sets provide vocabulary from the perspective of the problem domain because developers add to each commit descriptive comments which are more aligned with the terms used when performing a search task. However they argued that the success of utilising the change sets in search tasks is sensitive to the quality of these comments. Also, the authors proposed that when using a call-graph, the results be presented in ranked order based on a weighting scheme to compensate for any possible deterioration in precision due to false positives.

Wang and Lo (2014) proposed AmaLgam for suggesting relevant buggy source files by combining BugLocator’s SimiScore and BLUiR’s structured retrieval (Saha et al. 2013) into a single score using a weight factor, which is then combined (using a different weight) with a version history score that considers the number of bug fixing commits that touch a file in the past k days. Instead of considering all the previous BRs as in Sisman and Kak (2012), they only consider recent version history commits. They identify when the current bug was reported and use the number of days or hours since each past commit to assign a score reflecting the relevance of the previously modified files for the bug at hand. So if a file was changed 17 h ago, it gets a higher score than if it was changed 17 days ago.

The approach is evaluated in the same way as BugLocator and BLUiR, i.e. the number of BRs placed in top-N, for various values of k. AmaLgam matches or outperforms the other two in all indicators except one, the MAP for ZXing. Their evaluation shows that considering historical commits up to 15 days increased the performance, 15–20 days did not make much difference and considering up to 50 days deteriorated the performance. Thus they conclude that the most important part of the version history is in the commits of the last 15–20 days.

Wang et al. incorporated their composite VSM approach (Wang et al. 2014) into AmaLgam (Wang and Lo 2014) and compared the performance of AmaLgam with and without composite VSM. The MAP and MRR improved by 0.1 and 0.07 for AspectJ, by 0.04 and 0.03 for Eclipse, and the MAP improved by 0.01 for SWT. One of the shortcomings of the approach is that it takes time, as acknowledged by the authors: the training phase takes 3 h.

2.3 Combining multiple information sources

Saha et al. (2013) claim that dynamic approaches are more complicated, time consuming and expensive than static approaches where a recommendation system may be used, thus making the static approach more appealing. They argue that existing techniques treat source code as flat text, ignoring the rich structure of source code file names, method names, comments, variables, etc. The authors presented BLUiR, which leverages the structures inside a bug report and a source code file. To measure the similarity between a bug report and a file, the approach uses the Indri tf/idf model, which is based on BM25 (Okapi), instead of VSM. It also incorporates structural information retrieval, where the textual content of each field in a bug report and each part of a source file are considered independently.

This involves dividing a bug report into summary and description fields, as well as dividing a source file into class names, method names, variable names, and comments. Indri computes a similarity score between each of the 2 fields of a bug report and each of the 4 parts of a source file. The 8 scores are summed into a final score to rank the files. One of the weaknesses of the approach is that it assumes all features are equally important and ignores the lexical gap between bug reports and source code files.
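The combination is a plain sum over the field/part pairs; a sketch with hypothetical names (the underlying similarity is Indri’s tf/idf model):

```java
class BluirCombinationSketch {
    // 2 BR fields (summary, description) x 4 file parts (class names, method names,
    // variable names, comments) = 8 similarity scores, simply summed (Saha et al. 2013).
    static double fileScore(String[] brFields, String[] fileParts) {
        double total = 0;
        for (String field : brFields)
            for (String part : fileParts)
                total += similarity(field, part);
        return total;
    }

    // Placeholder for the underlying retrieval model's score.
    static double similarity(String query, String text) {
        return 0.0;
    }
}
```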

The results were evaluated using BugLocator’s dataset and performance indicators. For all but one indicator for one project (ZXing’s MAP), BLUiR matches or outperforms BugLocator, hinting that a different IR approach might compensate for the lack of history information, namely the previously closed similar bug reports. Subsequently, BLUiR incorporated BugLocator’s SimiScore, which did further improve its performance.

Ye et al. (2014) claim that if a source file has recently been changed then it may still contain defects, and if a file is changed frequently then it may be more likely to have additional errors. They argued that there is a significant inherent mismatch between the descriptive natural language vocabulary of a BR and the vocabulary of a programming language found in the source code files. They defined a ranking model that combines six features measuring the relationship between bug reports and source code files, using a learning-to-rank (LtR) technique: (1) lexical similarity between bug reports and source code files; (2) API-enriched lexical similarity, using the API documentation of the libraries used by the source code; (3) collaborative filtering, using similar bug reports that were fixed previously; (4) bug fixing recency, i.e. the time of the last fix in terms of months; (5) bug fixing frequency, i.e. how often a file got fixed; (6) feature scaling, used to bring the scores of all previous features onto one scale. Their experimental evaluations show that the approach places a relevant file within the top-10 recommendations for over 70% of the bug reports of Tomcat (Table 1).

The source files are ranked based on the score obtained by evaluating each of the 6 features. Although improved results are obtained compared to existing tools, Ye et al. reported that on two datasets, AspectJ and Tomcat, existing tools also achieved very similar results. One of the reasons is that in AspectJ the affected files were changed very frequently and in Tomcat the BRs had very detailed and long descriptions. Evaluation of the performance of each feature shows that the best results were obtained by lexical similarity and by considering previous BRs, which leaves open the question of whether considering other features, like the version history commits introduced in Wang and Lo (2014), is really worth the effort, since other studies also report their effectiveness to be poor.

LtR approaches make use of machine learning in order to learn the rank of each document in response to a given query. Our simpler approach only uses the first of Ye et al.’s (2014) six features, lexical similarity, and yet provides better results on Tomcat, as we’ll show.

Youm et al. (2015) introduced an approach where the scoring methods utilised in previous studies (Zhou et al. 2012; Saha et al. 2013; Wong et al. 2014; Wang and Lo 2014) are first calculated individually and then combined together by varying alpha and beta parameter values. The approach, implemented in a tool called BLIA, is compared against the performance of the other tools where the original methods were first introduced. For evaluation only the three smaller BugLocator datasets (i.e. excluding Eclipse) are used. Although BLIA improves the MAP and MRR values of BugLocator, BLUiR and BRTracer, it fails to outperform AmaLgam in AspectJ. The authors found that stack-trace analysis is the highest contributing factor among the analysed information for bug localisation.

Recently, Ye et al. (2016) used extra documents (tutorials, implementation guides, etc.) to obtain co-occurring words and thus be able to find files that have no word in common with the bug report. For each project, they train their approach on 2000 bug reports. They do not report top-N results, so it is unclear how well the approach performs in practice. Compared to their previous LtR approach, the use of extra documents improves the MAP and MRR by at most 0.03, except for the Eclipse JDT, with an improvement of 0.07.

In another recent study, Rahman et al. (2015) extended BugLocator by considering file fix frequency and file name matches. The file fix frequency score is calculated based on the number of times a ranked file is mentioned in the bug repository as changed to resolve another bug. The file name match score is based on the file names that appear in the bug report. The approach is evaluated with the SWT and ZXing projects from the BugLocator dataset and one of their own: Guava. The authors show improved MAP and MRR values over BugLocator’s.

Independently of Rahman et al. (2015) (of which we became aware only recently), we decided to use file names because the study by Saha et al. (2013) shows that many bug reports contain the names of the files that need to be fixed. Rahman et al. (1) extract the file names from the bug report using very simple pattern matching techniques and (2) use a single constant value to boost a file’s score when its name matches one of the extracted file names. In contrast, our approach (1) uses a more sophisticated file matching regular expression pattern to extract file names from the bug report and (2) assigns varying values depending on the extracted file name’s position in the bug report, as we will show later.

Uneno et al. (2016) proposed a new approach called Distributed REpresentation of Words based Bug Localization (DrewBL) which utilises a semantic Vector Space Model (sVSM). The idea is to concatenate the terms extracted from source code files and the stemmed words extracted from bug reports into a single vector space with a low dimension and high density. The approach combines the DrewBLScore with the scores obtained by running BugLocator and Bugspots, which is based on bug-fixing history. The approach is tested on two OSS projects (Tomcat and Birt) from the LtR dataset (Ye et al. 2014) and the results reveal that DrewBL by itself performs substantially worse than the combined approach, which is still worse than LtR. The reason, as acknowledged, is that many irrelevant source code files are retrieved.

2.4 User studies

Sillito et al. (2008) conducted two different studies, one in a laboratory setting with 9 developers who were new to the code base and the other in an industrial setting with 16 developers who were already working with the code base. In both studies developers were observed performing change tasks to multiple source files within a fairly complex code base using modern tools. The findings reveal that text-based searches available in current IDEs are inadequate because they require search terms to be precisely specified, otherwise irrelevant or no results are returned. The study claims that developers repeatedly perform discovery tasks in a trial and error mode, which causes additional effort and often results in several failed attempts.

Starke et al. (2009) performed a study with 10 developers to find out how developers decide what to search for and what is relevant for the maintenance change task at hand. Participants were randomly assigned one of 2 closed bug descriptions selected from the Sub-Eclipse tool’s issue repository and instructed to carry out search tasks in the Eclipse IDE using the available search tools. The findings highlight that formulating a search query is the most challenging task for the developers since Eclipse’s search tools require the search terms to be precisely specified, otherwise no relevant results are returned. The authors also state that when many search results are returned, the developers tend to lose confidence in the query and decide to search again rather than investigate what was returned. They propose future research on tool support for developers, to provide more contextual information and to present results in a ranked order, grouped within the context provided by search tasks at hand.

Recently Kochhar et al. (2016) performed a study with practitioners about their expectations of automated fault localisation tools. The study explored several crucial parameters, such as trustworthiness, scalability and efficiency. Out of 386 responses, 30% rated fault localisation as an “essential” research topic. The study further reveals that around 74% of respondents did not consider a fault localisation session to be successful if it requires developers to inspect more than 5 program elements, and that 98% of developers indicated that inspecting more than 10 program elements is beyond their acceptability level. These findings show the importance of positioning relevant files in the top-10, especially in the top-5; otherwise the developers lose confidence.

Parnin and Orso (2011) studied users fixing 2 bugs with Tarantula, a tool that suggests a ranked list of code statements to fix a failed test. Users lost confidence if the ranked list was too long or had many false positives. Users didn’t inspect the ranked statements in order, often skipping several ranks and going up and down the list. For some users, the authors manually changed the ranked list, moving one relevant statement from rank 83 to 16, and another from rank 7 to rank 35. There was no statistically significant change in how fast the users found the faulty code statements. This may be further indication that the top-5 ranks are the most important.

Xia et al. (2016) record the activity of 36 professionals debugging 16 bugs of 4 Java apps, each with 20+ KLOC. They divide the professionals into 3 groups: one gets the buggy statements in positions 1–5, another in positions 6–10, and the third gets no ranked list. On average, the groups fix each bug in 11, 16 and 26 min, respectively. The difference is statistically significant, which shows the ranked list is useful. Developers mention that they look first at the top-5, although some still use all of the top-10 if they’re not familiar with the code. Some do an intersection of the list with where they think the bug is and only inspect those statements. Although the bug localisation is based on failed/successful tests and the ranked list contains code lines, not files, the study emphasises the importance of the top-5 and how developers use ranked lists.

As we shall present in Sect. 4, our approach equals or improves the top-5 metric on all analysed projects, and as we shall see in Sect. 5, our user study on IR-based localisation of buggy files confirms some of the above findings on test-based localisation of buggy code lines.

3 Our approach

Each approach presented in the previous section incorporates an additional information source to improve results, as shown in Table 2. We list the six tools against which we evaluate our approach (ConCodeSe), using the same datasets (Table 1) and metrics. Additionally, we used two financial applications, one open source, Pillar1, and one proprietary, referred to as Pillar2 for confidentiality reasons.

Table 2 Comparison of approaches

Both applications implement the financial regulations for credit and risk management defined by the Basel-II Accord (Basel 2006). Pillar1 is a client/server application developed in Java and Groovy by Munich Re (a re-insurance company). The BR documents are maintained with JIRA, a public tracking tool.

Pillar2 is a web-based application developed in Java at our industrial partner’s site and is not publicly available. It was in production for 9 years. Maintenance and further improvements were undertaken by five developers (at one point including the first author), none of them part of the initial team. The BR documents were maintained with a proprietary tracking tool.

As Table 1 shows, for both Pillar1 and Pillar2 we only have a small set of closed bug reports for which we also have the source code version on which they were reported. Neither application is maintained anymore.

3.1 Data processing

The two financial applications and the OSS projects from Table 1, consisting of source code files and BRs with their known affected files, identified as described in Zhou et al. (2012), Moreno et al. (2014) and Ye et al. (2014), were processed using our source code analysis tool ConCodeSe (Contextual Code Search Engine), which we substantially improved from previous work (Dilshener and Wermelinger 2011), where only lexical similarity search was implemented. ConCodeSe utilises state-of-the-art data extraction, persistence and search APIs (SQL, Lucene, Hibernate). Figure 1 illustrates the extraction, storage, search and analysis stages. In the top layer, the corpus creation and search service tasks are executed automatically.

Fig. 1 ConCodeSe data extraction, storage and search

The left hand side (1) represents the extraction of terms from the source code files and from the BRs. The middle part (2) shows the storage of the extracted terms. Finally, in the search stage (3), the search for the files affected by the BRs takes place.

In the first stage, the Java code is parsed using the source code mining tool JIM (Butler et al. 2010), which automates the extraction and analysis of identifiers from source files. It parses the code, extracts the identifiers and splits them into terms, using the INTT tool (Butler et al. 2011) within JIM. INTT uses camel case, separators and other heuristics to split at ambiguous boundaries, like digits and lower case letters. The extracted information, i.e. identifier names, their tokenisation and source code location, is stored in a Derby relational database.
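As an illustration of identifier splitting, the following naive sketch (which is not INTT; it omits INTT’s heuristics for digits and other ambiguous boundaries) only splits on camel case and common separators:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IdentifierSplitter {
    // Naive split on separators and camel-case boundaries,
    // e.g. "StyledTextListener" -> [styled, text, listener],
    //      "MAX_buffer_size2"   -> [max, buffer, size2] (digits are not split here).
    static List<String> split(String identifier) {
        String spaced = identifier
                .replaceAll("[_$]+", " ")                        // separators
                .replaceAll("(?<=[a-z0-9])(?=[A-Z])", " ")       // fooBar  -> foo Bar
                .replaceAll("(?<=[A-Z])(?=[A-Z][a-z])", " ");    // XMLFile -> XML File
        return Arrays.stream(spaced.trim().split("\\s+"))
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }
}
```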

In the case of Pillar1, as it is developed in Groovy and Java, its BRs refer to a mixture of both kinds of source code files. For BugLocator and BRTracer to process the Groovy source code files, we converted them into Java. Since Groovy is a dynamic language based upon Java, with a similar syntax, and runs in the Java Virtual Machine, setting the file extension to Java without modifying the file content was sufficient. Afterwards, manually inspecting the generated corpus confirmed the presence of the converted files with their associated terms extracted from the source code file identifiers.

Furthermore, the BRTracer tool, available in the online annex of Wong et al. (2014), is set up to process only three datasets (AspectJ, Eclipse and SWT) by default. We were kindly granted access to its source code by its author (Chu Pan Wong) and, with his assistance, modified the main program to accept additional project artefact details, e.g. the location of the source files and the BRs. Prior to running the modified tool on other datasets, we ran the modified version with the default projects and compared the results against those reported in its paper. This allowed us to verify that despite our modifications the results were as reported.

Also for the first stage, we developed a Java module to tokenise the text in the BRs into terms. The module reuses Lucene’s StandardAnalyzer because it tokenises alphanumerics, acronyms, company names, email addresses, etc., using a JFlex-based lexical grammar. It also includes stop-word removal. We used a publicly available stop-words list to filter them out. The extracted information is stored via the Hibernate persistence API into the same Derby database. We also use this module to extract terms (by tokenising words and removing stop-words) from Java source code comments prior to storing them into the database. During this stage, stemming is done using Lucene’s implementation of Porter’s stemmer (Porter 1997).
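A sketch of such a tokenisation pipeline with Lucene (class locations and constructors vary slightly across Lucene versions, and the stop-word set below stands in for the external list; this is illustrative, not ConCodeSe’s exact module):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.CharArraySet;   // in older Lucene versions: ...analysis.util.CharArraySet
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

class BrTokenizer {
    // Tokenise bug-report (or comment) text, drop stop words and apply Porter stemming.
    static List<String> terms(String text, List<String> stopWords) throws IOException {
        List<String> result = new ArrayList<>();
        try (StandardAnalyzer analyzer = new StandardAnalyzer()) {
            TokenStream ts = analyzer.tokenStream("body", text);
            ts = new StopFilter(ts, new CharArraySet(stopWords, true));  // case-insensitive stop list
            ts = new PorterStemFilter(ts);
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                result.add(term.toString());
            }
            ts.end();
            ts.close();
        }
        return result;
    }
}
```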

For the third stage of the process, we developed a Java module that (1) runs SQL queries to search for the occurrences of the BR terms in the source code files and (2) ranks all the files for each BR as explained in the next subsection. The ranked search results are saved in a spreadsheet for additional statistical analysis like computing the Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) values (Boslaugh and Watters 2008).

MAP provides a single-figure measure of quality across recall levels. Among evaluation measures, MAP has been shown to have especially good discrimination and stability (Manning et al. 2008). The MAP is calculated as the sum of the average precision value for each BR divided by the number of BRs for a given project. The average precision (AP) for a BR is the mean of the precision values obtained for all affected files listed in the result set and computed as in Eq. 1. Then the MAP for a set of queries is the mean of the average precision values for all queries, which is calculated as in Eq. 2.

$$\begin{aligned} {\textit{AP(Relevant)}}= & {} \frac{\sum _{r\in {\textit{Relevant}}}{\textit{Precision(Rank(r))}}}{|{\textit{Relevant}}|} \end{aligned}$$
(1)
$$\begin{aligned} {\textit{MAP(Queries)}}= & {} \frac{\sum _{q\in {\textit{Queries}}}{\textit{AP(Relevant(q))}}}{|{\textit{Queries}}|} \end{aligned}$$
(2)

MAP is considered to be optimal for ranked results when the possible outcomes of a query are 5 or more (Lawrie 2012). As Table 3 shows, very few BRs have at least 5 affected files, but we still use the MAP for comparison with previous approaches. As we will show, for all eight projects, including ZXing, we achieve the best MAP.

MRR is a metric for evaluating a process that produces a list of possible responses to a query (Voorhees 2001). The reciprocal rank RR(q) for a query q is the inverse rank of the first relevant document found and computed as in Eq. 3. Then the MRR is the average of the reciprocal ranks for the results of a set of queries calculated as in Eq. 4.

$$\begin{aligned} RR(q)= & {} {\left\{ \begin{array}{ll} 0 &{} {\textit{if}}\,q\,{\textit{retrieves}}\,{\textit{no}}\,{\textit{relevant}}\,{\textit{documents}}\\ \frac{1}{{\textit{TopRank}}(q)} &{} {\textit{otherwise}} \end{array}\right. } \end{aligned}$$
(3)
$$\begin{aligned} {\textit{MRR}}({\textit{Queries}})= & {} \frac{\sum _{q\in {\textit{Queries}}}{\textit{RR(q)}}}{|{\textit{Queries}}|} \end{aligned}$$
(4)

MRR, on the other hand, is known to measure the performance of ranked results better when a query has fewer than 5 relevant outcomes, and best when it has just 1 (Lawrie 2012), which is the case for the majority of BRs in the datasets under study (Table 3). The higher the MRR value, the better the bug localisation performance. We will show that our approach improves, sometimes substantially, the MRR for all eight projects.
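Both metrics follow directly from Eqs. 1–4; a minimal sketch over the (1-based) ranks of each BR’s affected files in the result list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RankingMetrics {
    // Average precision for one BR (Eq. 1), given the 1-based ranks of its affected files.
    static double averagePrecision(List<Integer> affectedRanks) {
        if (affectedRanks.isEmpty()) return 0;
        List<Integer> ranks = new ArrayList<>(affectedRanks);
        Collections.sort(ranks);
        double sum = 0;
        for (int i = 0; i < ranks.size(); i++) {
            sum += (i + 1) / (double) ranks.get(i);   // precision at the i-th relevant rank
        }
        return sum / ranks.size();
    }

    // Reciprocal rank for one BR (Eq. 3): inverse of the best affected-file rank, 0 if none found.
    static double reciprocalRank(List<Integer> affectedRanks) {
        return affectedRanks.isEmpty() ? 0 : 1.0 / Collections.min(affectedRanks);
    }

    // MAP and MRR (Eqs. 2 and 4) are the means of the per-BR values.
    static double mean(List<Double> perBugValues) {
        return perBugValues.stream().mapToDouble(Double::doubleValue).average().orElse(0);
    }
}
```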

Table 3 Affected files per BR

3.2 Ranking files

Given a BR and a file, our approach computes two kinds of scores for the file: a lexical similarity score (Algorithm 2, explained later) and a probabilistic score given by VSM, as implemented by Lucene.

The two scorings are done with four search types, each using a different set of terms indexed from the BR and the file:

  1. Full terms from the BR and from the file’s code.

  2. Full terms from the BR and the file’s code and comments.

  3. Stemmed terms from the BR and the file’s code.

  4. Stemmed terms from the BR, the file’s code and comments.

For each of the 8 scorings, all files are ranked in descending order. Files with the same score are ordered alphabetically, e.g. if files F and G have the same score, F will be ranked above G. Then, for each file we take the best of its 8 ranks. If two files have the same best rank, then we compare their next best rank, and if that is tied, their 3rd best rank, etc. For example, if file B’s two best ranks (of the 8 ranks each file has) are 1 and 1, and file A’s two best ranks are 1 and 2, B will be ranked overall before A.
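The resulting order amounts to a lexicographic comparison of each file’s sorted list of ranks; a minimal sketch (types are ours):

```java
import java.util.Comparator;
import java.util.List;

class BestRankOrdering {
    // Compare two files by their rank lists, assumed sorted ascending (best rank first):
    // the file whose k-th best rank is smaller wins at the first position where they differ.
    static final Comparator<List<Integer>> BY_BEST_RANKS = (a, b) -> {
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            int cmp = Integer.compare(a.get(i), b.get(i));
            if (cmp != 0) return cmp;
        }
        return 0;
    };
}
```

With this ordering, file B’s ranks [1, 1, …] sort before file A’s [1, 2, …], as in the example above.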

Table 4 Ranking achieved by all search types in SWT

The rationale for the four search types is that during experiments, we noticed that when a file could not be ranked among the top-10 using the full terms and the code, it was often enough to use associated comments and/or stemming. As shown in Table 4, for SWT’s BR #100040, the affected Menu.java file had a low rank (484th) using the first type of search. When the search includes comments, stemming or both, it is ranked at 3rd, 29th and 2nd place respectively. The latter is the best rank for this file and thus returned by ConCodeSe (4th column of the table).

There are cases when using comments or stemming could deteriorate the ranking, for example because it helps irrelevant files to match more terms with a BR and thus push affected classes down the rankings. For example, in BR #92757, the affected file StyledTextListener.java is ranked much lower (75th and 72nd) when stemming is applied. However, by taking the best of the ranks, ConCodeSe is immune to such variations.

The lexical similarity scoring and ranking is done by function searchAndRankFiles (Algorithm 1), which takes as arguments a BR and the project’s files and returns an ordered list of files. The function is called four times, once for each search type listed previously, and goes through the following steps for each source code file.

  1. Check if the file’s name matches one of the words in key positions (KP) of the BR’s summary, and assign a score accordingly (Sect. 3.2.1).

  2. If no score was assigned and a stack trace (ST) is available, check if the file’s name matches one of the file names listed in the ST and assign a score accordingly (Sect. 3.2.2).

  3. If there is still no score, assign a score based on the occurrence of the search terms, i.e. the BR text terms (TT), in the file (Sect. 3.2.3).

Once all the files are scored against a BR, the list is sorted in descending order, where the files with higher scores are ranked at the top. Ties are broken by alphabetical order.
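A compact sketch of these steps (a reading of Algorithm 1, not the verbatim listing; the scoring helpers are sketched in Sects. 3.2.1–3.2.3 and are stubbed out here):

```java
import java.util.ArrayList;
import java.util.List;

class LexicalRanker {
    // Minimal stand-ins for the tool's internal model.
    static class BugReport { String summary; String description;
        boolean hasStackTrace() { return description != null && description.contains("Exception"); } }
    static class SourceFile { String name; double score; }

    // Score every file against the BR using, in order of precedence, key positions (KP),
    // stack trace (ST) and text terms (TT); rank by descending score, ties alphabetical.
    static List<SourceFile> searchAndRankFiles(BugReport br, List<SourceFile> files) {
        List<SourceFile> ranked = new ArrayList<>(files);
        for (SourceFile f : ranked) {
            double s = scoreWithKeyPositions(br, f);                           // Sect. 3.2.1
            if (s == 0 && br.hasStackTrace()) s = scoreWithStackTrace(br, f);  // Sect. 3.2.2
            if (s == 0) s = scoreWithFileTerms(br, f);                         // Sect. 3.2.3
            f.score = s;
        }
        ranked.sort((a, b) -> {
            int cmp = Double.compare(b.score, a.score);          // higher scores first
            return cmp != 0 ? cmp : a.name.compareTo(b.name);    // ties: alphabetical order
        });
        return ranked;
    }

    // Stubs; see the sketches in the following subsections.
    static double scoreWithKeyPositions(BugReport br, SourceFile f) { return 0; }
    static double scoreWithStackTrace(BugReport br, SourceFile f) { return 0; }
    static double scoreWithFileTerms(BugReport br, SourceFile f) { return 0; }
}
```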

3.2.1 Scoring with key positions (KP score)

By manually analysing all SWT and AspectJ BRs and 50 random Eclipse BRs, i.e. \((98+286+50)/4665=9.3\%\) of all BRs (Table 1), we found that the word in the first, second, penultimate or last position of the BR summary may correspond to the affected file name. For example, Table 5 shows that in the summary sentence of BR #79268, the first word is already the name of the affected source file, i.e. Program.java.

Table 5 Sample of words in key positions

Overall, our manual analysis of SWT revealed that in 42% (42/98) of the BRs the first word and in 15% (15/98) of the BRs the last word of the summary sentence was the affected source file. We found similar patterns in AspectJ: 28% (81/286) as the first word and 5% (15/286) as the last word. The frequency for the second and penultimate words being the affected file was 4 and 11% respectively.

We also noticed that some KP words come with method and package names in the BR summary, e.g. Class.method() or package.subpackage.Class.method(). They hence required parsing using regular expressions. Based on these findings, we assign a high score to a source file when its name matches the words in the above described four key positions of the BR summary sentence. The earlier the file name occurs, the higher the score: the word in first position gets a score of 10, the second 8, the penultimate 6 and the last 4. Note that the key positions are scored in a different order (1st, 2nd, penultimate, last) from their frequency (1st, penultimate, last, 2nd) in AspectJ, because while experimenting with different score values for SWT and AspectJ we found the ‘natural’ order to be more effective. Disregarding other positions in the summary prevents non-affected files that occur in those other positions from getting a high KP score and thus a higher ranking.
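A sketch of this KP scoring (word extraction is simplified here; as noted above, the tool first parses forms like package.Class.method() with regular expressions before matching):

```java
import java.util.Arrays;
import java.util.List;

class KeyPositionScore {
    // A file whose name matches the word in the first, second, penultimate or last
    // position of the BR summary gets a score of 10, 8, 6 or 4 respectively.
    static double kpScore(String summary, String fileNameWithoutExtension) {
        List<String> words = Arrays.asList(summary.trim().split("\\s+"));
        int n = words.size();
        int[] positions = {0, 1, n - 2, n - 1};
        double[] scores = {10, 8, 6, 4};
        for (int i = 0; i < positions.length; i++) {
            int p = positions[i];
            if (p >= 0 && p < n && words.get(p).equalsIgnoreCase(fileNameWithoutExtension)) {
                return scores[i];
            }
        }
        return 0;
    }
}
```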

3.2.2 Scoring with stack traces (ST score)

Stack traces list the files that were being executed when an error condition occurred. During manual analysis of the same BRs as for KP, we found that several included an ST in the description field (see Table 6).

Table 6 Stack trace information in BRs

We found that especially for NullPointerException, the affected file was often the first one listed in the stack trace. For other exceptions such as UnsupportedOperationException or IllegalStateException however, the affected file was likely the second or the fourth in the stack trace.

We first use regular expressions (see Fig. 2) to extract from the ST the application-only source files, i.e. excluding third party and Java library files, and then put them into a list in the order in which they appeared in the trace. We score a file if its name matches one of the first four files occurring in the list. The file in first position gets a score of 9, the second 7, the third 5 and the fourth 3.
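A sketch of the ST scoring; the frame pattern below is a simplified, hypothetical stand-in for the one in Fig. 2, and the filtering of third-party and JDK classes by package prefix is omitted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StackTraceScore {
    // Matches frames of the form "at package.Class.method(Class.java:123)".
    private static final Pattern FRAME =
            Pattern.compile("at\\s+[\\w.$]+\\(([\\w$]+)\\.java:\\d+\\)");

    // The file in the first position of the trace gets 9, the second 7, the third 5, the fourth 3.
    static double stScore(String brDescription, String fileNameWithoutExtension) {
        List<String> traced = new ArrayList<>();
        Matcher m = FRAME.matcher(brDescription);
        while (m.find() && traced.size() < 4) {
            traced.add(m.group(1));
        }
        double[] scores = {9, 7, 5, 3};
        for (int i = 0; i < traced.size(); i++) {
            if (traced.get(i).equalsIgnoreCase(fileNameWithoutExtension)) return scores[i];
        }
        return 0;
    }
}
```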

3.2.3 Scoring with text terms (TT score)

We assign a score to the source file based on where the BR’s terms (without duplicates) occur in the file. If a BR term occurs in the file name, this results in a higher score. Each occurrence of each BR term in the file slightly increments the score, as shown in function scoreWithFileTerms (Algorithm 2). As explained before, the BR’s and the file’s terms depend on whether stemming and/or comments are considered.

Fig. 2 Search pattern for stack trace

Again, the file names are treated as the most important elements and are assigned the highest score. When a query term is identical to a file name, it is considered a full match (no further matches are sought for the file) and a relatively high score (adding 2) is assigned. The occurrence of the query term in the file name is considered to be more important (0.025) than in the terms extracted from identifiers, method signatures or comments (0.0125).
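A sketch of this TT scoring (a reading of Algorithm 2 using the values above, not the verbatim listing; brTerms are the de-duplicated BR terms for the current search type):

```java
import java.util.List;
import java.util.Set;

class TextTermScore {
    // fileName is the class name without extension; fileTerms are the terms extracted from
    // the file's identifiers, method signatures and (depending on the search type) comments.
    static double ttScore(Set<String> brTerms, String fileName, List<String> fileTerms) {
        double score = 0;
        for (String term : brTerms) {
            if (term.equalsIgnoreCase(fileName)) {
                return score + 2;                        // full match: stop scoring this file
            }
            if (fileName.toLowerCase().contains(term.toLowerCase())) {
                score += 0.025;                          // term occurs in the file name
            }
            for (String fileTerm : fileTerms) {
                if (fileTerm.equalsIgnoreCase(term)) {
                    score += 0.0125;                     // term occurs elsewhere in the file
                }
            }
        }
        return score;
    }
}
```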


The TT score values were chosen by applying the approach to a small training dataset, consisting of 51 randomly selected BRs from SWT and 50 from AspectJ, i.e. \((51+50)/4665=2.17\%\) of all BRs, and making adjustments to the scores in order to tune the results for the training dataset.

3.2.4 Rationale behind the scoring values

As mentioned in the previous subsections, the values for the KP scoring (10, 8, 6, 4), ST scoring (9, 7, 5, 3) and TT scoring (2, 0.025, 0.0125) were obtained heuristically, whilst reflecting the importance of the summary, the stack trace and the description, in this order, and of the different parts of each. The scoring values are weights that give more importance to certain positions in the summary and stack trace, or to certain kinds of words (class names). As such, they are in essence not different from the weights used by other authors, which were also obtained heuristically. For example, Hill et al. (2007) assigned weights 2.5 and 0.5 to terms based on whether they match those extracted from the file name or the method name, respectively, and Uneno et al. (2016) summed the scores of three tools adjusted by weights 0.5, 0.3, and 0.1 or 1.0, based on manual experiments and observations. Methodologically, our approach therefore does not deviate from established practice.

Some authors automatically determine the best weight values, e.g. by using machine learning. However, the optimal weights obtained from a particular collection of projects are not necessarily the best weights for other projects. For example, in Tomcat the bug reports have very detailed and long descriptions but in Pillar2 they are tersely written. Optimising weights (via machine learning or some other technique) must therefore be done separately for each project, and only for projects that have sufficient history, i.e. sufficient past BRs that can be used as training set.

We are not interested in a tailored approach that provides the best results for the 8 projects at hand. Rather, we aim to see whether using the least information possible (the bug report to be localised and the source code files on which the bug was reported) and using a light-weight IR approach with fixed, not necessarily optimal, weights, we can achieve similar performance to other approaches that use more data sources (similar past bug reports, past code versions) or more complicated methods. As we will show in Sect. 4, we actually surpass the performance of other approaches. Our results can thus form a new baseline to compare to, and our simple minimalistic approach can serve as a ‘trampoline’ for others to improve on our baseline, e.g. by modifying Algorithm 1 or by adding further heuristics and techniques, like project history and machine learning.

4 Evaluation of the results

In this section we address the research questions. Since they ask for the effects of various scoring components, we had to run ConCodeSe, BugLocator and BRTracer (the other tools were unavailable) in different ‘modes’, e.g. with and without stack trace information or with and without SimiScore (similar bug reports), to observe the effect on the ranking of individual files. We ran BugLocator and BRTracer without SimiScore by setting the alpha parameter to zero, as described in Zhou et al. (2012) and Wong et al. (2014). We also ran both tools with SimiScore, by setting alpha to the value reported in the corresponding paper. We confirmed that we obtained the same top-N, MAP and MRR results as reported in the papers. This reassured us we were using the same tool versions, datasets and alpha values as the authors had, and that the results reported in the rest of this section for BugLocator and BRTracer are correct.

As we derived our heuristics by manually analysing the BRs of AspectJ and SWT, and some from Eclipse, to avoid bias, we evaluated our approach using additional OSS and industrial projects: ArgoUML, Tomcat, ZXing, Pillar1 and Pillar2. As already described in Sect. 2, all but the last two projects were also used by the approaches we compare our tool’s performance against.

4.1 RQ1: scoring with file names in BRs

As described throughout Sect. 3, our lexical similarity scoring mainly considers whether the name of the file being scored occurs in the BR, giving more importance to certain positions in the summary or in the description’s stack trace. The rationale is of course that a file explicitly mentioned in the BR is likely to require changes to fix the bug.

The first research question asks whether such an approach, although seemingly sensible, is effective. To answer the question we compare our results to those of BugLocator, BRTracer, BLUiR, AmaLgam, LtR, BLIA and Rahman using the same 5 metrics (Top-1, Top-5, Top-10, MAP, MRR) and for LOBSTER using only the MAP and MRR metrics (as given in their paper). We look at the separate and combined effect of using file names for scoring.

4.1.1 Scoring with words in key positions (KP score)

To see the impact of considering occurrences of the file name in certain positions of the BR summary, we ran ConCodeSe with and without assigning a KP score, whilst keeping all the rest as described in Sect. 3. Table 7 shows how evaluating the words in key positions improved results for the affected classes of the BRs given in Table 5. In the cases of BugLocator and BRTracer, we obtained the ranks by running their tools, and obtained those for BLUiR from its published results.

Table 7 Results with and without considering key positions

As the table illustrates, in some cases (like for Program.java) the summary scoring can make the difference between the file not even making it into the top-10 and making it into the top-5. In other cases, the change in ranking is small but can be significant, making the affected file the top ranked one, which is always the most desirable situation, as the developer will not have to inspect any irrelevant files.

To have an overall view, we also ran ConCodeSe using just KP and TT scores together (KP+TT) against only using the TT score, i.e. in both runs ST and VSM scoring were not used. Table 8 shows that compared to TT alone, KP+TT provides an advantage in positioning files of a BR in the top-1 for SWT and ZXing, and in the top-5 for AspectJ, Eclipse and Tomcat. By contrast, in the cases of Pillar1 and Pillar2 using the KP+TT score did not make a difference and the results remained the same as with the TT score in all top-N categories. Further analysis revealed the reason: in Pillar2 the BR summaries do not contain source code file names and in Pillar1 only 3 BRs contain file names in their summaries, but they are not the files changed to resolve the reported bug and the TT score of the relevant files is higher.

Overall 46–86% of BRs can be located by just assigning a high score to file names in certain positions of the BR summary, confirming the studies cited in the introduction that found file names mentioned in a large percentage of BRs (Saha et al. 2013; Schröter et al. 2010). The file name occurrences in other places of the BR are also scored by comparing BR and file terms in function scoreWithFileTerms (see Algorithm 2), but irrelevant files that match several terms may accumulate a large score that pushes the affected classes down the ranking.

Table 8 Key position versus stack trace versus text terms

4.1.2 Scoring with stack trace information (ST score)

To see the impact of considering file names occurring in a stack trace, if it exists, we ran ConCodeSe with and without assigning an ST score, but again leaving all the rest unchanged, i.e. using key position (KP) and text terms (TT) scoring. Table 9 shows results for some affected classes obtained by BugLocator, BRTracer and BLUiR.

Table 9 Results with and without stack trace information

Again, the rank differences can be small but significant, moving a file from top-10 to top-5 (ResolvedMemberImpl.java) or from top-5 to top-1 (EclipseSourceType.java). In some cases (ReferenceType.java) the file goes from not being in the top-10 to being in the top-1, even if it is in the lowest scoring fourth position in the stack.

Table 8 also shows the effect of running ConCodeSe with ST and TT scores together (ST+TT) against only using the TT score, i.e. without KP and VSM scoring, except for ZXing and Pillar2, which don’t have any stack traces in their BRs (Table 6). ST+TT scoring provides a significant advantage over the TT score alone in positioning affected files of a BR in the top-1. In particular for AspectJ, Eclipse, Tomcat and ArgoUML, ST scoring places more BRs in all top-N categories, indicating that giving file names found in the stack trace a higher score contributes to improving the performance of the results, which is also in line with the findings of previous studies (Schröter et al. 2010; Moreno et al. 2014; Wong et al. 2014).

Note that there is no significant difference between ST+TT and TT scoring for SWT and ArgoUML. Only 4 of SWT’s BRs have a stack trace and it happens that in those cases the lower TT score value of 2 for the files occurring in the stack trace is still enough to rank them highly. For ArgoUML, 5 bug reports contain stack trace information and using ST+TT scoring adds only one more bug report to the top-1 compared to the other two scoring variations. The small difference is due to the relevant file for the other 4 bug reports not being among the ones listed in the stack trace or being listed at a position after the 4th. Since ST scoring only considers the first 4 files, in that case the affected file gets a TT score, because its name occurs in the BR description.

Again, for Pillar1 and Pillar2 using ST+TT scoring alone did not make a difference and the results remained constant in all top-N categories. None of the Pillar2 BRs contains a stack trace, and in the case of Pillar1 only 1 BR description contains a stack trace, but the relevant file is listed after the 4th position and gets an ST score of zero (Sect. 3.2.2).
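For illustration, the sketch below extracts the first four distinct file names from a Java stack trace, which is the kind of pre-processing such a scoring rule needs (Sect. 6.1 notes that regular expressions are used to extract stack traces from BR descriptions); the class name and the exact pattern are our own assumptions, not ConCodeSe's code.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collect the first few distinct file names of a Java stack trace.
// Only these leading entries would receive an ST weight; files listed later score zero.
public class StackTraceFiles {

    // Matches the "(SomeFile.java:123)" part of a typical stack-trace line.
    private static final Pattern FRAME = Pattern.compile("\\((\\w+)\\.java:\\d+\\)");

    static List<String> firstFiles(String brDescription, int limit) {
        List<String> files = new ArrayList<>();
        Matcher m = FRAME.matcher(brDescription);
        while (m.find() && files.size() < limit) {
            String file = m.group(1) + ".java";
            if (!files.contains(file)) {    // keep distinct files, in order of appearance
                files.add(file);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        String trace = "java.io.FileNotFoundException: out.txt\n"
                + "    at org.example.FileUtil.write(FileUtil.java:42)\n"
                + "    at org.example.AjBuildManager.save(AjBuildManager.java:310)";
        System.out.println(firstFiles(trace, 4)); // [FileUtil.java, AjBuildManager.java]
    }
}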

We end the analysis of the contributions of positional scoring of file names in BR summaries (KP) and stack trace (ST) with Table 10, which shows the combined rank ‘boosting’ effect of positional scoring, i.e. using KP and ST scoring together vs not using it. For example, in the case of the SWT project, using summary and stack trace scoring places an affected source file in the top-1 for 72% of the BRs compared to the base case (TT score) of 59%. This performance advantage remains noticeably high for all the projects except ArgoUML, Pillar1 and Pillar2, due to the reasons stated above.

Table 10 Key word and stack trace scoring on versus off
Table 11 KP and ST scoring variations

4.1.3 Variations of score values

The KP and ST scores are computed in very simple ways, which may affect the performance obtained, and variations of those scores may or may not enhance it. To evaluate the effects of our scoring mechanism, we experimented with assigning different scores to the four positions in the summary and in the stack trace. Table 11 shows the results obtained after performing the following changes:

  1. The scores were halved, e.g. 10, 8, 6, 4 became 5, 4, 3, 2.

  2. The scores were reversed, i.e. 4, 6, 8, 10 and 3, 5, 7, 9.

  3. All 8 positions (4 in the BR summary and 4 in the ST) have a uniform score of 10.

  4. The scores were made closer to those of TT (close2base):

     (a) for the summary positions: 3.00, 2.75, 2.50, 2.25

     (b) for the stack positions: 2.75, 2.50, 2.25, 2.00

Halving and close2base keep the order of each set of 4 positions, and the results obtained are similar to those obtained with the original score values, 10, 8, 6, 4 for summary positions and 9, 7, 5, 3 for stack trace positions. The reversed and uniform scoring break that order and lead to the worst results. This confirms that the relative importance of the various positions (especially the first position) found through inspection of SWT and AspectJ applies to most projects.

In the cases of ArgoUML, Pillar1 and Pillar2, changing the KP and ST scoring doesn’t make a difference because the 8 summary and stack trace positions do not play a role in those projects (Table 10). In the case of ZXing, none of the scoring variations made a difference due to the small number of BRs. The variations also made little difference to SWT, which has only a few more BRs than ArgoUML.

Looking closer at Table 11, we note that in the cases of the Eclipse, SWT and Tomcat projects, close2base places more BRs in the top-1 than any other variation. Investigating more closely, we found that one additional BR for SWT and two for Tomcat are placed in the top-1. In the case of SWT, the only affected source code file, Spinner.java for BR #96053, achieved a TT score of 2.56 by function scoreWithFileNames (see Algorithm 2) and is ranked in the 2nd position, whereas the file Text.java achieved a KP score of 4 and is ranked 1st. Analysing further revealed that the last word “text” in the BR summary sentence matches the file name, thus assigning a high KP score value of 4 to the file Text.java. However, when close2base scores are used, the KP score value for the last word position is set to 2.25 (see point 4a in the variations list above), which is lower than the TT score (2.56), thus ranking Spinner.java as 1st and Text.java as 2nd. Similar patterns were discovered in Eclipse and Tomcat as well.

Comparing Table 11 to the results achieved by other approaches (Fig. 4 in the next subsection), we note that the halved and close2base variations outperform the other approaches in most cases, showing that the key improvement is the higher and differentiated scoring of the 4 positions in the summary and stack trace, independently of the exact score values.

To sum up, the four systematic transformations of the score values and the better performance of the halved and close2base transformations provide evidence that the absolute values are not the main point, but rather the relative values. Moreover, the heuristics (the more likely occurrence of relevant file names in certain positions) were based on the analysis of only 10% of the bug reports, far less than the typical training data sample used in k-fold cross-validation. Especially for large projects like Eclipse, with many people reporting bugs, one can reasonably expect that many BRs will deviate from the sample. The similar behaviour of the variants across all projects and all BRs (close2base and halved are better than uniform, which is better than reversed) therefore provides reassurance that the chosen values capture the heuristics well and that the heuristics are valid beyond the small sample size used to obtain them.

4.1.4 Overall results

Finally, we compare the performance of ConCodeSe against the state-of-the-art tools using their datasets and metrics (Figs. 3, 4). As mentioned before, we were only able to obtain BugLocator and BRTracer, which meant that for the other approaches we could only refer to the published results for the datasets they used. This means we could compare our Pillar1 and Pillar2 results only against those two approaches and couldn’t for example run BLUiR and AmaLgam on the projects used by LOBSTER and LtR and vice versa. LtR’s top-N values for Tomcat were computed from the ranking results published in LtR’s online annex (Wang and Lo 2014).

Fig. 3 MAP and MRR values of the tools

Fig. 4 Top-N values of the tools

LtR also used AspectJ, Eclipse and SWT but with a different dataset to that of BugLocator. The online annex only included the Tomcat source code, so we were unable to rank the BRs for the other projects with LtR.

Figures 3 and 4 show that except for AmaLgam’s Top-1 performance on AspectJ, ConCodeSe outperforms or equals all tools on all metrics for all projects, including BugLocator’s MAP for ZXing, which BLUiR and AmaLgam weren’t able to match.

For LOBSTER, the authors report MAP and MRR values obtained by varying the similarity distance in their approach, and we took their best values (0.17 and 0.25). LOBSTER only investigates the added value of stack traces so to compare like for like, we ran ConCodeSe on their ArgoUML dataset using only the ST scoring and still improved on their results (Fig. 3 ConCodeSe-(ST) row).

We note that ConCodeSe always improves the MRR value, which is an indication of how many files a developer has at most to go through in the ranked list before finding one that needs to be changed. User studies (Xia et al. 2016; Kochhar et al. 2016) indicate that developers only look at the top-10, and especially the top-5, results. ConCodeSe has the best top-5 and top-10 results across all projects.

We also get distinctly better results than Rahman et al. (2015), the only other approach to explicitly use file names found in BRs. Looking at the largest project, Eclipse (Fig. 4), we note that even a small 1.7% Top-1 improvement over the second best approach (BLUiR) represents 52 more BRs for which the first recommended file is relevant, thus helping developers save time.

We notice that our tool performs almost 2% worse than AmaLgam (42.3 vs 44.4%) for AspectJ when placing relevant files in the top-1. Investigating the reasons for this, we found that in 2 AspectJ BRs reporting a FileNotFoundException, the changed file got ranked 2nd despite being listed in the ST. This is because the ST lists a utility file in the 1st position and the affected file in the 2nd position. For example, in AspectJ BR #123212, the file AjBuildManager.java uses FileUtil.java to write information to an external file and the FileNotFoundException is thrown by FileUtil first and then propagated to its users like AjBuildManager. Since ST scores are assigned based on the order in which each file appears in the ST, in the case of AspectJ BR #123212, FileUtil gets a higher score than AjBuildManager. To handle this scenario, we experimented with adjusting our ST scoring values but the overall results deteriorated.

In the case of Pillar1, our tool achieves a significantly higher performance than BugLocator and BRTracer in all top-N categories (Fig. 4). It is also interesting to see that BugLocator outperforms BRTracer in the top-1 and top-5 metrics, despite BRTracer being reported as an improvement over BugLocator. In the case of Pillar2, although our approach achieves identical performance to the second best in the top-5 and top-10 metrics, it is far superior in the top-1 metric and thus outperforms the other approaches in terms of MAP and MRR (Fig. 3).

Table 12 Performance on BRs mentioning at least one file name

The first research question is about leveraging file names. So, we divided each project’s BRs into two sets, those BRs mentioning at least one file and those mentioning no files at all, and ran ConCodeSe, BRTracer and BugLocator on each set. The results are in Tables 12 and 13. Table 13 shows that ConCodeSe matches or outperforms the other two tools on every project and metric, except Eclipse’s top-1 and SWT’s top-5, top-10 and MAP. This is evidence that our approach performs well even if a bug report doesn’t mention file names.

As a further example, SWT BR #58185 mentions no files and yet ConCodeSe places 3 of the 4 relevant files in the top-5, whereas BugLocator and BRTracer only place 2 files. Similarly, AspectJ BR #95529 contains no file names but out of the 11 relevant files, ConCodeSe ranks 3 files in the top-5 and 1 in the top-10 whereas BugLocator and BRTracer only rank 1 relevant file in the top-5. In all these cases the KP and ST scores are zero and Algorithm 1 uses the lexical (TT) score. The TT and VSM scores are computed with and without using the file’s comments, with and without stemming. Thus, even if a bug report doesn’t mention file names, our approach still obtains \(2 \times 2 \times 2 = 8\) ranks for each file, to choose the best of them.
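As an illustration of this ‘best of eight ranks’ idea, the sketch below ranks a file under every combination of scorer (lexical or VSM), comments on/off and stemming on/off, and keeps the smallest rank; the enum and method names are hypothetical, not taken from ConCodeSe.

import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of keeping the best (lowest) of the 8 ranks computed for a file.
public class BestOfRanks {

    enum Variant {
        VSM_PLAIN, VSM_STEMMED, VSM_COMMENTS, VSM_COMMENTS_STEMMED,
        LEX_PLAIN, LEX_STEMMED, LEX_COMMENTS, LEX_COMMENTS_STEMMED
    }

    /** Returns the best rank a file achieved across all variants it was ranked under. */
    static int bestRank(Map<Variant, Integer> ranksForFile) {
        return ranksForFile.values().stream().min(Integer::compare).orElse(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        Map<Variant, Integer> ranks = new EnumMap<>(Variant.class);
        ranks.put(Variant.VSM_PLAIN, 17);
        ranks.put(Variant.LEX_COMMENTS_STEMMED, 3); // comments and stemming help for this file
        System.out.println(bestRank(ranks));        // prints 3
    }
}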

Table 13 Performance on BRs mentioning no file name

Having always 8 ranks to choose from also helps with the other extreme: the bug report includes many file names, but most are irrelevant. For example, SWT BR #83699 mentions 14 files but only 1 is relevant. We rank it in the 4th position using the file’s comments, whereas BugLocator and BRTracer rank it 9th and 14th, respectively. Similarly, AspectJ BR #46298 has a very verbose description that mentions 9 files, none relevant. We list the only relevant file in the 2nd position using VSM and comments; BugLocator and BRTracer list it in 6th and 12th position respectively.

More generally, Table 12 shows that ConCodeSe outperforms BugLocator and BRTracer in every metric for every project when the BR contains file names, whether relevant or not. We have separated the BRs that have one or more relevant file names (Table 14) from those that have no relevant file names at all (Table 15). Table 14 shows that if there is at least one relevant file name in the BR, ConCodeSe matches or outperforms the other two tools in every metric for every project. Table 15 shows that the same happens, except for Eclipse, when the BR doesn’t include any relevant file name. Even in the case of Eclipse the difference to BugLocator (which outperforms BRTracer) is small. Comparing both tables, one can see that for each project except SWT, most BRs do not mention any relevant file and yet ConCodeSe performs best in almost every case.

Table 14 Performance on BRs mentioning at least one relevant file name
Table 15 Performance on BRs mentioning no relevant file name

A BR can be very terse, e.g. a short sentence in the summary and an empty description field, like SWT BR #89533. In this example our tool ranks the only relevant file in 3rd position, by applying VSM and stemming to the BR summary and the file’s comments, whereas BugLocator and BRTracer rank the same file in 305th and 19th position respectively. Similarly, for AspectJ BR #39436, the only relevant file is ranked in the top-5, based on the comments in the file, whereas BugLocator and BRTracer rank the same file below the top-10.

Figure 4 only counts the BRs for which at least one affected file was placed in the top-N. The MAP and MRR values indicate that ConCodeSe tends to perform better for each BR compared to other tools, so we also analysed the per-BR performance to measure the number of files per BR placed in the top-10. This analysis required access to per-BR results and the only publicly available tools are BugLocator and BRTracer.

Table 16 ConCodeSe versus BugLocator and BRTracer per-query top-10 performance

Table 16 shows, for example, that for \(128/286=45\%\) (resp. \(85/286=30\%\)) of AspectJ’s BRs, our tool placed more files in the top-10 than BugLocator (resp. BRTracer). This includes BRs in which ConCodeSe placed at least one and the other tool placed none. The ‘same’ columns indicate the percentage of BRs for which both tools placed the same number of affected files in the top-10. This includes cases where both tools can be improved (because neither ranks an affected file in the top-10), and where neither can be improved (because both place all the affected files in the top-10).
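The per-query comparison behind Table 16 can be pictured with the small sketch below, which counts how many affected files each tool ranks in its top-10 for a given BR and compares the two counts; names and signatures are illustrative assumptions only.

import java.util.List;
import java.util.Set;

// Hypothetical sketch of the per-BR comparison: count top-10 hits per tool and compare them.
public class PerQueryComparison {

    /** Number of affected files appearing in the first 10 entries of a ranked list. */
    static long top10Hits(List<String> rankedFiles, Set<String> affectedFiles) {
        return rankedFiles.stream().limit(10).filter(affectedFiles::contains).count();
    }

    /** Positive if tool A places more affected files in its top-10 than tool B, negative if fewer, 0 if equal. */
    static int compare(List<String> rankingA, List<String> rankingB, Set<String> affected) {
        return Long.compare(top10Hits(rankingA, affected), top10Hits(rankingB, affected));
    }

    public static void main(String[] args) {
        List<String> a = List.of("Tree.java", "Table.java", "Link.java");
        List<String> b = List.of("Shell.java", "Link.java");
        Set<String> affected = Set.of("Tree.java", "Link.java");
        System.out.println(compare(a, b, affected)); // 1: tool A places 2 affected files, tool B only 1
    }
}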

From all the results shown we can answer RQ1 affirmatively: leveraging the occurrence of file names in BRs leads in most cases to better performance than using project history. Due to the use of the best of 8 scores, ConCodeSe is robust to the absence of file names or the presence of irrelevant file names in the BRs.

4.2 RQ2: scoring without similar bugs

Our second research question was to evaluate the contribution of using similar bug reports. As described earlier, BugLocator, BRTracer, BLUiR and AmaLgam utilise a feature called SimiScore, which uses the BR terms to find similar closed BRs. The files changed to fix those BRs are suggested as likely candidates for the current BR being searched. To answer RQ2 we ran BugLocator and BRTracer with and without SimiScore, as explained at the start of Sect. 4.

Unfortunately, we were unable to obtain BLUiR and AmaLgam to perform runs without SimiScore, but we do not consider this to be a handicap because from the published results it seems that SimiScore benefits mostly BugLocator and BRTracer. We selected the SWT BRs reported in the BLUiR paper (#78856, #79419, #83262 and #87676) and then ran BugLocator and BRTracer on the same BRs to compare their performance.

Table 17 Similar bug reports examples (from SWT)

As shown in Table 17, BugLocator placed the file Tree.java in the 49th and 21st positions in the ranked list by using their revised VSM (rVSM) approach first and then by considering similar BRs. In the case of BRTracer, the introduced segmentation approach already ranked the file in the top-10 (10th position) and SimiScore placed the same file at an even higher position (6th). In the case of BLUiR, the same file is placed at 4th and 3rd positions, respectively. For the other cases in the table, SimiScore doesn’t improve (or only slightly so) the scoring for BugLocator. In the case of BLUiR, apart from the great improvement for Link.java, SimiScore leads to a lower rank than structural IR.

Table 18 BugLocator and BRTracer without similar bug reports score

We ran BugLocator and BRTracer without using SimiScore on all projects, to have a more like-for-like comparison with ConCodeSe in terms of input (no past BRs). Comparing Fig. 4 (with SimiScore) and Table 18 (without) shows a noticeable performance decline in BugLocator and BRTracer when not using similar BRs and thus an even greater improvement achieved by ConCodeSe. BLUiR without SimiScore also outperforms BugLocator and BRTracer with SimiScore. Interestingly BRTracer and ConCodeSe perform equally well in top-5 and top-10 for Pillar2.

We answer RQ2 by saying that although the contribution of similar BRs significantly improves the performance of BugLocator and BRTracer, it is not enough to outperform ConCodeSe. The large contribution of SimiScore for BugLocator and BRTracer is mainly due to the lower baseline provided by rVSM, as reinforced by the results in the next section.

4.3 RQ3: VSM’s contribution

As described in Sect. 3.2, 4 of a file’s 8 rankings are obtained with the VSM ranking method available in the Lucene library, and the other 4 with our lexical similarity ranking method (Algorithm 1). To find out the added value of VSM, we performed two runs, one with VSM only and the other with lexical similarity scoring only (Table 19).
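For readers unfamiliar with Lucene, the sketch below indexes a source file and ranks it against a bug report query using Lucene's classic TF-IDF similarity; the field names, query text and API shown (assuming a recent Lucene release) are our own illustrative choices, not ConCodeSe's actual indexing code.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.similarities.ClassicSimilarity;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical sketch: rank source files against a BR with Lucene's classic TF-IDF (VSM) similarity.
public class VsmRanking {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer();

        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        config.setSimilarity(new ClassicSimilarity());            // classic VSM scoring
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            Document doc = new Document();
            doc.add(new TextField("path", "Spinner.java", Field.Store.YES));
            doc.add(new TextField("contents", "spinner widget increments the selected value", Field.Store.NO));
            writer.addDocument(doc);                              // one document per source file
        }

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            searcher.setSimilarity(new ClassicSimilarity());
            // The BR summary and description serve as the query text.
            Query query = new QueryParser("contents", analyzer).parse("spinner increments wrong value");
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("path") + " " + hit.score);
            }
        }
    }
}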

VSM outperforms the lexical scoring for the larger projects (AspectJ and Eclipse), i.e. those with most code files (Table 1), and underperforms for the small projects (SWT, ZXing and Pillar2). In the case of the medium-sized projects (Tomcat and ArgoUML), lexical scoring outperforms VSM for Tomcat and underperforms for ArgoUML. As is to be expected, each scoring by itself has a poorer performance than ConCodeSe, which combines both, whilst the total number of BRs located by ConCodeSe (Fig. 4) is not the sum of the parts. In other words, many files are ranked in the top-N by both scoring methods, and each is able to locate BRs the other can’t. ConCodeSe literally takes the best of both worlds.

For all projects except ArgoUML, VSM by itself performs worse than the best other approach (Table 4), which shows the crucial contribution of lexical similarity scoring to the improved performance of our approach.

The VSM variant we adopt outperforms, in many cases, rVSM, which was introduced in BugLocator and is also utilised in BRTracer. Even ConCodeSe’s lexical similarity by itself outperforms rVSM in most cases. This can be seen by comparing the VSM (or lexical similarity) rows of Table 19 against the BugLocator rows of Table 18, where SimiScore is turned off so that only rVSM is used.

Table 19 Lucene VSM versus lexical similarity scoring

We thus answer RQ3 by concluding that VSM provides a bigger contribution for projects with a large number of files, which makes the use of term and document frequency more meaningful. We also confirm that the exact IR variant used is paramount (Saha et al. 2013): Lucene’s VSM and our simple lexical matching outperform BugLocator’s bespoke rVSM as well as BLUiR’s Okapi in many cases, especially for SWT and ZXing. However, VSM on its own isn’t enough to outperform other approaches.

5 RQ4: user study

In the previous section, we evaluated our approach with a range of open source projects and showed it outperformed current state-of-the-art tools in a simpler, faster, and more general way that doesn’t require history. To investigate the generalisability of our approach, we conducted user studies in three different companies with professional developers. The first aim of the study was to demonstrate the applicability of our approach in industrial environments: commercial applications and bug reports may differ from those used in the previous section, thus impacting the performance of our approach.

Previous user studies, summarised in Sect. 2.4, reveal that: long lists of search results returned by integrated development environment (IDE) tools cause developers to analyse several files before performing bug-fixing tasks, and providing developers with automated code search tools that present results in a ranked order would be of great benefit to their daily tasks (Starke et al. 2009; Sillito et al. 2008); 5 is the magic number of results that developers deem acceptable to inspect (Xia et al. 2016; Kochhar et al. 2016); almost all developers ‘refuse’ to inspect more than 10 items (Kochhar et al. 2016); items are not inspected in the order they are ranked (Parnin and Orso 2011). None of those studies used IR-based bug localisation. Thus the second aim of our study was to see how developers perceive and use the ranked search results of ConCodeSe.

Hence we ask our fourth RQ as:

How does our approach perform with industrial applications and does it benefit developers by presenting the search results in ranked order of relevance?

In order to not require users to modify ConCodeSe to interface with their software repositories and issue trackers, we implemented a simple front-end (GUI) panel to our tool for users to paste the summary and description of a BR and search for the candidate source code files. The results are displayed in the ranked order of relevance from most to least likely.

5.1 Study design

Based on the work of Starke et al. (2009) and Sillito et al. (2008), we designed a study with professional software developers with their a priori consent on the following:

  1. Data collected:

     (a) How useful was the ranked list?

     (b) How accurate was the ranking?

     (c) How confident were you in the list?

     (d) How intuitive was it compared to an IDE search tool?

     (e) What worked best?

  2. Data collection method:

     (a) Pre-session interview

     (b) Each search session: choosing the BRs, running the tool, evaluating the results.

     (c) Post-session interview

The first author contacted five different companies where he formerly worked as a freelance software developer, explaining the study. He sent each company an information leaflet explaining that we were looking for participants to take part in a study where our tool will be used to search source code files of a software application with descriptions available in a BR document. The leaflet informed interested parties how the study would unfold:

  1. We would conduct a 30–45 min pre-session interview to explain how to use ConCodeSe.

  2. Afterwards, the participants would be required to try the tool for 7–10 business days in their own time and document their experience on the relevance of the suggested source files for the search tasks performed.

  3. Finally, at the end of the trial period a post-session interview lasting for 30–45 min would be conducted to collect details on their usage experience.

The leaflet further explained that the participation would be treated in strict confidence in accordance with the UK Data Protection Act and no personal information could be passed to anyone outside the research team. It also informed participants that we aimed to publish the findings from the study, but that no individual would be identifiable. Participants were allowed to answer questions with “not applicable” if they did not intend to provide an answer during either interview session.

Table 20 Participant profile
Table 21 Artefacts used

Out of the five contacted companies, only three agreed to participate and two of them agreed to have the post-session interview recorded on video. Upon receiving the participation agreement from each company, the first author obtained written consent from each participant prior to the study. Table 20 shows the participant profile information at each company. Table 21 details the artefacts used in the study: the size of the industrial applications is comparable to the medium-sized OSS projects (Table 1). All participants were professional software developers and did not require compensation since they were recruited on a voluntary basis.

Although the first author had worked for the three companies in the past, he had no prior knowledge of the applications used in the study, nor previous contact with the participants. He went to each company and presented ConCodeSe and the aim of the study, which was to identify the benefits provided by a ranked list of candidate source code files that may be relevant for a BR at hand during software maintenance. Subsequently, he conducted a pre-session interview with the developers to collect information about their experience and their thought process on how they performed their tasks during daily work. One of the intentions of this pre-session interview was to make developers aware of their working habits so that they could document their experience with our tool more accurately.

5.2 Results

We present the information collected during the pre- and post-interviews. Throughout both sessions developers referred to a source code file as class and to a bug as defect.

5.2.1 Pre-session interview findings

Based on the observations made by Starke et al. (2009) we designed the pre-interview questions in order to obtain a summative profile of developers and their working context. Below we list the questions asked and the general answers given to each of these questions.

  1. How do you go about solving defects? All developers indicated that they read BR descriptions and log files looking for what to fix based on experience. They also try to reproduce the bug to debug the execution steps and look at the source code to see what is wrong. Only one developer indicated that he compares the version of the application where the bug has occurred against a version where the bug hasn’t been reported yet to detect any changes that might have caused the reported bug.

  2. How would you go about solving a defect in an unfamiliar application? In general developers responded that they search code by using some words (e.g. nouns and action words) from the bug description to pick a starting point. They also read the user guide to understand the behaviour of the application and the technical guide to understand the architecture. One developer indicated that he would attempt to simulate the reported behaviour if he could understand the scenario described in the BR and some test data is provided.

  3. What about when you cannot reproduce or haven’t got a running system? At least to get an idea of a starting point, developers look for some hints in class and method names. One developer stated that he would include logging statements in certain classes to print out trace information and collect details of execution in production so that he could see what was happening during run time.

  4. In order to find starting points to investigate further, what kind of search tools do you use? Although all developers said that they use search functions available in IDEs, e.g. full text, references, inheritance chain and call hierarchies, one said that he prefers to use the Mac OS Spotlight search because in addition to source files, it indexes other available artefacts like the configuration files, GUI files (e.g. html, jsp and jsf) as well as the documentation of the application.

  5. How do you evaluate a call hierarchy? Developers explained that they would start by performing a ‘reference’ search of a class and browse through the results. One said that “I also look to see if method and variable names that are surrounding the search words also make sense for the bug description that I am involved with”.

  6. What do you consider more important, the callers of a class or the classes it calls? Each reply started with “That depends on...”. It seems that each participant has a different way of assessing the importance of the call hierarchy. For some developers the importance is based on the bug description, e.g. if the bug description indicated that some back-end modules are the culprit, then they would look to see where the control flows are coming from, while for others, it is based on architecture, e.g. if a certain class is calling many others then they would consider this to be a bad architecture and ignore the caller.

  7. How do you decide which classes to investigate further, i.e. open to look inside, and which to skip? Almost all developers answered first by saying “gut feeling” and then went on to describe that they quickly skim through the results list and look for clues in package or file names to determine whether it makes sense to further investigate the file contents or not.

  8. Do you consider descriptive clues? Once again developers replied that they rely on their experience of the project to relate conceptual words found in the bug reports to file names.

  9. When do you decide to stop going through the path and start from the beginning? All participants indicated that after browsing through the search results looking at file names, they may open 3 or 4 files to skim through their content and when no clues were detected, they would decide to start a new search with different words.

  10. What kind of heuristics do you use when considering concepts implemented by neighbouring classes? In general, developers indicated that they look at the context of a source file within the bug that they are working on, i.e. the relevance of a file based on the BR vocabulary. One developer said that project vocabulary sometimes causes ambiguity because in the application he works with, certain file names contain the word Exception, which refers to a business exception rule and not to an application exception, i.e. an error condition.

The pre-session interview answers confirmed the challenges highlighted by previous studies (Starke et al. 2009; Sillito et al. 2008). We asked developers to try out ConCodeSe by downloading, installing and performing search tasks in their own time. We decided to perform an uncontrolled study because we felt that this would provide a more realistic environment and also allow developers to have adequate time to utilise our tool without negatively impacting their daily workload.

We asked them to collect screen shots showing the information they entered as search text and the results listed. We instructed them to use closed bug reports as search queries, where the affected files were also documented, so that they could show us whether the results contained the relevant files or not.

We arranged to meet them in 7–10 business days for a 30–45 min post-session interview to gather their experiences and to inspect the screen shots.

5.2.2 Post-session findings

Below we list the questions asked during the post-interview sessions and the general answers given to each of these questions.

  1. How did you go about using the tool? All indicated that they first used a few BRs with which they had previously worked to become familiar with our tool and its performance. Afterwards they randomly chose closed BRs that they had not previously worked with, performed searches using the words found in the summary and description fields of the BRs and then compared the search results against the files identified as affected in the issue-tracking tool. One developer said, “I performed search tasks by selecting a few words, i.e. 2–3, from the BR, which I knew would lead to relevant results. However this did not work very well so I have gradually increased the search terms with additional ones found in the BR descriptions. This provided more satisfactory results and the relevant classes started appearing in the result list”.

  2. Did the tool suggest relevant classes in the ranked list? Inspecting the screen-shots of the results, we found that in Company-U, for 8 out of 10 BRs at least one file was in the top-10. In Company-S, for 9 out of 10 BRs the tool ranked at least one affected file at top-1. The developer in Company-A said that “The relevant file was always ranked among the top-3. I never needed to investigate files which were ranked beyond top-3”.

  3. Were there relevant classes in the result list which you might not have thought of without the tool? All participants replied with ‘yes’. One developer said that “I have also opened up the files listed at top-2 and top-3, despite that they were not changed as part of the BR at hand. However looking at those files would have also led me to the relevant one, assuming that the relevant file was not in the result list”. Another indicated that “a BR description had a file name which got ranked at top-1 but did not get changed. However it was important to consider that class during bug fixing”.

  4. Did the ranked list provide clues to formulate search descriptions differently? Developers indicated that most of the time they had a clue about what they were looking for but were not always certain. However, seeing file names with a rank allowed them to consider those files with a degree of importance as reflected by the ranking. In the case of one defect, the description caused a lot of noise and resulted in the top 5 files being irrelevant, but 3 relevant files were placed in the top-10.

  5. What did you add to or remove from the description to enhance the search? Developers indicated that the search led to better results when the query included possible file names, exceptions and stack trace information. However, in one case ignoring the description and searching only with the summary resulted in 4 out of 8 relevant files being ranked in the top-5 and 1 file between the top-5 and top-10. One developer indicated that in one case, despite changing search words, the relevant file was still not found because the BR did not provide any clues at all.

  6. What would you consider to reduce false positives in search results? In general, all participants suggested that project-specific vocabulary mapping should be used to cover cases when concepts in file names cause misleading results. For example, batch jobs are named Controller, so when a BR describes a batch job without using the word controller, those classes are not found if there are no comments providing additional clues. In the case of such a defect, only 5 out of 18 affected files were found.

  7. How comfortable were you with the ranked list? All developers indicated that the tool was easy to use and, after performing 3–4 searches to become familiar with it, they were satisfied with the results since the files in the first 5 positions were most often relevant ones. Despite their comfort with the tool, all participants indicated that if a BR description contained fewer technical details, e.g. fewer file names, and more descriptive information, e.g. test scenario steps, the tool was not useful. In fact they felt that in such cases any tool would fail to suggest the relevant file(s).

  8. Were you able to get to the other relevant classes based on the suggested ones that were relevant? All participants expressed that the ranked list helped them to consider other focal points and gave them a feel for what else to look for that might be relevant. One said that “Most of the cases I browsed through the results and opened only the top-1 and top-2 files to see if they were the relevant ones or to see if they can lead me to the file that may be more appropriate”.

  9. Would you consider such a tool to support you in the current challenges you have? In general, all indicated that the ranked list would benefit anyone who is new to a project since it could guide a novice developer to the parts of the application when searching for files that may be relevant to solve a defect. This would in turn allow novice developers to rapidly build up application knowledge. One said “The tool would definitely speed up the learning curve for a new team member who is not familiar with the architecture and code structure of our project. It would save him a lot of time during the first few weeks. After that due to the small size of the project (12000 LOC) it would not provide much significance because the developer would become familiar with the code anyway”. We were told that as a research prototype our tool was very stable: the participants experienced no failures, i.e. crashes or exceptions. One developer said that even when running the tool in a virtual machine (VM) environment, suspending and then resuming the VM, our tool continued to function.

To our final 10th question, “What would you suggest and recommend for future improvements?”, we received many valuable suggestions. First of all, we were told by all developers that the biggest help would be to write better bug descriptions and to introduce a defect template to solicit this. They noticed that defect descriptions containing test steps entered by 1st level support are noisy. For example, in the case of one defect, the relevant file is ranked in the top-5 and its test file in the top-1, so the relevant files were obscured by test files.

It was also suggested that we include in the search the content of configuration files, e.g. config.xml, DB scripts and GUI files. One developer noticed that the words in the first search field (BR summary) are given more importance. He wished the second field (BR description) to be treated with the same importance as the first search field.

In addition, we received several cosmetic suggestions for presenting the results and interacting with the search fields, such as proposing keywords, e.g. auto-completion based on terms found in the source code. Participants also felt that it would be helpful to display, next to the ranked files, the words in those files that matched the BR, so that the user can determine whether it really makes sense to investigate the content of the file or not. Interestingly, one developer said that sometimes he did not consider the ranking an important factor and suggested grouping files into packages, based on words matching the package name, and then ranking the files within each group.

Most of the suggestions concern the developer interface and are outside the scope of ConCodeSe. We intend to investigate whether configuration files can be matched to BRs in future work.

5.3 Evaluation of the results

Modern IDE search tools offer limited lexical similarity functionality during a search. Developers are required to specify search words precisely in order to obtain accurate results, which may require them to be familiar with the terminology of the application and the domain. The success of modern IDE search tools depends on the clarity of the search terms; otherwise the results may contain many false positives. To compensate for these weaknesses, developers choose to specify on average 3 words (see post-session interview answer 1) when searching for relevant files in IDEs, instead of using all the words available in a BR. Since current IDE search tools deprive developers of the advantage of utilising the full information available in BRs, developers may search outside of the IDE (see pre-session interview answer 4).

Furthermore, in current IDEs, the search results are not displayed in a ranked order of relevance, causing developers to go through several irrelevant files before finding relevant ones as entry points for performing bug-fixing tasks. Since developers are faced with the challenge of manually analysing a possibly long list of files, they usually tend to quickly browse through the results and decide on their accuracy based on gut feeling, as revealed during our pre-session interview. They also prefer to perform a new search query using different words rather than opening some files to investigate their content. These repetitive search tasks cost additional effort and add a burden on the productivity of developers, causing them to lose focus and introduce errors due to fatigue or lack of application knowledge.

We set out to explore whether our ranking approach would benefit developers. Based on the post-session interview answers provided by developers working in different industrial environments with different applications, we confirm that developers welcomed the ranked result list and stated that since most of the relevant files were positioned in the top-5, they were able to avoid the error prone tasks of browsing long result lists and repetitive search queries by focusing on the top-5 portion of the search results.

We were interested in finding out whether the ranked list would point developers to other files that might be of importance but were not initially thought to be relevant. After trying out our tool, at the post-session interview, the developers said that the result list contained other relevant files that they would not have thought of on their own without our tool (see post-session interview answer 3). Developers stated that those additional, previously unconsidered files would not have appeared in the results of the IDE search tool they use, because those files do not contain the search terms. The use of stemming allows us to still match the bug report to the code file in some of those cases.

Finally, we wanted to see whether our tool, which leverages the textual information available in bug reports, encourages developers to use the full description of a BR when formulating search queries. During the pre-session interview, developers told us that they use their gut feeling and experience when selecting words to use in the search query. At the post-session interview, we were told by developers that augmenting the search words with additional ones from the BRs improved the results.

In software projects, a developer may get assistance from other team members or expert users when selecting the initial entry points to perform the assigned maintenance tasks. We were told that our tool complements this by providing a more sophisticated search during software maintenance. Thus the BR vocabulary can be seen as the assistance provided by the expert team members and the file names in the search results can be seen as the initial entry points to investigate additional relevant files.

Furthermore, we were told that search results still depend on the quality of bug descriptions. In case of tersely described BRs, even experienced developers find it challenging to search for relevant files. In addition, we found that BRs can be of (1) developer nature with technical details, i.e. references to code files and stack traces, or (2) descriptive nature with business terminology, i.e. use of test case scenarios. Since BR documents may come from a group of people who are unfamiliar with the vocabulary used in the source code, we propose that BR descriptions contain a section for describing the relevant domain vocabulary. For example a list of domain terms implemented by an application can be semi-automatically extracted and imported into the BR management tool. Subsequently, when creating a BR, the user may choose from the list of relevant domain terms or the tool may intelligently suggest the terms for selection.

From all these results, we can answer RQ4 affirmatively: in the case of business applications, our tool also manages to place at least one file in the top-10 for 88% of the BRs on average (see post-session interview answer 2 and Fig. 4: 83% in Pillar2, 80% in Company U, 90% in S and 100% in A). Our study also confirms that users will focus on the first 10 suggestions and that therefore presenting the search results ranked in the order of relevance for the task at hand benefits developers.

6 Discussion

It has been argued that textual information in BR documents is noisy (Zimmermann et al. 2010). Our approach is partly based on focussing on one single type of information in the BR: the occurrence of file names. The assumption is that if a file is mentioned in the BR, it likely needs to be changed to fix the bug. Manual inspection of only 9.4% of the BRs reinforced that assumption and revealed particular positions in the summary and in the stack trace where an affected file occurs. Our improved results provide further evidence of the relevance of the assumption.

On the other hand, it has been argued that if bug reports already mention the relevant files, then automated bug localisation provides little benefit (Kochhar et al. 2014). This would indeed be the case if almost all files mentioned in a bug report were true positives. As Table 3 shows, most bug reports only affect a very small number of files, and yet they may mention many more files, especially if they contain stack traces (Table 6). In Sect. 4.1.4 we gave two examples of such BRs, both placed in the top-5 by ConCodeSe: one mentioning 9 files, all irrelevant, the other mentioning 14, of which only one was relevant. Developers are interested in finding out focus points and how files relate to each other, especially when working in unfamiliar applications (Sillito et al. 2008). Automated bug localisation can help developers separate the wheat from the chaff when looking at the files mentioned in bug reports, and even suggest relevant files not mentioned in the reports.

We offer some reasons why we think our approach works better. First, other approaches combine scores using weight factors that are fixed. We instead always take the best of several ranks for each file. In this way, we are not a priori fixing, for each file, whether the lexical scoring or the VSM score should take precedence. We also make sure that stemming and comments are only taken into account for files where it matters. The use of the best of 8 scores is likely the reason for improving the key MRR metric across all projects.

Second, we leverage structure further than other approaches. Like BLUiR, we distinguish the BR’s summary and description, but whereas BLUiR treats each BR field in exactly the same way (both are scored by Indri against the parts of a file) we treat each field differently, through the key position and stack trace scoring.

Third, our approach simulates the selective way developers scan BRs better than other approaches. It has been argued that developers may not necessarily need automated bug localisation tools when file names are present in the bug reports (Kochhar et al. 2014) because they may be able to select which information in the bug report is more relevant, based on the structure of the sentences and their understanding of the reported bug (Wang et al. 2015). ConCodeSe first looks at certain positions of the bug report and then, if unsuccessful, uses all terms in the BR, stopping the scoring when a full file name match is found. However, the same cannot be said for most automated tools: they always score a file against all the BR terms, which may deteriorate the performance if a file has more matching terms with the bug report, as it is scored higher and falsely ranked as more relevant (Moreno et al. 2014).

Fourth, our approach addresses both the developer nature and the descriptive nature of BRs, which we observed in the analysis of the BRs for these projects and in particular for Pillar1 and Pillar2 (Dilshener and Wermelinger 2011). BRs of a developer nature tend to include technical details, like stack traces and class or method names, whereas BRs of a descriptive nature tend to contain user domain vocabulary. By leveraging file (class) names and stack traces when they occur in BRs, and by otherwise falling back to VSM and a base lexical similarity scoring (Algorithm 2), we cater for both types of BRs. As the non-bold rows of Table 10 show, the fall-back scoring alone outperforms all other approaches in terms of MAP and MRR (except AmaLgam’s MAP score for AspectJ).

Although LOBSTER doesn’t use historical information either, it was a study on the value of stack traces over VSM, and thus only processes a subset of developer-type BRs, those with stack traces.

To sum up, we treat each BR and file individually, using the summary, stack trace, stemming, comments and file names only when available and relevant, i.e. when they improve the ranking. This enables our approach to deal with very terse BRs and with BRs that don’t mention files.

As for the efficiency of our approach, creating the corpus from the application’s source code and BRs takes, on a machine with a 3GHz i3 dual-core processor and 4GB RAM, 3 h for Eclipse (the largest project, see Table 1). Ranking (8 times!) its 12863 files for all 3075 BRs takes about 1.5 h, i.e. on average almost 2 seconds per BR. We consider this to be acceptable since our tool is a proof of concept and developers locate one BR at a time.

6.1 Threats to validity

The construct validity addresses how well the experimental set-up measures what it claims to measure. During creation of the searchable corpus, we relied on the JIM tool to extract the terms from source code identifiers and on regular expressions for extracting the stack trace from the BR descriptions. It is possible that other tools may produce different results. Also, the queries (i.e. search terms) in our study were taken directly from BRs. Our user study showed that developers formulate their queries differently when locating bugs in an IDE and that the use of different queries, with vocabularies more in line with the source code, would yield better results. However, using the BR summaries and descriptions as queries instead of manually formulated queries avoided the introduction of bias from the authors.

In the user study, the bug localisation was uncontrolled, to avoid disturbing the daily activities of the developers. This is a potential threat to construct validity. We partially catered for this threat by asking developers to make screen shots of the results, which they showed during the post-session interview. In future we aim to conduct an ethnographic study to observe how developers perform live bug localisation tasks with our tool, compared to using no tool or other tools. The study would be both qualitative, e.g. whether developers’ bug localisation strategies depend on the tool used or its absence, and quantitative, e.g. to compare the time taken to fix bugs.

The internal validity addresses the relationship between the cause and the effect of the results, to verify that the observed outcomes are the natural product of the implementation. We catered for this by comparing the search performance of ConCodeSe like for like (i.e., using the same datasets and the same criteria) with eight existing bug localisation approaches (Zhou et al. 2012; Saha et al. 2013; Moreno et al. 2014; Wong et al. 2014; Wang and Lo 2014; Ye et al. 2014; Youm et al. 2015; Rahman et al. 2015), as well as assessing the contribution of the off-the-shelf Lucene library’s VSM. Therefore, the improvement in results can only be due to our approach. It is conceivable that an IR engine using the LSI model may produce results that are more or less sensitive to the use of file names in BRs. We plan to experiment with LSI in future work. Also, we used fixed values to score the files, obtained by manually tuning the scoring on AspectJ and SWT. By trying out scoring variations, we confirmed that the rationale behind those values (namely, to distinguish certain positions and assign them much higher scores than base term matching) led to the best results.

Table 22 Wilcoxon test comparison

The conclusion validity refers to the relationship between the treatment and the outcome and whether it is statistically significant. We used a non-parametric Wilcoxon matched pairs statistical test since no assumptions were made about the distribution of the results. This test is quite robust and has been extensively used in the past to conduct similar analyses (Schröter et al. 2010; Moreno et al. 2014). Based on the values obtained, shown in Table 22, we conclude that on average ConCodeSe locates significantly (p \(\le \) 0.05) more relevant source files in the top-N, i.e. the improvement of our tool over the state of the art in placing relevant files for a BR in the top-N positions is statistically significant.
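Purely as an illustration of such a check (not the authors' actual analysis script), a paired Wilcoxon signed-rank test over matched per-project results could be run with Apache Commons Math; the numbers below are placeholders, not the paper's data.

import org.apache.commons.math3.stat.inference.WilcoxonSignedRankTest;

// Hypothetical sketch: paired Wilcoxon signed-rank test over matched performance values
// (e.g. per-project top-10 percentages) of two tools. Placeholder data, not the paper's results.
public class WilcoxonCheck {
    public static void main(String[] args) {
        double[] toolA = {76, 81, 70, 68, 83, 79};   // e.g. one tool's top-10 percentage per project
        double[] toolB = {65, 74, 61, 60, 77, 70};   // e.g. a baseline tool's top-10 percentage per project

        WilcoxonSignedRankTest test = new WilcoxonSignedRankTest();
        double p = test.wilcoxonSignedRankTest(toolA, toolB, true); // exact p-value for small samples
        System.out.printf("p = %.4f (significant at 0.05: %b)%n", p, p <= 0.05);
    }
}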

External validity addresses whether the study and its results can be applied to other circumstances. The small size of the user study (4 participants from 3 companies) and the characteristics of the projects (e.g. the domain, the identifier naming conventions, and the way comments and BRs are written, including the positions where file names occur) are a threat to external validity. We reduced this threat by repeating the search experiments with 11 different OSS and industrial applications, developed independently of each other, except for SWT and Eclipse. Although ConCodeSe only handles Java projects, to allow comparison with the literature, its principles (take the best of various rankings, score class names occurring in the BR higher than other terms, and look for class names in particular positions of the summary and of the stack trace, if one exists) are applicable to other object-oriented programming languages.

7 Concluding remarks

This paper contributes a novel algorithm that, given a bug report (BR) and the application’s source code files, uses a combination of lexical and structural information to suggest, in ranked order, the files that may have to be changed to implement the BR. The algorithm considers words in certain positions of the BR summary and of the stack trace (if available in the bug report), as well as source code comments and stemming, both separately and in combination, to derive the best rank for each file.
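A minimal sketch of the "best rank for each file" idea is shown below: each file keeps the best (lowest) rank it obtains across the individual ranking variants, e.g. with or without stemming or comments. The class and method names are illustrative, not those of ConCodeSe.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative combination step: each file keeps the best (lowest, i.e. closest to 1)
    // rank it obtains across several ranking variants.
    public class BestRankCombiner {

        // Each inner list is one ranking variant: file names ordered from best to worst.
        static Map<String, Integer> bestRanks(List<List<String>> rankingVariants) {
            Map<String, Integer> best = new HashMap<>();
            for (List<String> ranking : rankingVariants) {
                for (int i = 0; i < ranking.size(); i++) {
                    best.merge(ranking.get(i), i + 1, Integer::min); // ranks are 1-based
                }
            }
            return best;
        }
    }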

We compared the results to eight existing approaches, using their 5 evaluation criteria and their datasets (4626 BRs from 6 OSS applications), to which we added 39 BRs from 2 other applications. We found that our approach improved the ranking of the affected files, increasing in a statistically significant way the percentage of BRs for which a relevant file is placed among the top-1, top-5 and top-10, reaching on average 44, 69 and 76% respectively. This is an improvement of 23, 16 and 11% respectively over the best performing state-of-the-art tool.

We also improved, in certain cases substantially, the mean reciprocal rank for all eight applications evaluated, thereby reducing the number of files to inspect before finding a relevant one. However, our user study, to our knowledge the first on IR-based bug localisation in an industrial setting, confirms other authors’ findings that developers tend to look only at the first 10 ranked results.

We evaluated the algorithm on four very different industrial applications (one of the two provided by us and three from the user studies); it placed at least one relevant file in the top-10 for 88% of bug reports on average, confirming the applicability of our approach in commercial environments as well.

Our approach not only outperforms other approaches, it does so in a simpler, faster, and more general way: it uses the fewest artefacts necessary (one bug report and the source codebase it was reported on), requiring neither past information, such as version history or similar closed bug reports, nor the tuning of weight factors to combine scores, nor machine learning.

Although our IR-based approach, since it does not depend on past BRs, can also be applied to feature requests and not just bug reports, the evaluation datasets only include bug reports. Feature requests, by their nature, do not include a stack trace and may not mention specific files. We plan to evaluate our approach on feature requests in future work.

Like previous studies, ours shows that it is challenging to find the files affected by a BR: in spite of our improvements, for larger projects 23% of bugs are not located among the top-10 files. Adding history-based heuristics and learning-to-rank, as proposed by other approaches, is likely to further improve the bug localisation performance. To help develop new search approaches, we will offer in an online companion to this paper the full results as an improved baseline for further bug localisation research.