A pattern-based approach to detect and improve non-descriptive test names

https://doi.org/10.1016/j.jss.2020.110639

Highlights

  • Unit tests are important but often have non-descriptive names.

  • Many recurring test patterns exist in unit tests.

  • A pattern-based approach is effective at detecting non-descriptive test names.

  • The approach can also improve non-descriptive test names.

Abstract

Unit tests are an important artifact that supports the software development process in several ways. For example, when a test fails, its name can provide the first step towards understanding the purpose of the test. Unfortunately, unit tests often lack descriptive names. In this paper, we propose a new, pattern-based approach that can help developers improve the quality of the names of JUnit tests by making them more descriptive. It does this by detecting non-descriptive test names and, in some cases, providing additional information about how a name can be improved. Our approach was assessed using an empirical evaluation on 34,352 JUnit tests. The results of the evaluation show that the approach is feasible, accurate, and useful for discriminating between descriptive and non-descriptive names, with a 95% true-positive rate.

Introduction

Unit tests are an important artifact that supports the software development process in several ways. In addition to helping developers ensure the quality of their software by checking for failures (Daka and Fraser, 2014), they can also serve as an important source of documentation, not only for human developers but also for automated software engineering tools (e.g., recent work on fault localization by Li et al. (2019) uses test name information). For example, when a test fails, its name can provide the first step towards understanding the purpose of the test and ultimately fixing the cause of the observed failure. Similarly, a test’s name can help developers decide whether a test should be left alone, modified, or removed in response to changes in the application under test, and whether the test should be included in a regression test suite.

In this work, we consider test names to be “good” if they are descriptive (i.e., they accurately summarize both the scenario and the expected outcome of the test (Trenk, 2015)) and “bad” if they are not. This is because descriptive names: (1) make it easier to tell whether some functionality is not being tested (if a behavior is not mentioned in the name of a test, then the behavior is not being tested); (2) help prevent tests that are too large or contain unrelated assertions (if a test cannot be summarized, it likely should be split into multiple tests); and (3) serve as documentation for the class under test (a class’s supported functionality can be identified by reading the names of its tests) (Zhang et al., 2015).
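As a small illustrative example (not one of our experimental subjects), consider the following two JUnit 4 tests with identical bodies; only the second name summarizes both the scenario and the expected outcome:

```java
import static org.junit.Assert.assertTrue;

import java.util.ArrayDeque;
import java.util.Deque;

import org.junit.Test;

public class DequeTest {

    // Non-descriptive: the name says nothing about the scenario or the expected outcome.
    @Test
    public void test1() {
        Deque<String> deque = new ArrayDeque<>();
        deque.push("a");
        deque.pop();
        assertTrue(deque.isEmpty());
    }

    // Descriptive: the name summarizes the scenario (pop after a single push)
    // and the expected outcome (the deque ends up empty).
    @Test
    public void popAfterSinglePushLeavesDequeEmpty() {
        Deque<String> deque = new ArrayDeque<>();
        deque.push("a");
        deque.pop();
        assertTrue(deque.isEmpty());
    }
}
```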

Unfortunately, unit tests often lack descriptive names. For example, an exploratory study by Zhang et al. found that only 9% of the 213,423 test names they considered were complete (i.e., fully described the body of the test), while 62% were missing some information and 29% contained no useful information (e.g., tests named “test”) (Zhang et al., 2015). Poor test names can be due to developers writing non-descriptive or incomplete names. They can also occur due to incomplete code modifications; for example, a developer may modify a test’s body but fail to make the corresponding changes to the test’s name. Regardless of the cause, non-descriptive test names complicate comprehension tasks and increase the cost and difficulty of software development.

Because non-descriptive names negatively impact software development, there have been several attempts to address this issue. One approach has been to automatically generate names based on implementations (e.g., Arcuri et al., 2014, Zhang et al., 2015, Daka et al., 2017). For example, Zhang et al. and Daka et al. use static and dynamic analysis, respectively, to extract important expressions from a test’s body and natural language processing techniques to transform such expressions into test names (Zhang et al., 2015, Daka et al., 2017). While automatically generating names from bodies eliminates the possibility of mismatches between names and bodies, the generated names do not always meet with developer approval (e.g., they may not fit with existing naming conventions). Another approach is to help developers improve their existing names by suggesting improvements. For example, Høst and Østvold proposed an approach for Java methods and variables which uses a set of naming rules and related semantics (Høst and Østvold, 2009), Li et al. provided a learning-based approach to locate software faults using test name information (Li et al., 2019), and Allamanis et al. and Pradel and Sen use a model-based and a learning-based approach, respectively, to directly suggest better names or find name-based bugs to facilitate improvements (Allamanis et al., 2015, Pradel and Sen, 2018).

In this paper, we propose a new, pattern-based approach that can: (1) detect non-descriptive test names by finding mismatches between the name and body of a given JUnit test; and (2) provide descriptive information, consisting of the main motive of the test, the property to be tested, and the prerequisite needed in the test or the object to be tested (see Section 2 for details), to facilitate the improvement of non-descriptive test names. Unlike existing approaches for suggesting improvements, which were designed to handle general methods, our approach is specific to JUnit tests. The narrower scope of the work allows it to take advantage of the highly repetitive structures that exist in both the names and bodies of JUnit tests (see Section 2). From a high-level point of view, the approach uses a set of predefined patterns to extract descriptive information from both a test’s name and body. This information is then compared to find non-descriptive names (i.e., cases where the name does not accurately summarize the body). When a mismatch is found, the information used by the approach can help developers address the mismatch and improve the quality of the test name.
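A minimal sketch of this two-phase flow is shown below. The DescriptiveInfo record, the extractor functions, and the comparison rule are simplified placeholders rather than the implementation described later in the paper.

```java
import java.util.function.Function;

/**
 * Sketch of the two-phase idea: extract descriptive information from the test name
 * and the test body (phase 1), then compare the two sides (phase 2). All identifiers
 * here are illustrative; the actual approach uses mined patterns and richer rules.
 */
public final class NonDescriptiveNameDetectorSketch {

    /** Descriptive information: the action under test, the checked predicate, and the scenario. */
    public record DescriptiveInfo(String action, String predicate, String scenario) {}

    private final Function<String, DescriptiveInfo> nameExtractor;
    private final Function<String, DescriptiveInfo> bodyExtractor;

    public NonDescriptiveNameDetectorSketch(Function<String, DescriptiveInfo> nameExtractor,
                                            Function<String, DescriptiveInfo> bodyExtractor) {
        this.nameExtractor = nameExtractor;
        this.bodyExtractor = bodyExtractor;
    }

    /** Returns true when the name fails to summarize what the body does. */
    public boolean isNonDescriptive(String testName, String testBodySource) {
        DescriptiveInfo fromName = nameExtractor.apply(testName);       // phase 1: name side
        DescriptiveInfo fromBody = bodyExtractor.apply(testBodySource); // phase 1: body side
        return !agrees(fromName.action(), fromBody.action())            // phase 2: comparison
            || !agrees(fromName.predicate(), fromBody.predicate())
            || !agrees(fromName.scenario(), fromBody.scenario());
    }

    // Simplified rule: information present in the body must also be conveyed by the name.
    private static boolean agrees(String fromName, String fromBody) {
        if (fromBody == null || fromBody.isEmpty()) {
            return true;
        }
        return fromBody.equalsIgnoreCase(fromName == null ? "" : fromName);
    }
}
```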

To assess the pattern-based approach, we implemented it as an IntelliJ IDE plugin. The plugin was then used to carry out an empirical evaluation of the quality of more than 34,000 tests from 10 Java projects. Overall, the results of our evaluation are promising and show that the pattern-based approach is feasible, accurate, and effective.

In particular, this work makes the following contributions:

  • A novel, pattern-based approach that can detect non-descriptive names of JUnit tests and provide descriptive information about the tests to help developers improve them.

  • A prototype implementation of the approach as an IntelliJ IDE plugin.

  • An empirical evaluation on 10 Java projects that shows: (1) the patterns are general and cover a majority of test names and bodies; (2) the patterns can accurately extract descriptive information from both test names and bodies; and (3) the approach can accurately classify test names as either descriptive or non-descriptive.

Section snippets

Test patterns

We choose a pattern-based approach because unit tests often have similar structures that can be used to identify the purpose of a test from both its name and body. More specifically, patterns can be used to extract: (1) the action, which is the focus of the test (i.e., what the test is testing); (2) the predicate, which is the set of properties that will be checked by the test; and (3) the scenario, which is the set of conditions under which the action is performed or the predicate is evaluated.
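For illustration, the following sketch parses a single common naming convention, test<Action>When<Scenario>Then<Predicate>. This one convention is only an example; the name and body patterns actually used by the approach are mined from real test suites and are broader than this.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative sketch: parses names of the form test<Action>When<Scenario>Then<Predicate>
 * (e.g., "testPopWhenStackIsEmptyThenThrowsException") into action, scenario, and predicate.
 */
public final class NamePatternSketch {

    private static final Pattern WHEN_THEN = Pattern.compile(
        "^test(?<action>[A-Z]\\w*?)When(?<scenario>[A-Z]\\w*?)Then(?<predicate>[A-Z]\\w*)$");

    public record NameInfo(String action, String scenario, String predicate) {}

    public static Optional<NameInfo> parse(String testName) {
        Matcher m = WHEN_THEN.matcher(testName);
        if (!m.matches()) {
            return Optional.empty(); // the name does not follow this particular convention
        }
        return Optional.of(new NameInfo(m.group("action"), m.group("scenario"), m.group("predicate")));
    }

    public static void main(String[] args) {
        // Parses to action "Pop", scenario "StackIsEmpty", predicate "ThrowsException".
        System.out.println(parse("testPopWhenStackIsEmptyThenThrowsException"));
    }
}
```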

As examples of

A pattern-based approach to detect non-descriptive test names

Fig. 30 presents a high-level overview of our pattern-based approach for detecting non-descriptive test names. As the figure shows, the approach takes as input a unit test comprised of its name and body. It then assesses the descriptiveness of the test’s name using two phases. The first phase, pattern-based analysis, uses the test patterns described in Section 2 to extract descriptive information from both the test name and the test body. The second phase, information comparison, compares the
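As a rough illustration of the body side of the pattern-based analysis phase, the string-based sketch below pulls an action, a predicate, and a scenario out of a test body. The actual plugin operates on the parsed test and uses the mined body patterns rather than these heuristics.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Rough sketch of body-side extraction, assuming the test body is available as source
 * lines. String heuristics are used only for illustration of the kind of information
 * the approach recovers from a body.
 */
public final class BodyPatternSketch {

    public record BodyInfo(String action, String predicate, List<String> scenario) {}

    public static BodyInfo extract(List<String> bodyLines) {
        String action = null;                       // most recent non-assert call statement
        String predicate = null;                    // the checked property (assertion)
        List<String> scenario = new ArrayList<>();  // setup that establishes the conditions
        for (String line : bodyLines) {
            String stmt = line.trim();
            if (stmt.startsWith("assert")) {
                predicate = stmt;
            } else if (stmt.contains(".") && stmt.endsWith(");")) {
                action = stmt;
            } else if (!stmt.isEmpty()) {
                scenario.add(stmt);
            }
        }
        return new BodyInfo(action, predicate, scenario);
    }

    public static void main(String[] args) {
        System.out.println(extract(List.of(
            "Deque<String> deque = new ArrayDeque<>();",
            "deque.push(\"a\");",
            "deque.pop();",
            "assertTrue(deque.isEmpty());")));
    }
}
```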

Empirical evaluation

The overall goal of the evaluation is to determine if our approach can classify descriptive and non-descriptive test names. However, because the approach’s success for this task depends on the underlying patterns, we also evaluate several aspects of their performance. More specifically, we considered the following three research questions:

    RQ1—Feasibility.

    How many test names and bodies are matched by the patterns used by the approach?

    RQ2—Accuracy.

    How accurate are the patterns at extracting the

Related work

The pattern-based approach proposed in this paper involves several fields of research, so the purpose of this section is to review the most closely related work from each field.

Prototype implementation and threats to validity

The prototype implementation is publicly available (Wu and Clause, 2020b). All meta-results from the pilot study and the pattern-mining process, all instances of non-descriptive test names from the 10 experimental subjects in Table 4, and the metadata of the evaluation are also available in the repository. In addition, we are sharing the data for the quantitative analysis that was performed in the evaluation (Wu and Clause, 2020a). Two threats to validity exist for our test name/body

Conclusions and future work

Taken together, our selected test patterns can extract sufficient information from any unit test whose name and body match one of the patterns. With the help of the output generated by our approach, developers can easily find non-descriptive test names in a given test corpus and improve those names by referring to the descriptive information. Furthermore, we implemented our approach as an IntelliJ IDE plugin. In the empirical evaluation, the experimental

CRediT authorship contribution statement

Jianwei Wu: Conceptualization, Methodology, Data curation, Software, Visualization, Writing - original draft, Writing - review & editing, Formal analysis. James Clause: Conceptualization, Software, Writing - review & editing, Formal analysis, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is supported in part by the National Science Foundation, USA, under Grant No. 1527093.


References (51)

  • Lakhotia, A., Understanding someone else’s code: Analysis of experiences. J. Syst. Softw. (1993)

  • Abebe, S.L., et al., Natural language parsing of program element names for concept extraction

  • Allamanis, M., et al., Learning natural coding conventions

  • Allamanis, M., et al., Suggesting accurate method and class names

  • Arcuri, A., et al., Automated unit test generation for classes with environment dependencies

  • Butler, S., et al., Improving the tokenisation of identifier names

  • Corazza, A., et al., LINSEN: An efficient approach to split identifiers and expand abbreviations

  • Daka, E., et al., A survey on unit testing practices and problems

  • Daka, E., et al., Generating unit tests with descriptive names or: Would you name your children thing1 and thing2?

  • Dragan, N., Emergent laws of method and class stereotypes in object oriented software

  • Dragan, N., et al., Reverse engineering method stereotypes

  • Dragan, N., et al., Automatic identification of class stereotypes

  • Enslen, E., et al., Mining source code to automatically split identifiers for software analysis

  • Fournier-Viger, P., et al., A survey of sequential pattern mining. Data Sci. Pattern Recognit. (2017)

  • Fraser, G., et al., Evosuite: Automatic test suite generation for object-oriented software

  • Ghafari, M., et al., Automatically identifying focal methods under test in unit test cases

  • GitHub, GitHub (2018)

  • Gomariz, A., et al., Clasp: An efficient algorithm for mining frequent closed sequences

  • Google, Google Guava (2018)

  • Guerrouj, L., et al., Tidier: An identifier splitting approach using speech recognition techniques. J. Softw.: Evol. Process (2013)

  • Hill, E., et al., An empirical study of identifier splitting techniques. Empir. Softw. Eng. (2014)

  • Hill, E., et al., AMAP: Automatically mining abbreviation expansions in programs to enhance software maintenance tools

  • Høst, E.W., et al., The Java programmer’s phrase book

  • Høst, E.W., et al., Debugging method names

  • Jensen, C.S., et al., Automated testing with targeted event sequence generation


Jianwei Wu is a Ph.D. student at the University of Delaware. He was an undergraduate student at the Anhui University of Finance and Economics, China and finished his Bachelor of Engineering in 2016. He works as a research assistant at the University of Delaware under the advisement of Dr. James Clause. The main interests of his research are software testing and software documentation.

James Clause is an Associate Professor in the Department of Computer and Information Sciences at the University of Delaware. He received the MS and Ph.D. degrees in computer science from the University of Pittsburgh and the Georgia Institute of Technology, respectively. His primary areas of research are software testing, program analysis, green software engineering, and documentation.
