Characteristics of method extractions in Java: a large scale empirical study

Hora, Andre; Robbes, Romain

doi:10.1007/s10664-020-09809-8

Published: 10 March 2020

Volume 25, pages 1798–1833 (2020)
Empirical Software Engineering

Abstract

Extract method is the “Swiss army knife” of refactorings: developers perform method extraction to introduce alternative signatures, decompose long code, and improve testability, among many other reasons. Although the rationales behind method extraction are well explored, its characteristics are not yet well understood. Assessing this information can provide the basis to better understand this important refactoring operation, as well as to improve refactoring tools and techniques based on the actual behavior of developers. In this paper, we assess characteristics of the extract method refactoring. We rely on a state-of-the-art technique to detect method extraction, and analyze over 70K instances of this refactoring, mined from 124 software systems. We investigate five aspects of this operation: magnitude, content, transformation, size, and degree. We find that (i) extract method is among the most popular refactorings; (ii) extracted methods are overrepresented in operations related to creation, validation, and setup; (iii) methods that are targets of the extractions are 2.2x longer than the average, and they are reduced by one statement after the extraction; and (iv) single method extraction represents most, but not all, of the cases. We conclude by proposing improvements to refactoring detection, suggestion, and automation tools and techniques to support both practitioners and researchers.
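
For readers unfamiliar with the operation, the following is a minimal Java sketch of a single method extraction. It is not taken from the studied systems: the class `OrderTotals` and the methods `totalBefore`, `totalAfter`, and `validateQuantities` are hypothetical names chosen for illustration. The validation statements of the target method are moved into a new method, which the target method then calls.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example: all names here are illustrative assumptions,
// not code from the study's dataset.
final class OrderTotals {

    // Before the refactoring: validation and summation share one method.
    static int totalBefore(List<Integer> quantities, int unitPrice) {
        if (quantities == null || quantities.isEmpty()) {
            throw new IllegalArgumentException("no quantities");
        }
        for (int q : quantities) {
            if (q < 0) {
                throw new IllegalArgumentException("negative quantity: " + q);
            }
        }
        int total = 0;
        for (int q : quantities) {
            total += q * unitPrice;
        }
        return total;
    }

    // After the refactoring: the target method delegates validation
    // to the extracted method and keeps only the summation.
    static int totalAfter(List<Integer> quantities, int unitPrice) {
        validateQuantities(quantities);
        int total = 0;
        for (int q : quantities) {
            total += q * unitPrice;
        }
        return total;
    }

    // The extracted method, holding the statements removed from the target.
    private static void validateQuantities(List<Integer> quantities) {
        if (quantities == null || quantities.isEmpty()) {
            throw new IllegalArgumentException("no quantities");
        }
        for (int q : quantities) {
            if (q < 0) {
                throw new IllegalArgumentException("negative quantity: " + q);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> quantities = Arrays.asList(2, 3, 5);
        // Both versions compute the same result; only the structure differs.
        System.out.println(totalBefore(quantities, 10)); // prints 100
        System.out.println(totalAfter(quantities, 10));  // prints 100
    }
}
```

In this sketch the extracted method performs a validation task, one of the kinds of operation the abstract reports as overrepresented among extracted methods. The behavior of the target method is unchanged.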

Notes

https://refactoring.com/catalog/extractMethod.html
https://martinfowler.com/articles/refactoringRubicon.html
Data collected with the Stack Exchange API: https://data.stackexchange.com
Question ID: 1155947
Question ID: 10289461
Question ID: 2619228
Question ID: 26674797
Question ID: 511211
https://stackoverflow.com/questions/2470653
Question ID: 29257032
Question ID: 4930742
Question ID: 1898645
1247835
19972611
Example from Arduino project: https://goo.gl/aD8n1N
Example from Arduino project: https://goo.gl/CaQWiB
Example from Arduino project: https://goo.gl/yPvj5M
https://github.com/iluwatar/java-design-patterns
https://git-scm.com/docs/git-log#git-log
https://bit.ly/2NsxgyB
The authors excluded the rename operations in their analysis.
Suffixes were much more widespread, with the top 10 prefixes covering only 7% of methods.
We omit the “All Methods” column in Table 6 because it is already presented in Table 5.
https://bit.ly/32lnFzd
https://bit.ly/36EJn4k
https://bit.ly/2pJsogM
https://bit.ly/34vuMXh
https://bit.ly/2oUebxa
https://bit.ly/2rcsqOt
https://bit.ly/2CcVCXY
https://bit.ly/2PWN2Vu
We only count the out-degree in the target methods with respect to the extracted methods, that is, the dashed lines in Fig. 5.
https://goo.gl/qbTHRz
https://goo.gl/otn7dC
https://goo.gl/yeDPLa
https://www.oreilly.com/library/view/refactoring-improving-the/9780134757681
http://jczeus.com/refac_cpp.html
https://refactoring.com/catalog/extractFunction.html

Author information

Authors and Affiliations

Department of Computer Science, UFMG, Belo Horizonte, Brazil
Andre Hora
Free University of Bozen-Bolzano, Bolzano, Italy
Romain Robbes

Corresponding author

Correspondence to Andre Hora.

Additional information

Communicated by: Christoph Treude

About this article

Cite this article

Hora, A., Robbes, R. Characteristics of method extractions in Java: a large scale empirical study. Empir Software Eng 25, 1798–1833 (2020). https://doi.org/10.1007/s10664-020-09809-8

Published: 10 March 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s10664-020-09809-8
