Understanding large-scale software systems – structure and flows

Empirical Software Engineering

Abstract

Program comprehension accounts for a large portion of software development costs and effort. The academic literature contains mainly research on program comprehension of short code snippets, but comprehension at the system level is no less important. We claim that comprehending a software system is a distinct activity that differs from code comprehension. We interviewed experienced developers, architects, and managers in the software industry and open-source community, to uncover the meaning of program comprehension at the system level; later we conducted a survey to verify the findings. The interviews demonstrate, among other things, that system comprehension is largely detached from code and programming language, and includes scope that is not captured in the code. It focuses on one hand on the structure of the system, and on the other hand on the flows in the system, but less on the code itself. System comprehension is a continuous, unending, iterative process, which utilizes white-box and black-box approaches at different layers of the system depending on needs, and combines both bottom-up and top-down comprehension strategies. In summary, comprehending a system is not just comprehending the code at a larger scale, and it is not possible to comprehend large systems at the same level as comprehending code.

Notes

  1. In the hardware world there is a distinction between the terms ‘architecture’ and ‘micro-architecture’, reflecting the externally visible attributes of the system vs. its internal design. Software organizations in hardware corporations sometimes adopt this terminology. However, in the software world the term ‘architecture’ usually refers mainly to the internal design.

Acknowledgments

Many thanks to Neta Kligler-Vilenchik, who provided us with invaluable guidance in the methodology of text analysis. Bareket Henle assisted with the development of the initial interview plan.

Author information

Corresponding author

Correspondence to Dror G. Feitelson.

Additional information

Communicated by: Federica Sarro and Foutse Khomh

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: International Conference on Program Comprehension (ICPC)

Dror Feitelson holds the Berthold Badler Chair in Computer Science. This research was supported by the ISRAEL SCIENCE FOUNDATION (grants no. 407/13 and 832/18). This paper is an invited extended version of a paper from ICPC 2019.

About this article

Cite this article

Levy, O., Feitelson, D.G. Understanding large-scale software systems – structure and flows. Empir Software Eng 26, 48 (2021). https://doi.org/10.1007/s10664-021-09938-8

  • DOI: https://doi.org/10.1007/s10664-021-09938-8
