Mining Association Rules from Code (MARC) to support legacy software management

Tjortjis, Christos

doi:10.1007/s11219-019-09480-3

Published: 19 December 2019

Volume 28, pages 633–662 (2020)

Software Quality Journal

Christos Tjortjis ORCID: orcid.org/0000-0001-8263-9024¹

Abstract

This paper presents a methodology for Mining Association Rules from Code (MARC), which aims to capture program structure, facilitate system understanding and support software management. MARC groups program entities (paragraphs or statements) based on similarities such as variable use, data types and procedure calls. It comprises three stages: code parsing/analysis, association rule mining and rule grouping. Code is parsed to populate a database with records and their respective attributes. Association rules are then extracted from this database and processed to abstract programs into groups of interrelated entities: entities are grouped together if their attributes participate in common rules. This abstraction is performed at the program level or even the paragraph level, in contrast to other approaches that work at the system level. Groups can then be visualised as collections of interrelated entities. The methodology was evaluated using real-life COBOL programs. Results showed that it facilitates program comprehension using source code only, in settings where domain knowledge and documentation are either unavailable or unreliable.
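
The three stages named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the records, attribute names (such as "uses:CUST-ID"), function names and thresholds below are all hypothetical, the parsing stage is replaced by a hand-written table, and a brute-force itemset search stands in for whatever mining algorithm MARC actually uses.

```python
from itertools import combinations

# Stage 1 (assumed output of parsing): each program entity, e.g. a COBOL
# paragraph, becomes a record whose attributes describe variable use and calls.
# All names here are invented for illustration.
records = {
    "PARA-READ":   {"uses:CUST-ID", "uses:CUST-REC", "calls:OPEN-FILE"},
    "PARA-UPDATE": {"uses:CUST-ID", "uses:CUST-REC", "calls:WRITE-REC"},
    "PARA-REPORT": {"uses:TOTAL", "calls:PRINT-LINE"},
    "PARA-SUM":    {"uses:TOTAL", "calls:PRINT-LINE", "uses:CUST-ID"},
}


def frequent_itemsets(records, min_support, max_size=3):
    """Return {itemset: support} for every itemset meeting min_support.

    Brute-force enumeration; adequate only for toy inputs.
    """
    n = len(records)
    items = sorted(set().union(*records.values()))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for attrs in records.values() if itemset <= attrs)
            if count / n >= min_support:
                result[itemset] = count / n
    return result


def association_rules(itemsets, min_confidence):
    """Stage 2: derive rules (antecedent, consequent, confidence)."""
    rules = []
    for itemset, support in itemsets.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is always present in the table.
                confidence = support / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


def group_entities(records, rules):
    """Stage 3: group entities whose attributes contain all items of a rule."""
    groups = {}
    for antecedent, consequent, _ in rules:
        items = antecedent | consequent
        members = frozenset(e for e, attrs in records.items() if items <= attrs)
        if len(members) > 1:
            groups.setdefault(members, set()).add(items)
    return groups


itemsets = frequent_itemsets(records, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.8)
groups = group_entities(records, rules)

for members, shared in groups.items():
    print(sorted(members), "share", [sorted(s) for s in shared])
```

With these toy thresholds, the paragraphs that read and update the customer record fall into one group and the two reporting paragraphs into another, which is the kind of program-level abstraction the abstract describes. A real parser and a scalable miner would replace the hand-written table and the exhaustive search.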

Author information

Authors and Affiliations

School of Science & Technology, International Hellenic University, 14th km Thessaloniki-Moudania, 57001, Thermi, Greece
Christos Tjortjis

Corresponding author

Correspondence to Christos Tjortjis.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tjortjis, C. Mining Association Rules from Code (MARC) to support legacy software management. Software Qual J 28, 633–662 (2020). https://doi.org/10.1007/s11219-019-09480-3

Published: 19 December 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s11219-019-09480-3
