Mining Association Rules from Code (MARC) to support legacy software management

Software Quality Journal

Abstract

This paper presents a methodology for Mining Association Rules from Code (MARC), which aims to capture program structure, facilitate system understanding and support software management. MARC groups program entities (paragraphs or statements) based on similarities such as variable use, data types and procedure calls. It comprises three stages: code parsing and analysis, association rule mining, and rule grouping. Code is parsed to populate a database with records and their respective attributes. Association rules are then extracted from this database and processed to abstract programs into groups of interrelated entities: entities are grouped together if their attributes participate in common rules. This abstraction is performed at the program level or even the paragraph level, in contrast to other approaches that work at the system level. Groups can then be visualised as collections of interrelated entities. The methodology was evaluated using real-life COBOL programs. Results showed that the methodology facilitates program comprehension using source code only, in settings where domain knowledge and documentation are either unavailable or unreliable.
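The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The paragraph names, the attribute labels, the thresholds and the functions `support`, `mine_rules` and `group_entities` are all hypothetical. The mining step is a brute-force search over attribute pairs, not the algorithm MARC uses, and the parsing stage is replaced by a hand-written table of records.

```python
from itertools import combinations

# Stage 1 (assumed output of parsing): each program entity, here a COBOL
# paragraph, mapped to the attributes recorded for it. All names are made up.
RECORDS = {
    "PARA-READ":   {"uses:CUST-REC", "uses:EOF-FLAG", "calls:OPEN-FILE"},
    "PARA-VALID":  {"uses:CUST-REC", "uses:EOF-FLAG"},
    "PARA-WRITE":  {"uses:CUST-REC", "uses:EOF-FLAG", "calls:CLOSE-FILE"},
    "PARA-TOTALS": {"uses:TOTAL", "uses:COUNTER"},
    "PARA-REPORT": {"uses:TOTAL", "uses:COUNTER", "calls:PRINT-LINE"},
}


def support(itemset, records):
    """Fraction of records whose attribute set contains every item in itemset."""
    return sum(itemset <= attrs for attrs in records.values()) / len(records)


def mine_rules(records, min_support, min_confidence):
    """Stage 2: brute-force mining of single-item rules lhs -> rhs."""
    items = sorted(set().union(*records.values()))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, records)
        if pair_support < min_support:
            continue
        # Try both directions of the pair as candidate rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, records)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


def group_entities(records, rules):
    """Stage 3: group entities whose attributes contain both sides of a rule.

    Rules satisfied by exactly the same set of entities land in one group.
    """
    groups = {}
    for lhs, rhs, _, _ in rules:
        members = frozenset(
            name for name, attrs in records.items() if {lhs, rhs} <= attrs
        )
        groups.setdefault(members, []).append((lhs, rhs))
    return groups


rules = mine_rules(RECORDS, min_support=0.4, min_confidence=0.8)
groups = group_entities(RECORDS, rules)
for members, shared_rules in groups.items():
    print(sorted(members), "share", len(shared_rules), "rules")
```

With this toy data the file-handling paragraphs and the totalling paragraphs fall into two separate groups, which mirrors the kind of program-level abstraction the abstract describes. A realistic setting would need larger itemsets and an efficient miner such as Apriori.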


Author information

Corresponding author

Correspondence to Christos Tjortjis.

About this article

Cite this article

Tjortjis, C. Mining Association Rules from Code (MARC) to support legacy software management. Software Qual J 28, 633–662 (2020). https://doi.org/10.1007/s11219-019-09480-3

  • DOI: https://doi.org/10.1007/s11219-019-09480-3
