Reconstructing and evolving software architectures using a coordinated clustering framework

Automated Software Engineering

Abstract

During a long maintenance period, software projects experience architectural erosion and drift, which make maintenance tasks harder for software engineers unfamiliar with the code base. This paper presents a framework that assists software engineers in recovering a software project’s architecture from its source code. The recovery process is iterative: it combines clustering based on contextual and structural information in the code base with incremental developer feedback. The process converges when the developer is satisfied with the proposed decomposition of the software and, as an additional benefit, the framework becomes tuned to aid future evolution of the project. The paper provides both analytic and empirical evaluations of the obtained results; the experimental results show that our framework performs reasonably better than conventional alternative methods. The proposed framework uses a novel compartmentalization technique, Coordinated Clustering of Heterogeneous Datasets (CCHD), that relies on contextual and structural information in the code base but, unlike most previous approaches, does not require specific weights for each information type, which allows it to adapt to different project types and domains.
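
The iterative loop described in the abstract (cluster, collect developer feedback, re-cluster) can be illustrated with a minimal Python sketch. This is not the CCHD algorithm, whose details are not given here: the sketch swaps in plain average-linkage agglomerative clustering, averages the two views with fixed equal weights (which CCHD is said to avoid), and encodes developer feedback as must-link and cannot-link pairs. The class names, identifier sets, and dependency sets are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def class_similarity(x, y, terms, deps):
    """Equal-weight average of a contextual and a structural view (an assumption)."""
    contextual = jaccard(terms[x], terms[y])            # shared identifiers
    structural = jaccard(deps[x] | {x}, deps[y] | {y})  # shared dependency neighbourhood
    return (contextual + structural) / 2


def cluster(classes, terms, deps, k, must_link=(), cannot_link=()):
    """Average-linkage agglomerative clustering constrained by developer feedback."""
    clusters = [{c} for c in classes]

    def find(c):
        return next(g for g in clusters if c in g)

    def merge(a, b):
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)

    # Feedback: must-link pairs are merged before any similarity is consulted.
    for x, y in must_link:
        a, b = find(x), find(y)
        if a is not b:
            merge(a, b)

    def allowed(a, b):
        # Feedback: a merge is blocked if it would join a cannot-link pair.
        return not any((x in a and y in b) or (x in b and y in a)
                       for x, y in cannot_link)

    def linkage(a, b):
        total = sum(class_similarity(x, y, terms, deps) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        candidates = [(linkage(a, b), i, j)
                      for (i, a), (j, b) in combinations(enumerate(clusters), 2)
                      if allowed(a, b)]
        if not candidates:
            break  # constraints forbid every remaining merge
        _, i, j = max(candidates)
        merge(clusters[i], clusters[j])
    return sorted(sorted(g) for g in clusters)


# Hypothetical toy project: identifiers per class and outgoing dependencies.
classes = ["Parser", "Lexer", "Renderer", "Canvas"]
terms = {
    "Parser": {"token", "grammar", "parse"},
    "Lexer": {"token", "scan", "parse"},
    "Renderer": {"draw", "pixel", "shape"},
    "Canvas": {"draw", "pixel", "token"},
}
deps = {"Parser": {"Lexer"}, "Lexer": set(), "Renderer": {"Canvas"}, "Canvas": set()}

# Iteration 1: no feedback yet.
first = cluster(classes, terms, deps, k=2)
# Iteration 2: the developer states that Canvas belongs with Lexer.
revised = cluster(classes, terms, deps, k=2, must_link=[("Canvas", "Lexer")])
print(first)
print(revised)
```

In the second call, a single must-link pair from the developer moves Canvas into the parsing cluster and leaves Renderer on its own, which is the kind of incremental correction the abstract describes.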

Notes

  1. In this paper, we use class to refer to the programming language context of the word, rather than to a collection or category.

  2. https://github.com/abb-iss/SrcML.NET.

  3. http://vcu-swim-lab.github.io/cchd.

  4. http://vcu-swim-lab.github.io/cchd.

  5. http://sando.codeplex.com.

Author information

Corresponding author

Correspondence to Sheikh Motahar Naim.

About this article

Cite this article

Naim, S.M., Damevski, K. & Hossain, M.S. Reconstructing and evolving software architectures using a coordinated clustering framework. Autom Softw Eng 24, 543–572 (2017). https://doi.org/10.1007/s10515-017-0211-8

  • DOI: https://doi.org/10.1007/s10515-017-0211-8
