CorGO: An Integrated Method for Clustering Functionally Similar Genes

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Identifying groups of co-expressed or co-regulated genes is critical for exploring the mechanisms underlying a disease such as cancer. Condition-specific (disease-specific) gene-expression profiles acquired from different platforms are widely used to gain insight into the regulatory mechanisms of a disease. Several clustering algorithms have been developed that use gene-expression profiles to identify groups of similar genes. These algorithms are computationally efficient but cannot capture the functional similarity between genes, which is very important from a biological perspective. In this study, an algorithm named CorGO is introduced that specifically addresses the identification of functionally similar gene clusters. Two types of relationships are calculated for this purpose. First, the correlation (Cor) between genes is computed from the gene-expression data, which helps decipher the relationship between genes based on their expression across several diseased samples. Second, Gene Ontology (GO)-based semantic similarity information available for the genes is used, which adds biological relevance to the identified gene clusters. A similarity measure is defined by integrating these two components, which helps identify homogeneous and functionally similar groups of genes. CorGO is applied to four different types of gene-expression profiles from different types of cancer. Gene clusters identified by CorGO are further validated by pathway enrichment, disease enrichment, and network analysis. These biological analyses demonstrate significant connectivity and functional relatedness among the genes of the same cluster. A comparative study with commonly used clustering algorithms is also performed to show the efficacy of the proposed method.
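
The abstract does not give the exact form of the integrated similarity measure, so the following is only an illustrative sketch of how an expression-based correlation and a GO-based semantic similarity could be blended into one pairwise score. The weighted average, the weight `alpha`, the use of the absolute Pearson correlation, the function names, and the toy gene labels are all assumptions made for illustration and are not taken from the article. The GO similarity scores are assumed to be precomputed (for example, with a tool such as the GOSemSim R package) and scaled to [0, 1].

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant vector has no defined correlation; treat it as 0 here.
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0


def integrated_similarity(expr, go_sim, alpha=0.5):
    """Blend |correlation| with GO semantic similarity for every gene pair.

    expr   : dict mapping gene -> list of expression values across samples
    go_sim : dict mapping frozenset({gene_a, gene_b}) -> similarity in [0, 1]
    alpha  : hypothetical weight on the correlation term (assumption)
    """
    genes = sorted(expr)
    scores = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            pair = frozenset((g, h))
            cor = abs(pearson(expr[g], expr[h]))   # assumption: sign ignored
            go = go_sim.get(pair, 0.0)             # assumption: missing -> 0
            scores[pair] = alpha * cor + (1 - alpha) * go
    return scores


# Toy data with made-up gene labels and values, purely for illustration.
toy_expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, 3.0, 2.0, 2.0],
}
toy_go = {
    frozenset(("g1", "g2")): 0.8,
    frozenset(("g1", "g3")): 0.1,
}
toy_scores = integrated_similarity(toy_expr, toy_go, alpha=0.5)
for pair, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), round(score, 3))
```

A pairwise score of this kind could then be passed to any similarity-based clustering procedure; which procedure CorGO uses is not stated in the abstract.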

Acknowledgements

This work is partially supported by Department of Science and Technology, Government of India, New Delhi, Grant number-ECR/2016/001917.

Author information

Correspondence to Sushmita Paul.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Pant, N., Madhumita, M. & Paul, S. CorGO: An Integrated Method for Clustering Functionally Similar Genes. Interdiscip Sci Comput Life Sci 13, 624–637 (2021). https://doi.org/10.1007/s12539-021-00424-9

  • DOI: https://doi.org/10.1007/s12539-021-00424-9
