Abstract
Identifying groups of co-expressed or co-regulated genes is critical for exploring the mechanisms underlying a disease such as cancer. Condition-specific (disease-specific) gene-expression profiles acquired from different platforms are widely used by researchers to gain insight into the regulatory mechanism of the disease. Several clustering algorithms have been developed that use gene-expression profiles to identify groups of similar genes. These algorithms are computationally efficient but cannot capture the functional similarity between genes, which is very important from a biological perspective. In this study, an algorithm named CorGO is introduced that specifically addresses the identification of functionally similar gene clusters. Two types of relationships are calculated for this purpose. First, the correlation (Cor) between genes is captured from the gene-expression data, which helps in deciphering the relationship between genes based on their expression across several diseased samples. Second, the Gene Ontology (GO)-based semantic similarity information available for the genes is used, which adds biological relevance to the identified gene clusters. A similarity measure is defined by integrating these two components, which helps in identifying homogeneous and functionally similar groups of genes. CorGO is applied to four gene-expression profiles from different types of cancer. The gene clusters identified by CorGO are further validated by pathway enrichment, disease enrichment, and network analysis. These biological analyses demonstrate significant connectivity and functional relatedness among the genes of the same cluster. A comparative study with commonly used clustering algorithms is also performed to show the efficacy of the proposed method.
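The abstract does not give the exact form of the integrated similarity measure, so the following is only a minimal sketch of the general idea, not the CorGO formula. It assumes that the two components are combined as a weighted sum of the absolute Pearson correlation and a precomputed GO semantic similarity in [0, 1]. The weight `alpha`, the function names, the gene labels `g1` to `g3`, and all numeric values are hypothetical and chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def combined_similarity(expr, go_sim, alpha=0.5):
    """Blend expression correlation with GO semantic similarity.

    expr   : dict mapping gene -> list of expression values across samples
    go_sim : dict of dicts, go_sim[g][h] in [0, 1], assumed precomputed
    alpha  : hypothetical weight on the correlation term (an assumption,
             not a parameter taken from the paper)
    """
    genes = list(expr)
    sim = {}
    for g in genes:
        for h in genes:
            cor = abs(pearson(expr[g], expr[h]))
            sim[(g, h)] = alpha * cor + (1 - alpha) * go_sim[g][h]
    return sim


# Toy data (made up): three genes measured in four samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
# Toy GO semantic similarities (made up, symmetric, 1.0 on the diagonal).
go_sim = {
    "g1": {"g1": 1.0, "g2": 0.9, "g3": 0.2},
    "g2": {"g1": 0.9, "g2": 1.0, "g3": 0.1},
    "g3": {"g1": 0.2, "g2": 0.1, "g3": 1.0},
}

sim = combined_similarity(expr, go_sim, alpha=0.5)
for pair in [("g1", "g2"), ("g1", "g3"), ("g2", "g3")]:
    print(pair, round(sim[pair], 3))
```

In this sketch, genes that are both strongly co-expressed and functionally close receive the highest scores, and the resulting matrix could be passed to any similarity-based clustering routine. How CorGO actually weights and clusters the two components is described in the full paper rather than in the abstract.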
Acknowledgements
This work is partially supported by the Department of Science and Technology, Government of India, New Delhi, under grant number ECR/2016/001917.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Pant, N., Madhumita, M. & Paul, S. CorGO: An Integrated Method for Clustering Functionally Similar Genes. Interdiscip Sci Comput Life Sci 13, 624–637 (2021). https://doi.org/10.1007/s12539-021-00424-9
DOI: https://doi.org/10.1007/s12539-021-00424-9