CorGO: An Integrated Method for Clustering Functionally Similar Genes

Pant, Namrata; Madhumita, Madhumita; Paul, Sushmita

doi:10.1007/s12539-021-00424-9


Original research article
Published: 24 March 2021

Interdisciplinary Sciences: Computational Life Sciences, volume 13, pages 624–637 (2021)


Abstract

Identifying groups of co-expressed or co-regulated genes is critical for exploring the mechanisms underlying a disease such as cancer. Condition-specific (disease-specific) gene-expression profiles acquired from different platforms are widely used by researchers to gain insight into the regulatory mechanism of the disease. Several clustering algorithms have been developed that use gene-expression profiles to identify groups of similar genes. These algorithms are computationally efficient but cannot capture the functional similarity between genes, which is very important from a biological perspective. In this study, an algorithm named CorGO is introduced, which specifically addresses the identification of functionally similar gene clusters. Two types of relationships are calculated for this purpose. First, the correlation (Cor) between genes is computed from the gene-expression data, which helps in deciphering the relationship between genes based on their expression across several diseased samples. Second, the Gene Ontology (GO)-based semantic similarity information available for the genes is used, which adds biological relevance to the identified gene clusters. A similarity measure is defined by integrating these two components, and it helps in identifying homogeneous and functionally similar groups of genes. CorGO is applied to four different types of gene-expression profiles from different types of cancer. Gene clusters identified by CorGO are further validated by pathway enrichment, disease enrichment, and network analysis. These biological analyses demonstrate significant connectivity and functional relatedness among the genes of the same cluster. A comparative study with commonly used clustering algorithms is also performed to show the efficacy of the proposed method.
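
The abstract does not state how the correlation and GO components are combined, so the following is only an illustrative sketch of one way such an integrated similarity could be formed. It is not the CorGO measure itself. The convex combination, the weight `alpha`, the rescaling of correlation to [0, 1], the function names, and the toy values are all assumptions made for this example; in practice the GO semantic similarities would come from a dedicated tool rather than a hand-written dictionary.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def integrated_similarity(expr, go_sim, alpha=0.5):
    """Blend expression correlation with GO semantic similarity.

    expr:   dict mapping gene -> list of expression values across samples
    go_sim: dict mapping frozenset({g1, g2}) -> GO similarity in [0, 1]
    alpha:  hypothetical mixing weight (an assumption, not from the paper)
    """
    genes = sorted(expr)
    sim = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            pair = frozenset((g1, g2))
            # Rescale correlation from [-1, 1] to [0, 1] so both terms share a range.
            cor = (pearson(expr[g1], expr[g2]) + 1) / 2
            # Gene pairs without GO information contribute 0 for the GO term.
            go = go_sim.get(pair, 0.0)
            sim[pair] = alpha * cor + (1 - alpha) * go
    return sim

# Toy data: all values are invented for illustration only.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with geneA
}
go_sim = {
    frozenset(("geneA", "geneB")): 0.8,
    frozenset(("geneA", "geneC")): 0.2,
    frozenset(("geneB", "geneC")): 0.2,
}
sim = integrated_similarity(expr, go_sim, alpha=0.5)
```

With these toy inputs, the pair geneA/geneB scores highest because it is both strongly co-expressed and functionally similar. The resulting pairwise scores could then be passed to any similarity-based clustering procedure.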



Acknowledgements

This work is partially supported by the Department of Science and Technology, Government of India, New Delhi, under grant number ECR/2016/001917.

Author information

Namrata Pant and Madhumita Madhumita: Joint first authors.

Authors and Affiliations

Department of Bioscience and Bioengineering, Indian Institute of Technology, Jodhpur, Rajasthan, 342037, India
Namrata Pant, Madhumita Madhumita & Sushmita Paul


Corresponding author

Correspondence to Sushmita Paul.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article

Cite this article

Pant, N., Madhumita, M. & Paul, S. CorGO: An Integrated Method for Clustering Functionally Similar Genes. Interdiscip Sci Comput Life Sci 13, 624–637 (2021). https://doi.org/10.1007/s12539-021-00424-9


Received: 26 December 2020
Revised: 23 February 2021
Accepted: 05 March 2021
Published: 24 March 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s12539-021-00424-9
