An improved ant-based algorithm based on heaps merging and fuzzy c-means for clustering cancer gene expression data

Bulut, Hasan; Onan, Aytuğ; Korukoğlu, Serdar

doi:10.1007/s12046-020-01399-x

Published: 23 June 2020

Sādhanā, Volume 45, article number 160 (2020)

Abstract

Microarray technology enables efficient analysis of gene expression data and a better understanding of important biological processes. We have developed an efficient clustering scheme for microarray gene expression data based on correlation-based feature selection, ant-based clustering, the fuzzy c-means algorithm and a novel heaps merging heuristic. The scheme uses feature selection to overcome the high-dimensionality problem encountered in the bioinformatics domain. Extensive empirical analysis on microarray data shows that the clustering quality of the ant-based clustering algorithm is enhanced by the fuzzy c-means algorithm and the heaps merging heuristic. The performance of the proposed clustering scheme is compared with k-means, the PAM algorithm, CLARA, self-organizing maps, hierarchical clustering, divisive analysis clustering, the self-organizing tree algorithm, hybrid hierarchical clustering, consensus clustering, the AntClass algorithm and the fuzzy c-means clustering algorithm. The experimental results indicate that the proposed clustering scheme yields better performance than these algorithms in clustering cancer gene expression data.

Similar content being viewed by others

A novel intelligent Fuzzy-AHP based evolutionary algorithm for detecting communities in complex networks

Article 29 February 2024

Elmira Pourabbasi, Vahid Majidnezhad, … Yasser jafari

Density-Based Clustering Based on Hierarchical Density Estimates

Data clustering: application and trends

Article 27 November 2022

Gbeminiyi John Oyewole & George Alex Thopil

Author information

Authors and Affiliations

Department of Computer Engineering, Ege University, Izmir, Turkey
Hasan Bulut & Serdar Korukoğlu
Department of Computer Engineering, Izmir Katip Celebi University, Izmir, Turkey
Aytuğ Onan

Corresponding author

Correspondence to Hasan Bulut.

About this article

Cite this article

Bulut, H., Onan, A. & Korukoğlu, S. An improved ant-based algorithm based on heaps merging and fuzzy c-means for clustering cancer gene expression data. Sādhanā 45, 160 (2020). https://doi.org/10.1007/s12046-020-01399-x

Received: 12 May 2016
Revised: 05 April 2020
Accepted: 20 May 2020
Published: 23 June 2020
DOI: https://doi.org/10.1007/s12046-020-01399-x
