Efficient mining of discriminative co-clusters from gene expression data

Odibat, Omar; Reddy, Chandan K.

doi:10.1007/s10115-013-0684-0


Regular Paper
Published: 28 August 2013

Volume 41, pages 667–696 (2014)

Knowledge and Information Systems

Omar Odibat¹ & Chandan K. Reddy¹


Abstract

Discriminative models are used to analyze the differences between two classes and to identify class-specific patterns. Most existing discriminative models use the entire feature space to compute the discriminative patterns for each class. Co-clustering has been proposed to capture patterns that are correlated in a subset of features, but it cannot handle discriminative patterns in labeled datasets. In certain biological applications, such as gene expression analysis, it is critical to consider discriminative patterns that are correlated only in a subset of the feature space. The objective of this paper is twofold. First, it presents an algorithm to efficiently find arbitrarily positioned co-clusters in complex data. Second, it extends this co-clustering algorithm to discover discriminative co-clusters by incorporating class information into the co-cluster search process. In addition, we characterize discriminative co-clusters and propose three novel measures that can be used to evaluate the performance of any discriminative subspace pattern-mining algorithm. We evaluated the proposed algorithms on several synthetic and real gene expression datasets, and our experimental results showed that the proposed algorithms outperformed several existing algorithms from the literature.
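The abstract does not say how coherence or discriminative power is scored, so the following is only an illustrative sketch, not the authors' method or any of their three proposed measures. It assumes two genes x samples expression matrices (one per class) that share the same gene rows, scores the coherence of a submatrix with the mean squared residue of Cheng and Church (2000), a standard biclustering measure, and treats the gap between the two classes' residues as a hypothetical discriminative score (`discriminative_score` is an illustrative name, not from the paper).

```python
import numpy as np


def mean_squared_residue(sub: np.ndarray) -> float:
    """Cheng-Church mean squared residue of a genes x samples submatrix.

    A value of 0 means every entry equals a row constant plus a column
    constant, i.e. the genes follow a perfectly coherent additive pattern.
    """
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    residue = sub - row_means - col_means + sub.mean()
    return float(np.mean(residue ** 2))


def discriminative_score(class_a: np.ndarray, class_b: np.ndarray,
                         genes, samples_a, samples_b) -> float:
    """Hypothetical score: coherence gap of one gene set between two classes.

    class_a and class_b are genes x samples matrices with the same gene rows.
    A positive result means the genes are more coherent (lower residue) over
    samples_a in class A than over samples_b in class B.
    """
    msr_a = mean_squared_residue(class_a[np.ix_(genes, samples_a)])
    msr_b = mean_squared_residue(class_b[np.ix_(genes, samples_b)])
    return msr_b - msr_a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    class_a = rng.normal(size=(20, 10))  # 20 genes x 10 class-A samples
    class_b = rng.normal(size=(20, 10))  # same 20 genes x 10 class-B samples
    genes, samples = list(range(5)), list(range(6))
    # Plant an additive (row offset + column offset) pattern in class A only.
    class_a[np.ix_(genes, samples)] = np.add.outer(np.arange(5.0), np.arange(6.0))
    print(discriminative_score(class_a, class_b, genes, samples, samples))
```

The difference of residues is only one possible way to contrast two classes. A complete method would also have to search over gene and sample subsets rather than take them as given, which is the problem the paper's co-clustering algorithms address.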


Similar content being viewed by others

A Comparative Analysis of Clustering and Biclustering Algorithms in Gene Analysis

Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data

Article Open access 25 October 2018

Basel Abu-Jamous & Steven Kelly

Generalized gene co-expression analysis via subspace clustering using low-rank representation

Article Open access 01 May 2019

Tongxin Wang, Jie Zhang & Kun Huang

Notes

For consistency, we use the term ‘co-clustering’ throughout the paper. The bioinformatics research community generally prefers the term ‘biclustering’.


Acknowledgments

This work was supported in part by the National Cancer Institute of the National Institutes of Health under Award Number R21CA175974 and the US National Science Foundation grants IIS-1231742 and IIS-1242304. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH and NSF.

Author information

Authors and Affiliations

Department of Computer Science, Wayne State University, Detroit, MI, 48202, USA
Omar Odibat & Chandan K. Reddy


Corresponding author

Correspondence to Chandan K. Reddy.


About this article

Cite this article

Odibat, O., Reddy, C.K. Efficient mining of discriminative co-clusters from gene expression data. Knowl Inf Syst 41, 667–696 (2014). https://doi.org/10.1007/s10115-013-0684-0


Received: 30 October 2012
Revised: 25 July 2013
Accepted: 17 August 2013
Published: 28 August 2013
Issue Date: December 2014
DOI: https://doi.org/10.1007/s10115-013-0684-0

