Efficient mining of discriminative co-clusters from gene expression data

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Discriminative models are used to analyze the differences between two classes and to identify class-specific patterns. Most existing discriminative models use the entire feature space to compute the discriminative patterns for each class. Co-clustering has been proposed to capture patterns that are correlated in a subset of features, but it cannot handle discriminative patterns in labeled datasets. In certain biological applications such as gene expression analysis, it is critical to consider discriminative patterns that are correlated only in a subset of the feature space. The objective of this paper is twofold. First, it presents an algorithm to efficiently find arbitrarily positioned co-clusters from complex data. Second, it extends this co-clustering algorithm to discover discriminative co-clusters by incorporating class information into the co-cluster search process. We also characterize discriminative co-clusters and propose three novel measures that can be used to evaluate the performance of any discriminative subspace pattern-mining algorithm. We evaluated the proposed algorithms on several synthetic and real gene expression datasets, and our experimental results showed that they outperformed several existing algorithms from the literature.

Notes

  1. To be consistent, we use the term ‘co-clustering’ throughout the paper. The bioinformatics research community generally prefers the term ‘biclustering’.

Acknowledgments

This work was supported in part by the National Cancer Institute of the National Institutes of Health under Award Number R21CA175974 and the US National Science Foundation grants IIS-1231742 and IIS-1242304. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH and NSF.

Author information

Corresponding author

Correspondence to Chandan K. Reddy.

About this article

Cite this article

Odibat, O., Reddy, C.K. Efficient mining of discriminative co-clusters from gene expression data. Knowl Inf Syst 41, 667–696 (2014). https://doi.org/10.1007/s10115-013-0684-0

  • DOI: https://doi.org/10.1007/s10115-013-0684-0
