Classification of Homo sapiens gene behavior using linear discriminant analysis fused with minimum entropy mapping

  • Original Article
Medical & Biological Engineering & Computing

Abstract

Classifying Homo sapiens gene behavior with computational biology is a recent research trend. However, monitoring gene activity profiles and genetic behavior from the alphabetic DNA sequence with a non-invasive method remains a major challenge in functional genomics. This paper addresses that issue and attempts to differentiate Homo sapiens genes using linear discriminant analysis (LDA). Annotated protein-coding sequences of Homo sapiens genes, collected from NCBI, are taken as test samples. A minimum entropy-based mapping (MEM) technique helps extract the most information from the numerical DNA sequences. The proposed LDA technique classifies Homo sapiens genes based on the following features: composition of hydrophilic amino acids, dominance of the amino acid arginine, and magnitude and size of individual amino acids. The proposed algorithm is tested on 84 Homo sapiens healthy and cancer genes of prostate and breast cells. Classification performance of the proposed LDA technique is judged by sensitivity (89.12%), specificity (91.9%), accuracy (90.87%), F1 score (92.03%), Matthews correlation coefficient (81.04%), and miss rate (9.12%), and it outperforms four other existing classifiers. The results are cross-validated through the Rayleigh PDF and a mutual information technique. The Fisher test, two-sample t-test, and relative entropy test are used to verify the efficacy of the present classifier.
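The performance measures listed in the abstract can all be derived from the four counts of a binary confusion matrix. The sketch below is illustrative only: it uses the standard textbook definitions (with miss rate taken as the false negative rate), which may differ from the definitions used in the paper, since the abstract does not state them. The function name `classification_metrics` and the counts passed to it are hypothetical and are not the paper's data.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes each class has at least one sample and at least one positive
    prediction, so the simple ratios below are well defined.
    """
    sensitivity = tp / (tp + fn)               # true positive rate (recall)
    specificity = tn / (tn + fp)               # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    miss_rate = fn / (tp + fn)                 # false negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "miss_rate": miss_rate,
    }


# Hypothetical counts chosen only for illustration (not the paper's results).
metrics = classification_metrics(tp=45, fn=5, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these definitions the miss rate is always one minus the sensitivity, so a reader comparing reported figures should check which definition a given paper adopts.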

Funding

The author J. Das would like to thank University Grants Commission (UGC), India, for providing her scholarship (No. F1-17.1/2014-15/RGNF-2014-15-SC-WES-72001).

Cite this article

Das, J., Barman (Mandal), S. Classification of Homo sapiens gene behavior using linear discriminant analysis fused with minimum entropy mapping. Med Biol Eng Comput 59, 673–691 (2021). https://doi.org/10.1007/s11517-021-02324-y

  • DOI: https://doi.org/10.1007/s11517-021-02324-y
