Semi-supervised Ensemble Learning for Efficient Cancer Sample Classification from miRNA Gene Expression Data

New Generation Computing

Abstract

Traditional classifiers often fail to achieve the desired classification accuracy because microRNA (miRNA) gene expression cancer datasets contain too few training samples. To address this, we propose a novel semi-supervised ensemble learning (SSEL) strategy that combines the advantages of semi-supervised learning and ensemble learning and produces better results than its individual constituent classifiers. The proposed method is validated on eight publicly available miRNA gene expression datasets of pancreatic and colorectal cancers. It is compared with six other state-of-the-art methods in terms of classification accuracy, precision, recall, macro \(F_{1}\)-measure and kappa. The experimental results show that the proposed SSEL method significantly outperforms the other compared methods for cancer sample classification. Statistical significance tests, receiver operating characteristic (ROC) curves and the area under the curve (AUC) further confirm that the improvements achieved by the proposed method are significant.
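The abstract does not spell out the SSEL algorithm itself, so the following is only a generic, hedged sketch of the idea it names, assuming a self-training scheme. An ensemble of heterogeneous base classifiers votes on each unlabeled sample, samples on which every member agrees receive that label as a pseudo-label, and the enlarged labeled set is reused in the next round. The base learners (1-NN, 3-NN, nearest centroid), the unanimous-agreement rule and all function names (`knn_predict`, `centroid_predict`, `MEMBERS`, `self_train`, `ensemble_predict`) are hypothetical stand-ins, not the constituent classifiers or procedure used in the paper. The sketch also computes two of the reported measures, macro \(F_{1}\) and Cohen's kappa, using only the standard library, with toy 2-D points standing in for miRNA expression profiles.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, x, k):
    """Majority label among the k nearest labeled samples.

    Counter.most_common keeps first-seen order on equal counts, so a tie
    is resolved in favour of the label of the closer neighbour.
    """
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def centroid_predict(train_x, train_y, x):
    """Label of the nearest class centroid (mean feature vector per class)."""
    centroids = {}
    for label in sorted(set(train_y)):
        members = [v for v, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return min(centroids, key=lambda c: euclidean(centroids[c], x))


# Hypothetical base learners (assumption, not the paper's choice):
# each maps (labeled X, labeled y, one sample) -> predicted label.
MEMBERS = [
    lambda X, Y, x: knn_predict(X, Y, x, k=1),
    lambda X, Y, x: knn_predict(X, Y, x, k=3),
    centroid_predict,
]


def ensemble_predict(train_x, train_y, x, members=MEMBERS):
    """Plain majority vote over the ensemble members."""
    return Counter(m(train_x, train_y, x) for m in members).most_common(1)[0][0]


def self_train(lab_x, lab_y, unlab_x, members=MEMBERS, max_iter=10):
    """Grow the labeled set with unlabeled samples the whole ensemble agrees on."""
    lab_x, lab_y, pool = list(lab_x), list(lab_y), list(unlab_x)
    for _ in range(max_iter):
        confident, remaining = [], []
        for x in pool:
            label, count = Counter(m(lab_x, lab_y, x) for m in members).most_common(1)[0]
            if count == len(members):  # assumed rule: accept only unanimous votes
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:  # nothing new can be pseudo-labeled; stop early
            break
        for x, label in confident:
            lab_x.append(x)
            lab_y.append(label)
        pool = remaining
    return lab_x, lab_y


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
              for c in set(y_true) | set(y_pred))
    # p_e == 1 only when both vectors use one identical label; treated as 1.0 by convention.
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


# Toy usage: two well-separated clusters stand in for "normal" vs "tumour" samples.
lab_x = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
lab_y = ["normal", "normal", "tumour", "tumour"]
unlab_x = [[0.1, 0.3], [0.4, 0.2], [4.8, 5.2], [5.3, 5.1], [0.3, 0.0], [4.9, 4.7]]
X, Y = self_train(lab_x, lab_y, unlab_x)
test_x = [[0.2, 0.2], [5.0, 4.8], [0.5, 0.5], [4.6, 5.0]]
test_y = ["normal", "tumour", "normal", "tumour"]
pred = [ensemble_predict(X, Y, x) for x in test_x]
print(len(X), macro_f1(test_y, pred), cohen_kappa(test_y, pred))
```

Requiring unanimous agreement is a deliberately conservative choice for this sketch: a looser threshold (for example, a simple majority) would add more pseudo-labeled samples per round, at a higher risk of propagating labeling errors into the training set.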



Author information

Correspondence to Anindya Halder.

About this article

Cite this article

Marak, D.C.B., Halder, A. & Kumar, A. Semi-supervised Ensemble Learning for Efficient Cancer Sample Classification from miRNA Gene Expression Data. New Gener. Comput. 39, 487–513 (2021). https://doi.org/10.1007/s00354-021-00123-5

  • DOI: https://doi.org/10.1007/s00354-021-00123-5
