Semi-supervised Ensemble Learning for Efficient Cancer Sample Classification from miRNA Gene Expression Data

Marak, Dikme Chisil B.; Halder, Anindya; Kumar, Ansuman

doi:10.1007/s00354-021-00123-5

Published: 06 March 2021

New Generation Computing, Volume 39, pages 487–513 (2021)

Abstract

Traditional classifiers often fail to achieve the desired classification accuracy because microRNA (miRNA) gene expression cancer datasets contain too few training samples. To address this, we propose a novel semi-supervised ensemble learning (SSEL) strategy that combines the advantages of semi-supervised learning and ensemble learning and produces better results than its individual constituent classifiers. The proposed method is validated on eight publicly available miRNA gene expression datasets of pancreatic and colorectal cancers in terms of classification accuracy, precision, recall, macro \(F_{1}\)-measure and kappa, in comparison with six other state-of-the-art methods. The experimental results show that the proposed SSEL method significantly outperforms the compared methods for cancer sample classification. Statistical significance tests, receiver operating characteristic curves and area under the curve values support the superiority of the proposed method.
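To illustrate the general idea of combining semi-supervised learning with an ensemble, the following is a minimal sketch and not the authors' algorithm: the abstract does not state the base classifiers, the confidence criterion, or the stopping rule. The sketch assumes a self-training loop in which an ensemble of simple classifiers pseudo-labels unlabeled samples by majority vote and adds only those on which the members agree. All names (`ssel_sketch`, `nearest_centroid`, `knn`, `vote`, `threshold`, `max_rounds`) and the toy data are hypothetical.

```python
from collections import Counter
import math


def nearest_centroid(X, y):
    """Train a nearest-centroid classifier; returns a predict function."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def knn(k):
    """Factory returning a trainer for a k-nearest-neighbour classifier."""
    def train(X, y):
        def predict(x):
            nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
            return Counter(label for _, label in nearest).most_common(1)[0][0]
        return predict
    return train


def vote(models, x):
    """Majority vote; returns (label, fraction of models that agree)."""
    label, n = Counter(m(x) for m in models).most_common(1)[0]
    return label, n / len(models)


def ssel_sketch(X_lab, y_lab, X_unlab, trainers, threshold=1.0, max_rounds=10):
    """Assumed self-training loop: pseudo-label samples the ensemble agrees on."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        models = [train(X_lab, y_lab) for train in trainers]
        confident, rest = [], []
        for x in pool:
            label, agreement = vote(models, x)
            (confident if agreement >= threshold else rest).append((x, label))
        if not confident:          # nothing new to learn from, so stop
            break
        for x, label in confident:  # grow the labeled set with pseudo-labels
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in rest]
    # Retrain every ensemble member on the enlarged labeled set.
    return [train(X_lab, y_lab) for train in trainers]


# Toy two-feature data standing in for expression profiles (made up).
X_lab = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_lab = ["normal", "normal", "tumor", "tumor"]
X_unlab = [[0.1, 0.2], [0.95, 1.0], [0.3, 0.1], [0.8, 0.8]]

models = ssel_sketch(X_lab, y_lab, X_unlab, [nearest_centroid, knn(1), knn(3)])
print(vote(models, [0.85, 0.95]))
```

Requiring full agreement (`threshold=1.0`) is a conservative choice for this sketch: it limits the risk of adding wrongly pseudo-labeled samples when the labeled set is small, at the cost of leaving ambiguous samples unused.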

Author information

Authors and Affiliations

Department of Computer Application, School of Technology, North-Eastern Hill University, Tura Campus, Tura, Meghalaya, 794002, India
Dikme Chisil B. Marak, Anindya Halder & Ansuman Kumar

Corresponding author

Correspondence to Anindya Halder.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Marak, D.C.B., Halder, A. & Kumar, A. Semi-supervised Ensemble Learning for Efficient Cancer Sample Classification from miRNA Gene Expression Data. New Gener. Comput. 39, 487–513 (2021). https://doi.org/10.1007/s00354-021-00123-5

Received: 25 September 2020
Accepted: 10 February 2021
Published: 06 March 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s00354-021-00123-5
