Abstract
Healthcare data is being collected at a rapidly growing scale, and detecting lung cancer at an early stage demands accurate prediction; machine learning can serve both needs. In this paper, we combine machine learning algorithms with Apache Spark to design an architecture for classifying lung cancer images and stages. We experiment with a combination of binary classification (a nonlinear SVM with a radial basis function (RBF) kernel) and multi-class classification (a winner-takes-all support vector machine, WTA-SVM) with a threshold technique (T-BMSVM), which classify nodules as malignant or benign and grade their malignancy level, respectively. The dataset consists of sputum cell images obtained from laboratory microscope images. We argue for handling and processing large datasets of sputum cell images for classification using the MapReduce framework in MATLAB and PySpark, which works better with Apache Spark. Our approach outperforms the other methods compared: it remains stable as the dataset size grows sharply and keeps the error rate at a minimum. It achieves 86% accuracy and an AUC of 0.88; together with the misclassification rate, these metrics indicate that the support vector machine (SVM) outperforms the other classifiers. These results suggest that features can be extracted successfully from lung cancer images and that the SVM works well for binary classification, while multi-class classification performs better than the SVM alone; the approach may therefore be considered a promising tool for diagnosing the stage of nodules and classifying the severity of cancer. Scalability and convergence analyses are included to support the improvement of multi-class classification over the SVM alone.
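The two-stage decision described in the abstract can be illustrated with a minimal Python sketch. It is not the article's implementation: the abstract does not specify how T-BMSVM applies its threshold, so the sketch assumes the threshold is compared against the binary SVM's decision value and that malignancy levels are scored by one-vs-rest RBF SVMs whose highest score wins. The function names, the model dictionaries (`BINARY_MODEL`, `LEVEL_MODELS`), and all numeric values are hypothetical toy values chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Kernel SVM decision function: sum_i coef_i * K(sv_i, x) + bias."""
    weighted = (c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, dual_coefs))
    return sum(weighted) + bias


def classify_nodule(x, binary_model, level_models, threshold=0.0):
    """Stage 1: thresholded binary decision (assumed form of the threshold step).
    Stage 2: winner-takes-all over one-vs-rest malignancy-level scorers."""
    if decision_value(x, **binary_model) <= threshold:
        return "benign", None
    level_scores = {level: decision_value(x, **model)
                    for level, model in level_models.items()}
    winner = max(level_scores, key=level_scores.get)  # highest score wins
    return "malignant", winner


# Hypothetical, hand-set toy models over 2-D feature vectors (not trained).
BINARY_MODEL = {
    "support_vectors": [(0.0, 0.0), (2.0, 2.0)],
    "dual_coefs": [-1.0, 1.0],  # negative weight pulls toward benign
    "bias": 0.0,
    "gamma": 0.5,
}
LEVEL_MODELS = {
    "level_1": {"support_vectors": [(2.0, 2.0)], "dual_coefs": [1.0],
                "bias": 0.0, "gamma": 0.5},
    "level_2": {"support_vectors": [(4.0, 4.0)], "dual_coefs": [1.0],
                "bias": 0.0, "gamma": 0.5},
}

print(classify_nodule((0.0, 0.0), BINARY_MODEL, LEVEL_MODELS))
print(classify_nodule((2.0, 2.0), BINARY_MODEL, LEVEL_MODELS))
print(classify_nodule((4.0, 4.0), BINARY_MODEL, LEVEL_MODELS))
```

In practice the support vectors, coefficients, and bias would come from a trained SVM, and raising `threshold` would trade sensitivity for fewer false malignant calls; both points are assumptions about usage rather than details given in the abstract.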
Change history
27 June 2022
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s12652-022-04242-9
About this article
Cite this article
Sujitha, R., Seenivasagam, V. RETRACTED ARTICLE: Classification of lung cancer stages with machine learning over big data healthcare framework. J Ambient Intell Human Comput 12, 5639–5649 (2021). https://doi.org/10.1007/s12652-020-02071-2
DOI: https://doi.org/10.1007/s12652-020-02071-2