Classification of lung cancer stages with machine learning over big data healthcare framework
Journal of Ambient Intelligence and Humanized Computing ( IF 3.662 ) Pub Date : 2020-05-26 , DOI: 10.1007/s12652-020-02071-2
R. Sujitha , V. Seenivasagam

Big data healthcare frameworks can collate large volumes of medical data quickly, and detecting lung cancer at an early stage depends on accurate prediction. Machine learning offers a way to combine the two. In this paper, machine learning algorithms are streamlined with Apache Spark in an architecture designed to classify lung cancer images and their stages as effectively as possible. We experiment with a combination of binary classification (a nonlinear SVM with a radial basis function (RBF) kernel) and multi-class classification (WTA-SVM, a winner-takes-all scheme built on support vector machines), together with a threshold technique (T-BMSVM). The binary stage classifies nodules as malignant or benign, and the multi-class stage determines their malignancy levels. The dataset consists of sputum cell images collected from microscope lab images. We argue for handling and processing large datasets such as sputum cell images for classification using the MapReduce framework in MATLAB and PySpark, which works better with Apache Spark. Our approach outperforms the other methods: it remains stable as the dataset grows rapidly, with a minimal error rate. It achieves 86% accuracy and an AUC of 0.88, and these metrics, together with the misclassification rate, show that the support vector machine (SVM) outperforms the other classifiers. The outcomes indicate that features are successfully extracted from the lung cancer images and that SVM combined with binary classification performs well, while classification works even better with the multi-class scheme than with SVM alone. The approach may therefore be considered a promising tool for diagnosing the stages of nodules and classifying the severity of cancer. Scalability and convergence analyses are also included to show that multi-class classification improves on SVM.
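The abstract describes a two-stage scheme. A binary RBF-kernel SVM first separates malignant from benign nodules, and a winner-takes-all set of SVMs then assigns a malignancy level. The sketch below shows only the decision step of such a pipeline in plain Python, assuming the models have already been trained. The support vectors, dual coefficients, bias, `gamma`, the stage names, and the simple score threshold are all hypothetical. The threshold gate is only a stand-in, because the abstract does not specify the form of T-BMSVM. This is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(x: Vector, z: Vector, gamma: float) -> float:
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


@dataclass
class RbfSvm:
    """A binary RBF-kernel SVM held in dual form, assumed already trained."""

    support_vectors: List[Vector]
    dual_coefs: List[float]  # alpha_i * y_i for each support vector
    bias: float
    gamma: float

    def decision(self, x: Vector) -> float:
        # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; positive means class +1.
        total = sum(
            coef * rbf_kernel(sv, x, self.gamma)
            for sv, coef in zip(self.support_vectors, self.dual_coefs)
        )
        return total + self.bias


def classify_nodule(
    x: Vector,
    binary: RbfSvm,
    stage_models: Dict[str, RbfSvm],
    threshold: float = 0.0,
) -> Tuple[str, str]:
    """Two-stage decision: malignant/benign, then winner-takes-all level.

    The threshold gate is a hypothetical stand-in for the T-BMSVM
    threshold technique, whose exact form the abstract does not give.
    """
    if binary.decision(x) <= threshold:
        return "benign", "none"
    # Winner-takes-all: each one-vs-rest SVM scores x; the largest score wins.
    scores = {stage: model.decision(x) for stage, model in stage_models.items()}
    return "malignant", max(scores, key=scores.get)


# Hypothetical toy parameters, hand-set for illustration only.
binary = RbfSvm([[0.0, 0.0], [1.0, 1.0]], [-1.0, 1.0], 0.0, gamma=1.0)
stages = {
    "low": RbfSvm([[1.0, 0.0]], [1.0], -0.5, gamma=1.0),
    "high": RbfSvm([[1.0, 2.0]], [1.0], -0.5, gamma=1.0),
}
print(classify_nodule([1.0, 0.5], binary, stages))  # ('malignant', 'low')
print(classify_nodule([1.0, 1.5], binary, stages))  # ('malignant', 'high')
print(classify_nodule([0.0, 0.0], binary, stages))  # ('benign', 'none')
```

Winner-takes-all here means one one-vs-rest SVM per malignancy level, with the largest decision value choosing the level. In practice the dual parameters would come from an SVM trainer rather than being set by hand, and the same per-image decision function could be mapped over partitions of a large image-feature dataset.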

Updated: 2020-05-26