Effective multiple cancer disease diagnosis frameworks for improved healthcare using machine learning

doi:10.1016/j.measurement.2021.109145

Measurement

Volume 175, April 2021, 109145

https://doi.org/10.1016/j.measurement.2021.109145

Highlights

• The proposed machine learning-based feature modeling improves predictive performance.
• The study attained 99.62%, 96.88%, and 98.21% accuracy on breast, cervical, and lung cancer datasets, respectively.
• Screening procedures are suggested to find the presence of the condition at different stages.
• The system acts as a versatile tool for capturing patterns from clinical trials.

Abstract

Cancer is a non-communicable disease that progresses through uncontrolled cell growth in the body. Cancerous cells form a tumor that impairs the immune system and causes other biological functions to fail. The most common kinds of cancer include breast, prostate, lung, and colon cancer, as well as leukemia. The presence of the disease is established through proper diagnosis, and many screening procedures are recommended for detecting the condition at different stages. Medical practitioners then analyze the resulting electronic health records to diagnose and treat the individual. In some cases, misdiagnosis occurs because of manual error or misinterpretation of the data. To avoid these issues, this paper presents an effective computer-aided diagnosis system supported by intelligent learning models. A machine learning-based feature modeling approach is proposed to improve predictive performance. Breast, cervical, and lung cancer datasets from the University of California, Irvine (UCI) repository are used for the experimental study. Supervised learning algorithms are trained and validated on the optimal feature subsets produced by the proposed system. Using 10-fold cross-validation, the trained models are evaluated with validation metrics such as accuracy, F-score, precision, and recall. The study attained 99.62%, 96.88%, and 98.21% accuracy on the breast, cervical, and lung cancer datasets, respectively, which demonstrates the proposed system's efficacy. Moreover, the system acts as a versatile tool for capturing patterns from many clinical trials for multiple types of cancer.
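The evaluation protocol mentioned in the abstract (10-fold cross-validation scored by accuracy, precision, recall, and F-score) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary labels with 1 as the positive class, splits the samples into contiguous folds without shuffling or stratification, and the helper names `k_fold_indices` and `binary_metrics` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs, in order."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-score for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}
```

In use, a classifier would be trained on each `train` index list, scored on the matching `test` list with `binary_metrics`, and the ten results averaged. For small or imbalanced clinical datasets, shuffling and stratifying the folds is usually preferable to the contiguous split assumed here.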

Introduction

Disease prediction systems are critical because they determine the presence or absence of a medical condition in an individual. The task involves many factors with varying characteristics and multifaceted, real-world aspects [1], [2]. In recent times, there has been an increasing demand for accurate, data-driven predictive models that improve the identification of future events [3]. Several medical associations and patient counseling programs provide cancer screening recommendations and guidance. Patients are advised to review these recommendations with a doctor and decide together what is appropriate given their cancer risk factors. Laboratory tests, such as urine and blood tests, help the doctor detect abnormalities caused by cancer. For example, in patients with leukemia, a common blood test called the complete blood count may show an unusual number or type of white blood cells. During a biopsy, the doctor collects a sample of cells for examination in the laboratory. The sample can be obtained in several ways, and the appropriate biopsy technique depends on the type of cancer and its location. In most cases, a biopsy is the way to diagnose cancer definitively.

The field is also concerned with developing systems that give end users a more interactive and user-friendly environment. In medical practice, the physician or medical expert analyzes an individual's clinical records and diagnoses the condition based on experience or domain knowledge [4], [5], [6]. Across the globe, many healthcare providers are adopting computer-assisted diagnosis systems to help medical practitioners reach an accurate diagnosis [7], [8]. Developing decision support systems for medical applications requires special attention. Clinical data contains hidden information that is usually beyond human ability to perceive and interpret [9]. Finding such patterns is difficult, which has raised the demand for new computational methodologies. Moreover, data extracted from a real-time environment is highly prone to noise and erroneous information [10], [11]. Existing mechanisms do not fully meet these challenges, so an effective solution is needed to build better diagnostic systems. This paper examines new techniques that address the gaps and limitations of existing methods. In general, the outcome of a predictive model depends strongly on its input parameters [12]. Most of the time, the features are also more chaotic than simple factors. Using every feature to build the model is not advisable, because many features carry noise or incorrect inputs. The predictive model's performance depends primarily on the significant features identified for effective sample categorization [13]. A small change in the parameters affects the results on different scales.

In many cases, the data comes from a real-time environment, where the chance of inconsistency is high and the quality is often poor [14], [15], [16]. Hence, this paper investigates the existing models to find a better mechanism for improving performance. The objective is to find, for every dataset used in this experiment, the feature subset that supports effective disease diagnosis. Supervised machine learning algorithms are employed to test and evaluate the system's efficacy. The healthcare industry has long been an early adopter of technological innovations and has benefited greatly from them. Machine learning currently plays a key role in several health-related areas, including innovative medical techniques, the processing of patient data and records, and the management of chronic diseases. Today, machine learning helps streamline hospital administration, map and manage infectious conditions, and customize patient care. It may improve the productivity of hospitals and health systems and decrease the cost of care.

The remainder of this manuscript is organized as follows. The “Background study” section discusses algorithms and frameworks from the previous literature that were developed as tools for disease diagnosis. The “Materials and methods” section presents the proposed methodology in several sub-modules, including the dataset information, followed by the working process of the proposed feature selection method and the machine learning methods, with illustrative diagrams. The “Results” section details the model validation and performance evaluation process. Finally, the “Conclusion” section presents the findings and their significance with reports and graphical analysis. A variety of screening techniques are recommended for detecting the disease at its various stages. Physicians examine the electronic medical records closely to diagnose and treat the patient. In certain situations, a manual mistake or a misinterpretation of the data may cause a diagnostic error. To prevent these problems, this paper provides an effective computer-aided diagnostic method based on intelligent learning models. A machine learning-based feature modeling approach is proposed to boost predictive performance. The experimental study uses breast, cervical, and lung cancer datasets from the University of California, Irvine repository. Supervised learning algorithms are trained and validated on the optimal features selected by the proposed method.

Section snippets

Background study

In recent times, predictive models have shown their importance in many fields, including but not limited to healthcare, weather modeling, stock forecasting, intelligence, and self-guided missiles. Many applications have been built with the support of intelligent algorithms to perform critical operations based on past data. Because the healthcare field is more sensitive than other related fields, special attention is essential. In the absence of complex algorithms for decades

Dataset description

A variety of screening techniques are recommended for detecting the disease at its various stages. Physicians examine the electronic medical records closely to diagnose and treat the patient. In certain situations, a manual mistake or a misinterpretation of the data may cause a diagnostic error. To prevent these problems, this paper provides an effective computer-aided diagnostic method based on intelligent learning models. In order to boost predictive efficiency a computer dependent

Results and discussion

This experimental work is carried out in a Java framework on the Windows platform, with Python machine learning libraries accessed through bridges. Breast, cervical, and lung cancer datasets were used to conduct the study. The datasets are processed in every phase of the pipeline, starting with pre-processing, where missing values are imputed. The cleaned data is then passed to the next phase, where the proposed GA-CFS algorithm finds the best features. This method identified five
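The excerpt does not describe the internals of GA-CFS or the imputation strategy, so the following is only a sketch under stated assumptions rather than the authors' method. It assumes mean imputation for missing values, the commonly used correlation-based feature selection (CFS) merit k·r_cf / sqrt(k + k(k−1)·r_ff) computed with absolute Pearson correlations, and a plain generational genetic algorithm (tournament selection, one-point crossover, bit-flip mutation, elitism) over feature bit masks. All function names and default parameters are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import List, Optional, Sequence


def impute_mean(column: Sequence[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    fill = sum(observed) / len(observed) if observed else 0.0
    return [fill if v is None else v for v in column]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def cfs_merit(columns: Sequence[Sequence[float]], target: Sequence[float],
              mask: Sequence[int]) -> float:
    """CFS merit of the feature subset encoded by a 0/1 mask."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    k = len(chosen)
    if k == 0:
        return 0.0
    # Mean feature-class and mean feature-feature absolute correlation.
    r_cf = sum(abs(pearson(columns[i], target)) for i in chosen) / k
    pairs = list(combinations(chosen, 2))
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ga_cfs(columns: Sequence[Sequence[float]], target: Sequence[float],
           pop_size: int = 20, generations: int = 30,
           mutation_rate: float = 0.1, seed: int = 0) -> List[int]:
    """Search feature masks with a simple GA; returns the selected column indices."""
    rng = random.Random(seed)
    n = len(columns)

    def fitness(mask: Sequence[int]) -> float:
        return cfs_merit(columns, target, mask)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return [i for i, bit in enumerate(best) if bit]
```

Each dataset column would first pass through `impute_mean`, after which `ga_cfs(columns, target)` returns the indices of the retained features. The merit rewards features that correlate with the class label and penalizes features that correlate with each other, which is why a genetic search over subsets is a natural fit for it.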

Conclusion

Computational methods have gained prominence in the medical field and can provide profound solutions for complex systems. These systems help medical practitioners make better decisions based on the models' guidelines, which represent knowledge captured by intelligent algorithms. This study presents an effective algorithmic model for better classification of clinical data labeled manually by experts. The proposed algorithm finds the

CRediT authorship contribution statement

Ching-Hsien Hsu: Conceptualization, Methodology, Software, Writing - original draft. Xing Chen: Writing - review & editing, Validation, Visualization, Investigation. Weiwei Lin: Investigation, Methodology, Validation, Supervision. Chuntao Jiang: Investigation, Validation, Supervision. Youhong Zhang: Investigation, Methodology, Software, Validation, Supervision. Zhifeng Hao: Writing - review & editing, Validation, Visualization, Investigation. Yeh-Ching Chung: Investigation, Methodology,

Declaration of Competing Interest

The authors declared that there is no conflict of interest.

Acknowledgement

This work was partially supported by the National Natural Science Foundation of China (Grant No. 61872084; 61802062; 62072187) and Guangdong-Hong Kong-Macao Intelligent Micro-Nano Optoelectronic Technology Joint Laboratory (Project No. 2020B1212030010).

References (59)

G.I. Doukidis, Decision support system concepts in expert systems: an empirical study, Decis. Support Syst. (1988)
K. Kira et al., A practical approach to feature selection (1992)
A. Bhardwaj et al., Breast cancer diagnosis using genetically optimized neural network model, Expert Syst. Appl. (2015)
L. Peng et al., An immune-inspired semi-supervised algorithm for breast cancer diagnosis, Comput. Methods Programs Biomed. (2016)
K. Polat et al., Principles component analysis, fuzzy weighting pre-processing and artificial immune recognition system based diagnostic system for diagnosis of lung cancer, Expert Syst. Appl. (2008)
K.J. Wang et al., A hybrid classifier combining Borderline-SMOTE with AIRS algorithm for estimating brain metastasis from lung cancer: A case study in Taiwan, Comput. Methods Programs Biomed. (2015)
H.L. Chen et al., A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis, Expert Syst. Appl. (2011)
Z.Q. Hong et al., Optimal discriminant plane for a small number of samples and design method of classifier on the plane, Pattern Recogn. (1991)
J. Cai et al., Feature selection in machine learning: A new perspective, Neurocomputing (2018)
K. Sekaran et al., Predicting drug responsiveness with deep learning from the effects on gene expression of Obsessive-Compulsive Disorder affected cases, Comput. Commun. (2020)
J.H. Holland et al., Cognitive systems based on adaptive algorithms
T.T. Wong, Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation, Pattern Recogn. (2015)
M. Berg et al., Rationalizing medical work: decision-support techniques and medical practices (1997)
B.J. Wilson et al., Cluster randomized trial of a multifaceted primary care decision-support intervention for inherited breast cancer risk, Fam. Pract. (2006)
Thorwarth, M., & Arisha, A. (2012, December). A simulation-based decision support system to model complex demand driven...
R. Tsopra et al., Comparison of two kinds of interface, based on guided navigation or usability principles, for improving the adoption of computerized decision support systems: application to the prescription of antibiotics, J. Am. Med. Inform. Assoc. (2014)
E. Turban et al., Integrating expert systems and decision support systems, MIS Quarterly (1986)
M.A. Musen et al., Clinical decision-support systems
E.H. Shortliffe, Computer programs to support clinical decision making, JAMA (1987)
S. Abrol et al., Radiomic phenotyping in brain cancer to unravel hidden information in medical images, Top. Magn. Reson. Imaging (2017)
R. Goodloe et al., Reducing clinical noise for body mass index measures due to unit and transcription errors in the electronic health record, AMIA Summits on Translational Science Proceedings (2017)
V. Agarwal et al., Learning statistical models of phenotypes using noisy labeled training data, J. Am. Med. Inform. Assoc. (2016)
N. Kwak et al., Input feature selection for classification problems, IEEE Trans. Neural Networks (2002)
T. Botsis et al., Secondary use of EHR: data quality issues and informatics opportunities, Summit on Translational Bioinformatics (2010)
R.L. Rush et al., Maximizing detection of data inconsistency: The development of a consistency check interpreter, American Medical Informatics Association (1987)
C. Danilowicz et al., Consensus methods for solving inconsistency of replicated data in distributed systems, Distributed and Parallel Databases (2003)
B. Muthu et al., IOT based wearable sensor for diseases prediction and symptom analysis in healthcare sector, Peer-to-Peer Netw. Appl. (2020)
S. Aruna et al., Knowledge based analysis of various statistical tools in detecting breast cancer, Computer Science & Information Technology (2011)
S. Kharya et al., Naive Bayes classifiers: A probabilistic detection model for breast cancer, International Journal of Computer Applications (2014)

