Elsevier

Measurement

Volume 175, April 2021, 109145
Effective multiple cancer disease diagnosis frameworks for improved healthcare using machine learning

https://doi.org/10.1016/j.measurement.2021.109145

Highlights

  • The proposed machine learning-based feature modeling improves predictive performance.

  • The study attained 99.62%, 96.88%, and 98.21% accuracy on the breast, cervical, and lung cancer datasets, respectively.

  • Screening procedures are suggested to find the presence of the condition at different stages.

  • The system acts as a miscellaneous tool for capturing patterns from clinical trials.

Abstract

Cancer is a non-communicable disease that progresses through uncontrolled cell growth in the body. Cancerous cells form a tumor that impairs the immune system and causes other biological processes to malfunction. The most common kinds of cancer are breast, prostate, leukemia, lung, and colon cancer. The presence of the disease is identified through proper diagnosis. Many screening procedures are suggested to find the presence of the condition at different stages. Medical practitioners further analyze these electronic health records to diagnose and treat the individual. In some cases, misdiagnosis can happen due to manual error or misinterpretation of the data. To avoid these issues, this paper presents an effective computer-aided diagnosis system supported by intelligent learning models. A machine learning-based feature modeling approach is proposed to improve predictive performance. Breast, cervical, and lung cancer datasets from the University of California, Irvine repository are used to conduct this experimental study. Supervised learning algorithms are employed to train and validate the optimal features reduced by the proposed system. Using the 10-fold cross-validation method, the performance of the trained model is evaluated with validation metrics such as accuracy, F-score, precision, and recall. The study attained 99.62%, 96.88%, and 98.21% accuracy on the breast, cervical, and lung cancer datasets, respectively, which demonstrates the proposed system's efficacy. Moreover, this system acts as a miscellaneous tool for capturing patterns from many clinical trials for multiple types of cancer.
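The evaluation protocol named in the abstract (k-fold cross-validation scored with accuracy, precision, recall, and F-score) can be sketched as follows. This is a minimal illustration in plain Python and not the authors' implementation: the nearest-centroid classifier, the function names, and the pooling of predictions across folds are assumptions introduced here, and the data is assumed to be shuffled before it is split into contiguous folds.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, binary label; 1 = positive)


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F-score for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def nearest_centroid_fit(train: Sequence[Sample]) -> Callable[[List[float]], int]:
    """Hypothetical stand-in classifier: predict the label of the closest class mean."""
    centroids = {}
    for label in sorted({y for _, y in train}):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: List[float]) -> int:
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def cross_validate(data: Sequence[Sample], k: int = 10) -> Dict[str, float]:
    """Train on k-1 folds, predict the held-out fold, and score the pooled predictions."""
    y_true, y_pred = [], []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [s for i, s in enumerate(data) if i not in held_out]
        predict = nearest_centroid_fit(train)
        for i in fold:
            x, y = data[i]
            y_true.append(y)
            y_pred.append(predict(x))
    return binary_metrics(y_true, y_pred)
```

Every sample is predicted exactly once, by a model that never saw it during training, so the reported metrics estimate performance on unseen records rather than on the training data.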

Introduction

Disease prediction systems are critical in their functionality, as they involve determining the presence or absence of a medical condition in an individual. This involves different factors, varying characteristics, and multifaceted real-world aspects [1], [2]. In recent times, there has been an increasing demand for data-driven, accurate predictive models that enhance the precise identification of future events [3]. Several medical associations and patient counseling programs provide cancer screening recommendations and guidance. Patients are advised to review the different recommendations with a doctor and decide together what is appropriate given their cancer risk factors. Laboratory tests, such as urine and blood tests, help the doctor detect abnormalities caused by cancer. For example, in patients with leukemia, a common blood test called the complete blood count may show an unusual number or type of white blood cells. During a biopsy, the doctor collects a sample of cells for examination in the laboratory. The sample can be obtained in several ways, and the appropriate biopsy technique depends on the type of cancer and its location. In most cases, a biopsy is the way to detect cancer with certainty.

This work concerns developing systems that give the end-users of the application a more interactive and user-friendly environment. In medical practice, the physician or medical expert analyzes the clinical records of individuals to diagnose the condition based on experience or domain knowledge [4], [5], [6]. Across the globe, many healthcare providers are adopting computer-assisted diagnosis systems to help medical practitioners reach an accurate diagnosis [7], [8]. Applications in the medical field need special attention when developing decision support systems. Clinical data contains hidden information that is usually beyond human competence and understanding [9]. Finding the patterns is difficult, which has raised the demand for new computational methodologies. At present, data extracted from a real-time environment is highly prone to noise and erroneous information [10], [11]. The existing mechanisms do not fully meet the requirements of the current challenges. Therefore, an effective solution is needed to build better diagnostic systems. This paper examines new techniques to address the gaps and limitations of the existing methods. In general, the outcome of a predictive model strongly depends on the input parameters [12]. Also, most of the time, the features are more chaotic than simple factors. It is not advisable to use all the features to build the model, as the model may then be affected by noise and incorrect inputs. The predictive model's performance solely depends on the significant features identified for effective sample categorization [13]. A small change in the parameters affects the results on different scales.
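The idea of scoring a feature subset rather than using every feature can be illustrated with a short sketch. The results snippet of this article names the proposed method GA-CFS without defining it in this excerpt. Assuming that CFS refers to correlation-based feature selection, a subset S of k features is scored with the merit heuristic Merit(S) = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean absolute feature-class correlation and r_ff is the mean absolute feature-feature correlation. The function names below are hypothetical, Pearson correlation is used as the correlation measure for simplicity, and an exhaustive search stands in for the genetic algorithm, which makes the sketch feasible only for a handful of features.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def cfs_merit(
    columns: Sequence[Sequence[float]],
    target: Sequence[float],
    subset: Tuple[int, ...],
) -> float:
    """Merit of a feature subset: rewards relevance, penalizes redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each selected feature and the class.
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    # Mean absolute correlation between every pair of selected features.
    pairs = list(combinations(subset, 2))
    r_ff = (
        sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def best_subset(
    columns: Sequence[Sequence[float]], target: Sequence[float]
) -> Tuple[int, ...]:
    """Exhaustive search over all non-empty subsets (assumption: few features)."""
    n = len(columns)
    candidates = (
        s for size in range(1, n + 1) for s in combinations(range(n), size)
    )
    return max(candidates, key=lambda s: cfs_merit(columns, target, s))
```

Under this heuristic, adding a feature that is uncorrelated with the class lowers the merit, while adding a feature that is relevant to the class and uncorrelated with the features already chosen raises it.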

In many cases, the data comes from a real-time environment, where the chance of inconsistency is high and the quality is often poor [14], [15], [16]. Hence, this paper aims to investigate the existing models and find a better mechanism to improve performance. The objective is to find feature subsets from all the datasets used in this experiment for effective disease diagnosis. Supervised machine learning algorithms were employed to test and evaluate the system's efficacy based on its results. The healthcare industry has long been an early adopter of technological innovations and has greatly benefited from them. Machine learning currently plays a key role in several health-related areas, including the development of innovative medical techniques, the processing of patient data and records, and the management of chronic diseases. Today, machine learning helps streamline administration in hospitals, map and manage infectious diseases, and customize patient care. It may improve the productivity of hospitals and health systems and decrease care costs.

This manuscript is organized into multiple sections as follows. The “Background study” section discusses various algorithms and frameworks developed as tools for disease diagnosis in previous literature. The “Materials and methods” section describes the proposed methodology in multiple sub-modules, including the dataset information, followed by the working process of the proposed feature selection method and the machine learning methods, with sketches. Next, the model validation and performance evaluation process are detailed in the “Results” section. Finally, in the “Conclusion” section, the findings and their significance are presented with reports and graphical analysis.

A variety of screening techniques are recommended to find the disease at various stages. Medical practitioners then examine the electronic health records to diagnose and treat the individual. In certain situations, a manual mistake or misinterpretation of the data may cause a diagnostic error. To prevent these problems, this paper provides an effective computer-aided diagnostic method supported by intelligent learning models. A machine learning-based feature modeling approach is proposed to boost predictive performance. This experimental research uses breast, cervical, and lung cancer datasets from the University of California, Irvine repository. Supervised learning algorithms are used to train and validate the optimal features reduced by the proposed method.

Section snippets

Background study

In recent times, predictive models have shown their importance in many fields, including but not limited to healthcare, weather modeling, stock forecasting, intelligence, and self-trajectory targeted missiles. Many applications were constructed with the support of intelligent algorithms to perform critical operations based on past data. As the healthcare field is more sensitive than other related fields, special attention becomes inevitable. In the absence of complex algorithms for decades

Dataset description

A variety of screening techniques are recommended to find the disease at various stages. Medical practitioners then examine the electronic health records to diagnose and treat the individual. In certain situations, a manual mistake or misinterpretation of the data may cause a diagnostic error. To prevent these problems, this paper provides an effective computer-aided diagnostic method supported by intelligent learning models. In order to boost predictive efficiency a computer dependent

Results and discussion

This experimental work is carried out in a Java framework with the support of Python machine learning libraries, accessed through bridges, on the Windows platform. Breast, cervical, and lung cancer datasets were used to conduct the study. The datasets are processed in every phase of the pipeline, starting with pre-processing, where the missing values are imputed. The cleaned data is then forwarded to the next phase, where the proposed GA-CFS algorithm finds the best features. This method identified five

Conclusion

Computational methods have gained prominence in the medical field and can provide profound solutions for complex systems. These systems help medical practitioners make better decisions based on the models' guidelines, which represent knowledge captured and gathered by intelligent algorithms. This study presents an effective algorithmic model for better classification of clinical data labeled manually by experts. The proposed algorithm finds the

CRediT authorship contribution statement

Ching-Hsien Hsu: Conceptualization, Methodology, Software, Writing - original draft. Xing Chen: Writing - review & editing, Validation, Visualization, Investigation. Weiwei Lin: Investigation, Methodology, Validation, Supervision. Chuntao Jiang: Investigation, Validation, Supervision. Youhong Zhang: Investigation, Methodology, Software, Validation, Supervision. Zhifeng Hao: Writing - review & editing, Validation, Visualization, Investigation. Yeh-Ching Chung: Investigation, Methodology,

Declaration of Competing Interest

The authors declared that there is no conflict of interest.

Acknowledgement

This work was partially supported by the National Natural Science Foundation of China (Grant No. 61872084; 61802062; 62072187) and Guangdong-Hong Kong-Macao Intelligent Micro-Nano Optoelectronic Technology Joint Laboratory (Project No. 2020B1212030010).

References (59)

  • J.H. Holland et al., Cognitive systems based on adaptive algorithms
  • T.T. Wong, Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation, Pattern Recogn. (2015)
  • M. Berg et al., Rationalizing medical work: decision-support techniques and medical practices (1997)
  • B.J. Wilson et al., Cluster randomized trial of a multifaceted primary care decision-support intervention for inherited breast cancer risk, Fam. Pract. (2006)
  • Thorwarth, M., & Arisha, A. (2012, December). A simulation-based decision support system to model complex demand driven...
  • R. Tsopra et al., Comparison of two kinds of interface, based on guided navigation or usability principles, for improving the adoption of computerized decision support systems: application to the prescription of antibiotics, J. Am. Med. Inform. Assoc. (2014)
  • E. Turban et al., Integrating expert systems and decision support systems, MIS Quarterly (1986)
  • M.A. Musen et al., Clinical decision-support systems
  • E.H. Shortliffe, Computer programs to support clinical decision making, JAMA (1987)
  • S. Abrol et al., Radiomic phenotyping in brain cancer to unravel hidden information in medical images, Top. Magn. Reson. Imaging (2017)
  • R. Goodloe et al., Reducing clinical noise for body mass index measures due to unit and transcription errors in the electronic health record, AMIA Summits on Translational Science Proceedings (2017)
  • V. Agarwal et al., Learning statistical models of phenotypes using noisy labeled training data, J. Am. Med. Inform. Assoc. (2016)
  • N. Kwak et al., Input feature selection for classification problems, IEEE Trans. Neural Networks (2002)
  • T. Botsis et al., Secondary use of EHR: data quality issues and informatics opportunities, Summit on Translational Bioinformatics (2010)
  • R.L. Rush et al., Maximizing detection of data inconsistency: The development of a consistency check interpreter, American Medical Informatics Association (1987)
  • C. Danilowicz et al., Consensus methods for solving inconsistency of replicated data in distributed systems, Distributed and Parallel Databases (2003)
  • B. Muthu et al., IOT based wearable sensor for diseases prediction and symptom analysis in healthcare sector, Peer-to-Peer Netw. Appl. (2020)
  • S. Aruna et al., Knowledge based analysis of various statistical tools in detecting breast cancer, Computer Science & Information Technology (2011)
  • S. Kharya et al., Naive Bayes classifiers: A probabilistic detection model for breast cancer, International Journal of Computer Applications (2014)