Scalogram based prediction model for respiratory disorders using optimized convolutional neural networks

doi:10.1016/j.artmed.2020.101809

Artificial Intelligence in Medicine

Volume 103, March 2020, 101809

Highlights

• TF representation analysis using EMD for normal, crackles, wheezes and rhonchi types of respiratory sounds is proposed.
• Classification of IMF based scalogram images using Alexnet Convolutional Neural Network (CNN) architecture is carried out.
• Enhancement in accuracy compared to existing wavelet approach is achieved.
• Comparison with different optimization algorithms is examined.

Abstract

Auscultation of the lung is a conventional technique for diagnosing chronic obstructive pulmonary diseases (COPDs) and lower respiratory infections and disorders. Most earlier works have used wavelet transforms or spectrograms to analyze lung sounds, but an accurate prediction model for respiratory disorders has not yet been developed. In this paper, a pre-trained, optimized AlexNet convolutional neural network (CNN) architecture is proposed for predicting respiratory disorders. The proposed approach decomposes the segmented respiratory sound signal into several intrinsic mode functions (IMFs) using the empirical mode decomposition (EMD) method and represents them as Bump and Morse scalograms. For each extracted IMF, the percentage energy of each wavelet coefficient is computed and rendered as a scalogram. These scalograms are then given as input to the pre-trained optimized CNN model for training and testing. Stochastic gradient descent with momentum (SGDM) and adaptive moment estimation (ADAM) optimization algorithms were examined to check the prediction accuracy on a dataset comprising four classes of lung sounds: normal, crackles (coarse and fine), wheezes (monophonic and polyphonic) and low-pitched wheezes (rhonchi). Compared with the baseline of the standard Bump and Morse wavelet transform approach, which produced validation accuracies of 79.04 % and 81.27 %, an improved accuracy of 83.78 % is achieved by virtue of the scalogram representation of the various IMFs of EMD. Hence, the proposed approach achieves a significant improvement in accuracy over the existing state-of-the-art techniques in the literature.
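The decomposition step named in the abstract can be illustrated with a minimal sketch of EMD sifting. It is not the authors' implementation: it assumes NumPy is available, replaces the cubic-spline envelopes of standard EMD with linear interpolation, and uses a fixed number of sifting passes instead of a stopping criterion. The function names and the `max_imfs` and `n_sifts` parameters are hypothetical.

```python
import numpy as np


def local_extrema(x):
    """Return indices of strict interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    t = np.arange(len(x))
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: treat x as a residual trend
    # Assumption: linear interpolation stands in for cubic-spline envelopes.
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return x - 0.5 * (upper + lower)


def emd(signal, max_imfs=4, n_sifts=10):
    """Decompose a 1-D signal into at most max_imfs IMFs plus a residual."""
    residual = np.asarray(signal, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = residual
        for _ in range(n_sifts):
            h_next = sift_once(h)
            if h_next is None:
                break
            h = h_next
        if h is residual:  # no sifting pass was possible, so stop
            break
        imfs.append(h)
        residual = residual - h
    return np.array(imfs), residual
```

By construction, the extracted IMFs and the final residual sum back to the input signal, which is a convenient sanity check before any scalogram is computed from an IMF.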

Introduction

Lung disease is the third largest cause of death in the world. According to a World Health Organization (WHO) report, more than 3 million people have died from chronic obstructive pulmonary diseases (COPDs) and lower respiratory infections, while 1.7 million have died from tracheal, bronchus and lung cancer [1]. The attributes of respiratory sounds, and their analysis, play a significant role in assessing pulmonary disorders. Clinicians adopt several methods, such as spirometry, plethysmography and arterial blood gas analysis, to assess lung function. However, most of the available techniques are not always suitable [2]. Listening to lung sounds is an important part of the lung examination and is helpful in analyzing different respiratory disorders. Auscultation of the lungs does not require any skin incision, is more economical [3] and safer [4], and is among the earliest diagnostic techniques used by specialists to analyze diverse pulmonary diseases [5].

Lung sounds are highly non-stationary and non-periodic because of the turbulent outflow and varying volume of air. They are primarily classified into two groups: normal (vesicular) and abnormal (adventitious) sounds. Vesicular breathing sounds are heard when there are no respiratory disorders. Abnormal sounds are additional sounds superimposed on vesicular sounds, and they usually indicate complications in the lungs or airways. Abnormal breath sounds include low-pitched wheezes called rhonchi, high-pitched crackles, high-pitched wheezing (due to constriction of the bronchial tubes) and the harsh sound called stridor (caused by narrowing of the upper airway).

According to the American Thoracic Society [6], wheezes occur at frequencies above 400 Hz, whereas rhonchi occur at about 200 Hz. Wheezes can be classified as monophonic (a single frequency) or polyphonic (multiple frequencies). Wheezes can be either high or low pitched, and disorders associated with wheezing sounds include pneumonia, asthma and bronchitis. Crackles (coarse and fine), on the other hand, are irregular abnormal sounds produced by the sudden opening of collapsed small airways, and they are observed in patients with diseases such as pneumonia, fibrosis and heart failure.

Data collected from the environment are mostly nonlinear in nature, so traditional methods cannot be used to devise analytical models. In the past decade, intelligent systems were used to address this issue, but they resulted in high error rates [7]. With the help of deep learning algorithms, the error rates can eventually become negligible, as these algorithms handle enormous amounts of unstructured data [8,9]. Several investigations in this field strive to produce better representations and to build models that learn from unlabeled data [10]. Deep learning is a framework for training neural networks; a network is considered deep if the input data is passed through a sequence of nonlinear transformations using various model architectures [11].

With deep learning models, features can be learned and classified automatically by feeding the raw data directly into a deep neural network. The major deep learning architectures are unsupervised pre-trained networks (UPNs), convolutional neural networks (CNNs), and recurrent and recursive neural networks, which have been applied in areas including audio and speech signal processing and natural language processing. Among these, the CNN is a well-known and commonly used architecture for classifying images. In this paper, a scalogram-based optimized AlexNet pre-trained CNN model is developed to extract visual details from the pixel values of lung sound images for accurate classification and detection.

The remainder of this paper is organized as follows: related work on the classification of respiratory sounds using different approaches is introduced in Section II; the proposed prediction model is described in Section III; the database description and the numerical results are presented in Section IV; and conclusions are given in Section V.

Section snippets

Related work

Several efforts to classify normal and abnormal lung sounds have been reported in the literature. They can be broadly categorized by the time-frequency transform, the feature set and the classification method used. Some of the efforts on classification algorithms using extracted features are as follows. In [12], lung sounds were classified using a multi-layer perceptron (MLP), with the Fourier transform employed for feature extraction; however, only wheezes were identified.

Proposed methodology

The framework of the proposed prediction model is shown in Fig. 1. The proposed approach transforms the segmented respiratory sound into Bump and Morse scalograms and several intrinsic mode functions using the empirical mode decomposition method. From the extracted intrinsic mode functions, the percentage energy calculated for each wavelet coefficient is rendered as scalograms, which are input to the pre-trained optimized convolutional neural network for training and testing. Stochastic gradient
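The percentage-energy step can be sketched as follows, assuming the common definition in which each coefficient's squared magnitude is divided by the total energy of all coefficients and multiplied by 100. The source does not spell out its formula, so this definition, the function name `percent_energy` and the toy coefficient grid in the usage note are assumptions. Computing the Bump or Morse wavelet coefficients themselves is outside this sketch.

```python
def percent_energy(coeffs):
    """Map a 2-D grid of wavelet coefficients (scales x time) to percent energy.

    Each output cell is 100 * |c|^2 / sum of |c|^2 over the whole grid,
    so the returned grid sums to 100 (up to floating-point rounding).
    """
    energy = [[abs(c) ** 2 for c in row] for row in coeffs]
    total = sum(sum(row) for row in energy)
    if total == 0:
        raise ValueError("all coefficients are zero; percent energy is undefined")
    return [[100.0 * e / total for e in row] for row in energy]
```

For example, `percent_energy([[1 + 1j, 2], [0, 2j]])` assigns roughly 20 % of the energy to the first cell and roughly 40 % each to the two cells of magnitude 2. Rendering such a grid as a color image would give the kind of scalogram that is passed to the CNN.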

Results and discussion

The scalograms extracted from the audio files through continuous wavelet transforms and the EMD technique were used to train the pre-trained AlexNet for 2, 4, 8, 16 and 20 epochs. Two optimization methods, stochastic gradient descent with momentum (SGDM) and adaptive moment estimation (ADAM), were used for training. The experimental settings for modeling the network and the AlexNet architecture are listed in Table 1 and Table 2.
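The difference between the two optimizers can be illustrated with a minimal sketch of their textbook update rules on a toy one-dimensional loss. This is not the training code used in the paper: the loss f(w) = (w - 3)^2, the learning rate, the momentum and beta values, and the step count are all illustrative assumptions, not the settings of Table 1.

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)


def sgdm(w, steps=500, lr=0.1, momentum=0.9):
    """Stochastic gradient descent with momentum (here with exact gradients)."""
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)  # velocity accumulates past gradients
        w += v
    return w


def adam(w, steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adaptive moment estimation with bias-corrected first and second moments."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g       # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g * g   # second moment (mean of squares)
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

Both `sgdm(0.0)` and `adam(0.0)` move the parameter toward the minimum at 3. SGDM applies one global step size to an accumulated velocity, whereas ADAM rescales each step by a running estimate of the gradient magnitude.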

The training loss and test accuracy curves versus

Conclusion

In this paper, a prediction model is proposed that uses AlexNet pre-trained convolutional neural networks with Bump and Morse scalograms created from IMFs extracted by EMD. EMD is chosen as the preferred domain in this work because, by virtue of its adaptive nature, this decomposition technique treats all signal components in an unbiased manner irrespective of the pattern of the basis function. Experimental results show that scalograms derived from IMFs of EMD, when given as input to

Declaration of Competing Interest

NIL.

References (31)

Kandaswamy et al. Neural classification of lung sounds using wavelet coefficients. Comput Biol Med (2004)
X. Lu et al. An integrated automated system for crackles extraction and classification. Biomed Signal Process Control (2008)
Gorkem Serbes et al. Pulmonary crackle detection using time–frequency and time–scale analysis. Digit Signal Process (2013)
B.A. Reyes et al. Assessment of time–frequency representation techniques for thoracic sounds analysis. Comput Methods Programs Biomed (2014)
Nandini Sengupta et al. Lung sound classification using cepstral-based statistical features. Comput Biol Med (2016)
Dalal Bardou et al. Lung sounds classification using convolutional networks. Artif Intell Med (2018)
...
M.D. Abraham Bohadana et al. Fundamentals of lung auscultation. N Engl J Med (2014)
Volker Gross et al. The relationship between normal lung sounds, age and gender. Am J Respir Crit Care Med (2000)
M. Sarkar et al. Auscultation of the respiratory system. Ann Thorac Med (2015)
N. Meslier et al. Wheezes. Eur Respir J (1995)
Y. Bengio et al. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell (2013)
...
Zahangir Alom Md et al. The history began from AlexNet: a comprehensive survey on deep learning approaches. Computer Vision and Pattern Recognition (2018)
B.D.C.N. Prasadl et al. An approach to develop expert systems in medical diagnosis using machine learning algorithms (asthma) and a performance study. International Journal on Soft Computing (IJSC) (2011)
