Position Paper
Scalogram based prediction model for respiratory disorders using optimized convolutional neural networks

https://doi.org/10.1016/j.artmed.2020.101809

Highlights

  • Time-frequency (TF) representation analysis using EMD is proposed for normal, crackle, wheeze and rhonchi types of respiratory sounds.

  • IMF-based scalogram images are classified using the Alexnet Convolutional Neural Network (CNN) architecture.

  • Accuracy is improved compared with the existing wavelet approach.

  • Different optimization algorithms are compared.

Abstract

Auscultation of the lung is a conventional technique used for diagnosing chronic obstructive pulmonary diseases (COPDs) and lower respiratory infections and disorders in patients. In most earlier works, wavelet transforms or spectrograms have been used to analyze lung sounds. However, an accurate prediction model for respiratory disorders has not been developed so far. In this paper, a pre-trained optimized Alexnet Convolutional Neural Network (CNN) architecture is proposed for predicting respiratory disorders. The proposed approach decomposes the segmented respiratory sound signal into several intrinsic mode functions (IMFs) using the empirical mode decomposition (EMD) method and represents them as Bump and Morse scalograms. From the extracted IMFs, the percentage energy of each wavelet coefficient is computed and represented in the form of scalograms. Subsequently, these scalograms are given as input to the pre-trained optimized CNN model for training and testing. The stochastic gradient descent with momentum (SGDM) and adaptive moment estimation (ADAM) optimization algorithms were examined to check the prediction accuracy on a dataset comprising four classes of lung sounds: normal, crackles (coarse and fine), wheezes (monophonic and polyphonic) and low-pitched wheezes (rhonchi). Compared with the baseline method of the standard Bump and Morse wavelet transform approach, which produced 79.04 % and 81.27 % validation accuracy, an improved accuracy of 83.78 % is achieved by virtue of the scalogram representation of the various IMFs of EMD. Hence, the proposed approach achieves a significant improvement in accuracy compared with the existing state-of-the-art techniques in the literature.

Introduction

Lung disease is the third largest cause of death in the world. According to a World Health Organization (WHO) report, more than 3 million people have lost their lives to chronic obstructive pulmonary diseases (COPDs) and lower respiratory infections, while tracheal, bronchus and lung cancer account for 1.7 million deaths [1]. The attributes of respiratory sounds and their investigation play a significant role in assessing pulmonary disorders. Medical specialists adopt several methods, such as spirometry, plethysmography, and arterial blood gas analysis, to assess lung function. Even so, most of the available techniques are not always suitable [2]. Listening to lung sounds is a significant part of the lung examination and helps in analyzing different respiratory disorders. Auscultation of the lungs does not require any skin incision, is more economical [3] and safer [4], and is among the earliest diagnostic techniques used by specialists to analyze diverse pulmonary diseases [5].

Lung sounds are highly non-stationary and non-periodic because of the irregular airflow and the variation in air volume. Lung sounds are primarily classified into two groups: normal (vesicular) and abnormal (adventitious) sounds. When there are no respiratory disorders, vesicular breathing sounds are heard. Abnormal sounds are additional sounds superimposed on vesicular sounds, and they are usually markers of complications in the lungs or airways. Abnormal breath sounds include low-pitched wheezes called rhonchi, high-pitched crackles, high-pitched wheezing (due to constriction of the bronchial tubes) and the harsh sound stridor (caused by narrowing of the upper airway).

According to the American Thoracic Society [6], wheezes occur at frequencies above 400 Hz, whereas rhonchi occur at about 200 Hz. Wheezes can be classified as monophonic (single frequency) or polyphonic (multiple frequencies). Wheezes can be either high or low pitched, and some of the disorders associated with wheezing sounds are pneumonia, asthma and bronchitis. Crackles (coarse and fine), on the other hand, are irregular abnormal sounds produced by the sudden opening of collapsed small airways and are observed in patients affected by diseases such as pneumonia, fibrosis, and heart failure.
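To make the frequency ranges quoted above concrete, the following minimal sketch estimates the dominant frequency of a signal from its magnitude spectrum and compares it with the 400 Hz boundary. It is an illustration only, not the paper's method: the function names, the 4000 Hz sampling rate and the synthetic test tones are all assumptions, and real lung sounds are far less clean than a pure sinusoid.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest peak in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    spectrum[0] = 0.0  # ignore the DC component
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

def frequency_band(freq_hz, wheeze_min_hz=400.0):
    """Coarse label based on the 400 Hz boundary quoted in the text (illustrative)."""
    return "wheeze range" if freq_hz > wheeze_min_hz else "below wheeze range"

sample_rate = 4000                        # Hz, assumed for this example
t = np.arange(sample_rate) / sample_rate  # one second of samples
tone_high = np.sin(2 * np.pi * 600 * t)   # synthetic stand-in for a wheeze
tone_low = np.sin(2 * np.pi * 200 * t)    # synthetic stand-in for a rhonchus

for tone in (tone_high, tone_low):
    f = dominant_frequency(tone, sample_rate)
    print(f, frequency_band(f))
```

A single spectral peak is a crude descriptor for a non-stationary signal, which is one motivation for the time-frequency representations discussed in the rest of the paper.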

Data collected from real-world environments are mostly non-linear in nature, and therefore traditional methods cannot be used to devise analytical models. In the past decade, intelligent systems were used to address this issue, but they resulted in high error rates [7]. With the help of deep learning algorithms, the error rates can potentially become negligible, since such algorithms handle enormous amounts of unstructured data [8,9]. Several investigations in this field strive to produce better representations and to build models that learn from unlabeled data [10]. Deep learning is a framework used for training neural networks; a network is considered deep if the input data is passed through a sequence of nonlinear transformations using various model architectures [11].

With deep learning models, features can be learned automatically and classified by feeding the raw data directly into a deep neural network. The major deep learning architectures are Unsupervised Pre-trained Networks (UPNs), Convolutional Neural Networks (CNNs), and recurrent and recursive neural networks; they have been applied in areas including audio and speech signal processing and natural language processing. Among these, the Convolutional Neural Network architecture is a well-known and commonly used network for classifying images. In this paper, a scalogram-based optimized Alexnet pre-trained Convolutional Neural Network model is developed to extract the visual details from the pixel values of lung sound images for accurate classification and detection.

The remainder of this paper is organized as follows: related work on the classification of respiratory sounds using different approaches is introduced in Section II; the proposed prediction model is described in Section III; the database description and the numerical results are presented in Section IV; and the conclusions are given in Section V.

Related work

In the literature, several efforts have been reported for classifying normal and abnormal lung sounds. They are broadly categorized by the time-frequency transform, the feature set and the classification method used. Some of the efforts on classification algorithms using extracted features are as follows. In [12], lung sounds were classified using a Multi-Layer Perceptron (MLP) with the Fourier transform employed for feature extraction; however, only wheezes were identified.

Proposed methodology

The framework of the proposed prediction model is shown in Fig. 1. The proposed approach decomposes the segmented respiratory sound into several intrinsic mode functions using the empirical mode decomposition method and transforms them into Bump and Morse scalograms. From the extracted intrinsic mode functions, the percentage energy of each wavelet coefficient is calculated, and the resulting scalograms are input to the pre-trained optimized convolutional neural network for training and testing. Stochastic gradient
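The decomposition and percentage-energy steps described in this section can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the envelopes use linear interpolation where EMD normally uses cubic splines, sifting runs for a fixed number of passes, an analytic Morlet wavelet stands in for the Bump and Morse wavelets, and the function names, the sampling rate and the synthetic two-tone signal are all hypothetical.

```python
import numpy as np

def _envelope(x, idx):
    """Interpolate linearly through the extrema at idx (assumption: no cubic spline)."""
    return np.interp(np.arange(len(x)), idx, x[idx])

def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: treat x as the residue
    return x - 0.5 * (_envelope(x, maxima) + _envelope(x, minima))

def emd(x, max_imfs=4, n_sift=10):
    """Simplified EMD with a fixed number of sifting passes per IMF."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = residue
        for _ in range(n_sift):
            nxt = sift_once(h)
            if nxt is None:
                break
            h = nxt
        if h is residue:  # no sifting pass was possible
            break
        imfs.append(h)
        residue = residue - h
    return imfs, residue

def morlet_cwt(x, scales, w0=6.0):
    """CWT with an analytic Morlet wavelet, evaluated in the frequency domain.
    Constant normalisation factors are omitted; percentage energy cancels them."""
    n = len(x)
    omega = 2 * np.pi * np.fft.fftfreq(n)  # angular frequency, rad/sample
    x_hat = np.fft.fft(x)
    coeffs = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        psi_hat = np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        coeffs[i] = np.fft.ifft(x_hat * psi_hat)
    return coeffs

def percentage_energy(coeffs):
    """Energy of each coefficient as a percentage of the total energy."""
    energy = np.abs(coeffs) ** 2
    return 100.0 * energy / energy.sum()

fs = 4000                                    # Hz, assumed sampling rate
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 400 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
imfs, residue = emd(x)
scales = np.geomspace(2, 64, 32)             # hypothetical scale grid
scalograms = [percentage_energy(morlet_cwt(imf, scales)) for imf in imfs]
```

Each entry of `scalograms` is a scales-by-time array whose values sum to 100, so it can be rendered as an image and resized to the input size expected by a CNN.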

Results and discussion

The scalograms extracted from the audio files through continuous wavelet transforms and the EMD technique were used to train the pre-trained Alexnet for 2, 4, 8, 16 and 20 epochs. Two optimization methods, stochastic gradient descent with momentum (SGDM) and adaptive moment estimation (ADAM), were used for training. The experimental settings for modeling the network and the Alexnet architecture are listed in Table 1 and Table 2.
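For readers unfamiliar with the two optimizers named above, the sketch below applies the standard SGDM and Adam update rules to a toy one-dimensional loss. It only illustrates how the updates differ; the loss function, learning rates and step counts are assumptions chosen for this example and are unrelated to the settings in the paper's tables.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, which is minimised at w = 3."""
    return 2.0 * (w - 3.0)

def sgdm(w, steps, lr=0.1, momentum=0.9):
    """Stochastic gradient descent with momentum (classical velocity form)."""
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)  # accumulate a decaying velocity
        w = w + v
    return w

def adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-parameter step scaled by bias-corrected moment estimates."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g        # first moment (mean of gradients)
        v = beta2 * v + (1.0 - beta2) * g * g    # second moment (mean of squares)
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgdm = sgdm(0.0, steps=500)
w_adam = adam(0.0, steps=500)
```

SGDM applies one global learning rate to a smoothed gradient, whereas Adam additionally rescales each step by a running estimate of the gradient magnitude, which is why the two can reach different accuracies for the same number of epochs.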

The training loss and test accuracy curves versus

Conclusion

In this paper, a prediction model is proposed that uses an Alexnet pre-trained Convolutional Neural Network with Bump and Morse scalograms created from IMFs extracted by EMD. EMD is chosen as the preferred domain in this work because this decomposition technique, by virtue of its adaptive nature, treats all signal components in an unbiased manner irrespective of the pattern of the basis function. Experimental results show that scalograms derived from IMFs of EMD, when given as input to

Declaration of Competing Interest

NIL.

References (31)

  • N. Meslier et al.

    Wheezes

    Eur Respir J

    (1995)
  • Y. Bengio et al.

    Representation learning: a review and new perspectives

    IEEE Trans Pattern Anal Mach Intell

    (2013)
  • ...
  • Zahangir Alom Md et al.

    The history began from AlexNet: a comprehensive survey on deep learning approaches

    Computer Vision and Pattern Recognition

    (2018)
  • B.D.C.N. Prasadl et al.

    An approach to develop expert systems in medical diagnosis using machine learning algorithms (asthma) and a performance study

    International Journal on Soft Computing (IJSC)

    (2011)