1 Introduction

Electrochemical events occur in the cells of the body, and ionic currents are formed as a result of these events. Electrodes are used to acquire the resulting signals (Lewandowski et al. 2012; Nazarahari et al. 2015). After processing, the signals received from these electrodes are used in areas such as disease diagnosis. They must be examined by specialists in the field or automatically classified by signal processing and machine learning techniques (Melgani and Bazi 2008). For example, electrocardiogram (ECG) signals are produced by the heart muscles during the heartbeat. In the electrical conduction of the heart, the sinus node, atrioventricular node, atrioventricular bundle, bundle branches and Purkinje fibers are activated (Acharya 2017; Fajar et al. 2012; Pasolli and Melgani 2010). With the contraction and relaxation of the heart, these systems work together to generate electrophysiological signals. These signals are used to examine the electrical activity of the heart (Huanhuan and Yue 2014; Taji et al. 2017).

ECG signals are measured through sensors on the body to detect underlying heart conditions. The ECG potential spreads through the entire heart via special transmitting cells (Alexakis et al. 2003; Edla et al. 2014). The ECG is also described as the recording of the electrical potentials that occur during the stimulation of the heart and the transmission of the stimulus; these potentials are caused by electrical currents radiating from the heart to the whole body and are sensed at certain points on the body surface. The ECG is a common tool for the detection of cardiac abnormalities (Satija et al. 2018). ECG based automatic monitoring and diagnosis systems are widely used in the diagnosis and treatment of cardiac diseases, and various signal processing techniques are used for this purpose (Daamouche et al. 2012; Jiang and Kong 2007). There are many studies in the literature on the automatic identification and classification of ECG signals (Jagannath et al. 2020; Sangaiah et al. 2020; Shrivastav et al. 2020).

1.1 Motivation

Deep learning based techniques have recently become a common choice for heart arrhythmia classification. Atrial fibrillation detection, ECG based arrhythmia detection (Shashikumar et al. 2017), premature ventricular contraction beat detection (Jun et al. 2016), myocardial infarction detection (Zhou et al. 2016), and ECG based disease identification with comparison against healthy subjects have all been effectively analyzed using deep neural networks. Feature learning with deep learning algorithms and convolutional neural networks (CNNs) has further improved ECG signal classification. Feature extraction using CNN architectures, multi-scale deep feature learning (Zhou et al. 2016), recurrent neural networks, stacked sparse auto-encoders (Zhang et al. 2017), Boltzmann machines (Wu et al. 2016) and deep belief networks (Yan et al. 2015) have been applied to analyze and classify ECG signals. Classification of multiple arrhythmias from ECG signals remains a challenge, because the signals of many arrhythmias show similar characteristics and therefore require complicated analysis when traditional signal processing approaches are employed (Hasan and Bhattacharjee 2019).

There are many intelligent ECG signal classification methods in the literature. Some studies on ECG in the literature are presented in Table 1.

Table 1 Some studies on ECG in the literature

Nowadays, most researchers use deep learning methods to achieve high classification accuracies (Krizhevsky et al. 2012). Our main motivation is to create a hybrid ECG signal classification framework that includes handcrafted feature extraction, feature selection and deep learning. Graph theory is also very effective for solving machine learning problems, and we aimed to show its success in ECG signal classification. In this study, we introduce a deep learning based multiple heart arrhythmia detection approach in which the features are extracted using tower graph decomposition with the hexadecimal local adaptive binary pattern (HLABP). An iterative and hybrid approach based on ReliefF (Robnik-Šikonja and Kononenko 2003) and neighborhood component analysis (NCA) (Raghu and Sriraam 2018) is used for feature selection. The informative features are then fed to a deep neural network (DNN) model to learn the inherent features related to each type of ECG signal.

1.2 Contributions

In this paper, a new ECG signal classification approach is developed with high classification accuracy. Our proposed method consists of tower graph decomposition feature extraction with hexadecimal local adaptive binary pattern (HLABP), RFINCA based feature selection and deep neural network (DNN) classification. As seen from these phases, we used handcrafted feature extraction, hybrid-iterative feature selection and DNN together. The contributions of this study are as follows:

  • In the literature, discrete wavelet transform (DWT) (Shensa 1992) and pooling methods (Yaman et al. 2020) have been utilized to create multilevel or multilayered signal decompositions. We propose a novel decomposition method that uses pooling methods and a tower graph together, and the obtained results support the success of this decomposition method. A novel HLABP, which is an improved version of the hexadecimal pattern feature extraction method, is presented. ReliefF (Robnik-Šikonja and Kononenko 2003) and NCA (Raghu and Sriraam 2018) are widely preferred distance based feature selection techniques. We implemented an iterative and hybrid feature selection technique by combining ReliefF and NCA, and named it RFINCA.

  • By using tower graph decomposition, HLABP, RFINCA and DNN, a highly accurate ECG signal classification approach is presented with 17 ECG signal classes.

2 The proposed ECG signal classification framework

The general framework of the proposed ECG signal classification approach is shown in Fig. 1.

Fig. 1 General framework of the proposed novel ECG signal classification approach

As seen in Fig. 1, the proposed tower graph and HLABP based approach includes four main steps, which are signal decomposition with tower graph (preprocessing), feature generation from each node using the developed HLABP (feature extraction), RFINCA based feature selection, and classification with DNN. The proposed tower graph-based automated ECG signal classification approach is explained step by step below.

2.1 Tower graph decomposition

As is known from the literature, pooling methods are widely preferred in the machine learning field. Their advantages are as follows: (a) they use basic statistical moments for decomposition, which makes them effective; (b) their computational cost is low because they have a simple mathematical structure. However, they are not good routers. Therefore, new pooling methods should be developed. In this work, we used the commonly preferred pooling methods (minimum, maximum, average) (Sabour et al. 2017) together with a tower graph to present a novel pooling-based decomposition method. The proposed decomposition approach is shown in Fig. 2.

Fig. 2 An example of the tower graph operator and the operations of tower graph decomposition (five-level tower graph)

In Fig. 2, the i, j and a nodes are minimum pooled, maximum pooled and fused nodes, respectively. The steps of the proposed tower graph based decomposition are given below.

Step 0: Load the original signal.

Step 1: Apply minimum and maximum pooling to the signal and obtain the first two children.

$${i}^{1}\left(k\right)=\max\left(OS\left(t\right),OS\left(t+1\right)\right),\;t=\left\{1,2,\dots ,L\right\},\;k=\left\{1,2,\dots ,\frac{L}{2}\right\}$$
(1)
$${j}^{1}\left(k\right)=\min\left(OS\left(t\right),OS\left(t+1\right)\right)$$
(2)

where \(OS\) represents the original signal and \(L\) is its length.

Step 2: Apply Eqs. (3, 4) to calculate i and j nodes.

$${i}^{K}\left(k\right)=\max\left({i}^{K-1}\left(t\right),{i}^{K-1}\left(t+1\right)\right),\;K=\{1,2,\dots ,nl\}$$
(3)
$${j}^{K}\left(k\right)=\min\left({j}^{K-1}\left(t\right),{j}^{K-1}\left(t+1\right)\right)$$
(4)

Step 3: Calculate first fused node.

$$signal=conc\left({i}^{1},{j}^{1}\right)$$
(5)
$${a}^{1}\left(k\right)=\mathit{mean}\left(signal\left(t\right),signal\left(t+1\right)\right)$$
(6)

Step 4: Calculate other fused nodes by using the related nodes and Eqs. (5, 6).
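Steps 0 to 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' MATLAB implementation. It assumes that pooling is applied to non-overlapping sample pairs (so that each child has half the length of its parent, as the range of \(k\) in Eq. (1) suggests), and it labels the nodes as in Eqs. (1) to (4), that is, i for maximum pooling and j for minimum pooling. The function names `pool_pairs` and `tower_pooling` are hypothetical, and the arrangement of the remaining fused nodes (Step 4) depends on Fig. 2 and is not reproduced here.

```python
def pool_pairs(signal, op):
    """Pool non-overlapping pairs (t, t+1) of a sequence with the binary function `op`."""
    usable = len(signal) - len(signal) % 2  # assumption: drop a trailing odd sample
    return [op(signal[t], signal[t + 1]) for t in range(0, usable, 2)]


def pairwise_mean(a, b):
    """Mean of two samples, used for the fused node in Eq. (6)."""
    return (a + b) / 2.0


def tower_pooling(os_signal, n_levels):
    """Return the i and j nodes of every level and the first fused node a^1.

    n_levels must be at least 1.
    """
    i_nodes, j_nodes = [], []
    i_prev, j_prev = list(os_signal), list(os_signal)
    for _ in range(n_levels):
        i_prev = pool_pairs(i_prev, max)  # Eqs. (1) and (3)
        j_prev = pool_pairs(j_prev, min)  # Eqs. (2) and (4)
        i_nodes.append(i_prev)
        j_nodes.append(j_prev)
    fused_input = i_nodes[0] + j_nodes[0]        # Eq. (5): conc(i^1, j^1)
    a1 = pool_pairs(fused_input, pairwise_mean)  # Eq. (6)
    return i_nodes, j_nodes, a1


i_nodes, j_nodes, a1 = tower_pooling([1, 4, 2, 3, 5, 0, 7, 6], n_levels=2)
```

For the eight-sample toy signal above, each level halves the node length, and the fused node is computed from the concatenation of the two first-level children.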

2.2 Hexadecimal Local Adaptive Binary Pattern (HLABP)

HLABP, which is an improved version of the hexadecimal pattern, is a one-dimensional local feature extractor. To extract features, the signum and ternary functions (Tuncer and Dogan 2020) are used together. It also uses two variable patterns, called the center symmetric and linear patterns. These patterns are shown in Fig. 3.

Fig. 3 The patterns used in HLABP

Feature extraction is implemented by using the patterns shown in Fig. 3 with two binary feature extraction kernels, signum and ternary. As seen from Fig. 3, eight relations are used to extract features. For each pattern, the signum function extracts eight bits and the ternary function extracts 16 bits. We divided the extracted bits into non-overlapping 8-bit blocks. Therefore, 1536 features are extracted by the proposed HLABP; this count is consistent with two patterns × three 8-bit blocks per pattern × 256 possible values per block, assuming one 256-bin histogram per block. The mathematical notations of the signum and ternary functions are given below (Tuncer and Dogan 2020).

$$bi{t}^{Signum}\left(i\right)=signum\left(x,y\right)=\left\{\begin{array}{c}0,x-y<0\\ 1,x-y\ge 0\end{array}\right.$$
(7)
$$ter\left(x,y\right)=\left\{\begin{array}{c}-1,x-y<-thr\\ 0,-thr\le x-y\le thr\\ 1,x-y>thr\end{array}\right.$$
(8)
$$bi{t}_{lower}^{Ternary}\left(i\right)=\left\{\begin{array}{c}0,ter\left(x,y\right)>-1\\ 1,ter\left(x,y\right)=-1\end{array}\right.$$
(9)
$$bi{t}_{upper}^{Ternary}\left(i\right)=\left\{\begin{array}{c}0,ter\left(x,y\right)<1\\ 1,ter\left(x,y\right)=1\end{array}\right.$$
(10)

where \(signum(.,.)\) denotes the signum function, \(ter(.,.)\) represents the ternary function, \(bi{t}^{Signum}\) is the bit extracted by the signum function, \(bi{t}_{lower}^{Ternary}\) and \(bi{t}_{upper}^{Ternary}\) are the lower and upper bits extracted by the ternary function, \(thr\) is the threshold value, and \(x,y\) are the inputs of the signum and ternary functions. The extracted bits are converted to decimal values by using Eq. (11).

$$ Decimal\;value = \sum\nolimits_{{i = 1}}^{8} {bit(i) \times 2^{{8 - i}} } $$
(11)
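Eqs. (7) to (11) can be illustrated with a short Python sketch. It covers only the bit kernels and the binary-to-decimal conversion; the eight \((x,y)\) relations of the center symmetric and linear patterns come from Fig. 3 and are therefore passed in as an argument rather than reproduced. The function names (`signum_bit`, `ternary_bits`, `bits_to_decimal`, `encode_pairs`) and the example values are hypothetical.

```python
def signum_bit(x, y):
    """Eq. (7): 1 if x - y >= 0, otherwise 0."""
    return 1 if x - y >= 0 else 0


def ternary_bits(x, y, thr):
    """Eqs. (8)-(10): lower bit is 1 when ter = -1, upper bit is 1 when ter = 1."""
    diff = x - y
    lower = 1 if diff < -thr else 0
    upper = 1 if diff > thr else 0
    return lower, upper


def bits_to_decimal(bits):
    """Eq. (11): bit(1) is the most significant of the eight bits."""
    if len(bits) != 8:
        raise ValueError("exactly eight bits are expected")
    return sum(bit * 2 ** (8 - i) for i, bit in enumerate(bits, start=1))


def encode_pairs(pairs, thr):
    """Map eight (x, y) relations to three 8-bit values: signum, lower, upper."""
    signum = [signum_bit(x, y) for x, y in pairs]
    ternary = [ternary_bits(x, y, thr) for x, y in pairs]
    lower = [low for low, _ in ternary]
    upper = [up for _, up in ternary]
    return bits_to_decimal(signum), bits_to_decimal(lower), bits_to_decimal(upper)


# Eight illustrative relations (made-up values, not taken from the dataset).
pairs = [(5, 1), (1, 5), (2, 2), (3, 2.5), (0, 3), (4, 4), (9, 0), (1, 1.2)]
codes = encode_pairs(pairs, thr=1)
```

Each pattern thus yields three values in the range 0 to 255, which matches the eight signum bits and 16 ternary bits per pattern stated above.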

The procedure of the proposed HLABP is also given in Algorithm 1.

Algorithm 1 Procedure of the proposed HLABP

2.3 ReliefF and iterative neighborhood component analysis feature selection method

In the feature selection phase, we used a hybrid and iterative feature selection approach that combines ReliefF (Robnik-Šikonja and Kononenko 2003) and NCA (Raghu and Sriraam 2018). ReliefF is used to identify negatively weighted features, and NCA is used iteratively for the final feature selection. The main problem of these feature selection techniques is choosing the optimal number of features. Both selectors calculate a weight for each feature: NCA generates positive weights, whereas ReliefF generates both negative and positive weights. We have two main objectives in using this hybrid and iterative feature selection method. These are:

  • Combining the positive effects of both feature selectors.

  • Selecting the optimal number of features automatically.

Since ReliefF and iterative NCA are used together, the method is called RFINCA. The steps of RFINCA are given below.

Step 1: Generate ReliefF weights by using ReliefF function.

$${w}^{R}=ReliefF(X,y)$$
(12)

where \({w}^{R}\) represents ReliefF weights, \(ReliefF(.,.)\) is weight generation function of the ReliefF, \(X,y\) are features and target.

Step 2: Select positive weighted features.

$$positiv{e}^{features}\left(t\right)=X\left(i\right)\;\text{and}\;t{+}{+},\;\text{if}\;{w}^{R}\left(i\right)>0$$
(13)

where \(positiv{e}^{features}\) denotes the selected positive weighted features and \(t\) counts them.

Step 3: Calculate NCA weights of the selected positive weighted features by ReliefF.

$${w}^{N}=NCA(positiv{e}^{features},y)$$
(14)

where \({w}^{N}\) is NCA weights and \(NCA(.,.)\) is weight generation function of NCA.

Step 4: Sort NCA weights by descending and calculate indices of sorted values.

$$\left[{w}_{sorted}^{N},ind\right]=sort({w}^{N})$$
(15)

where \({w}_{sorted}^{N}\) is sorted NCA weights by descending and \(ind\) defines indices.

Step 5: Select a classification method as a loss value generator. In this article, we selected the 1-NN classifier, which is one of the simplest classifiers in the literature. The loss value is calculated by using 1-NN with tenfold cross validation (CV). To decrease the time cost of RFINCA, the number of selected features is restricted to a range, which is from 100 to 1000 in this work.

$$fea{t}^{m-99}\left(i\right)=positiv{e}^{features}\left(ind(i)\right),\;i=\left\{1,2,\dots ,m\right\},\;m=\{100,101,\dots ,1000\}$$
(16)
$$loss\left(m-99\right)=1NN(fea{t}^{m-99},10)$$
(17)

Step 6: Calculate minimum loss value.

$$\left[los{s}_{sorted},endex\right]=min(loss)$$
(18)

Step 7: Select final features by using \(endex\) and \(ind\) values.

$$fina{l}^{features}\left(i\right)=positiv{e}^{features}\left(ind(i)\right),\;i=\{1,2,\dots ,endex+99\}$$
(19)

2.4 Deep learning techniques

An artificial neural network (ANN) with a single hidden layer has limited capacity, so an ANN can learn more complicated data when more hidden layers are introduced. This is the concept behind deep neural networks (DNNs), where every hidden layer combines the values of its previous layer, starting from the network inputs, and learns more complicated functions of the data. A further property of a DNN is that successive hidden layers capture increasingly abstract patterns up to the output layer, where the most abstract concepts are learned in terms of the outputs. The idea in a DNN is to learn feature levels of increasing abstraction with less human involvement (Bengio 2009). The training problem of an ANN with several hidden layers, namely that the error must be backpropagated to the first layer by multiplying the derivatives in all layers, can be alleviated by training one layer at a time (Hinton and Salakhutdinov 2006). The aim of the individual layers is to extract the informative and discriminative features in the data by employing an auto-encoder technique. The overall system is then tuned with the training data with less manual intervention (Alpaydin 2020; Subasi 2019).

There is currently no standard technique for building an optimized neural network with the precise number of neurons and number of layers. Therefore, we empirically create our DNN by carrying out a large range of trials. For every trial, the DNN is manually adjusted by setting the number of learning steps, the activation function, the number of hidden layers, and the number of neurons making up the layer for each hidden layer. The classification accuracy has been tested over the test set for each manual configuration. After this tedious manual process, the best performance in the classification has been achieved with the DNN consisting of two hidden layers, with 75 and 50 neurons respectively. Tangent sigmoid was chosen as activation function.

$$v=\partial (x*w+b)$$
(20)
$$y\text{'}=\frac{{e}^{v}-{e}^{-v}}{{e}^{v}+{e}^{-v}}$$
(21)

where \(x\) is the input, \(w\) denotes the weights and \(b\) denotes the bias. \(\partial \) is the activation function (tangent sigmoid) and \(y\text{'}\) is the evaluated output. To train this network, the scaled conjugate gradient optimization method was utilized (Sannino and De Pietro 2018).
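The forward pass of the reported architecture (two hidden layers with 75 and 50 neurons and tangent sigmoid activation) can be sketched as below. The sketch is illustrative only: the weights are random rather than trained, the input size of 142 and the 17 outputs are taken from the feature selection result and the number of classes, and the softmax output layer is an assumption that the text does not state. The names `init_layer` and `forward` are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (one row per neuron) and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Hidden layers use tanh(x*w + b), Eqs. (20)-(21); the output uses softmax (assumed)."""
    h = x
    for weights, biases in layers[:-1]:
        h = [math.tanh(sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(weights, biases)]
    weights, biases = layers[-1]
    logits = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
sizes = [142, 75, 50, 17]  # inputs, two hidden layers, output classes
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
probabilities = forward([rng.random() for _ in range(142)], layers)
```

The predicted class would be the index of the largest entry of `probabilities`; training with scaled conjugate gradients is not shown.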

3 Results and discussion

3.1 Dataset

The MIT-BIH ECG dataset used here involves 1000 observations with 17 classes. It is a heterogeneous dataset because the number of ECG signals varies between classes. Each ECG signal has 3600 samples. The class names and the number of observations for each class of the MIT-BIH dataset are shown in Fig. 4 (Mark and Moody 1988).

Fig. 4 Details of the MIT-BIH dataset

3.2 Experimental setup

We presented a new feature generator and a novel graph-based decomposition for feature generation. In the feature selection phase, RFINCA is applied, and classification results are calculated using ANN and DNN classifiers. The implementation steps of this framework are given below.

Step 1: Apply the proposed tower graph decomposition to the raw ECG signal. The optimal number of levels was selected by trial and error; according to the tests, the four-level tower graph decomposition achieved the best performance and is used in this phase. The number of nodes of the tower graph is calculated by using Eq. (22).

$$N=\frac{L(L+1)}{2}$$
(22)

where \(N\) is the number of nodes of the tower graph and \(L\) is the number of levels.

Step 2: Extract features from each node of the tower graph. According to Eq. (22), the four-level tower graph used here has \(N=4\times 5/2=10\) nodes. The feature extraction process is summarized in Eq. (23).

$$features\left(\left(i-1\right)\times 1536+1:i\times 1536\right)=HLABP\left(Nod{e}_{i}\right),\;i=\{1,2,\dots ,10\}$$
(23)

By using Eq. (23), feature extraction and concatenation phases are implemented together.
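The arithmetic behind Eqs. (22) and (23) can be checked with a small Python sketch. The helper names `tower_node_count` and `feature_range` are hypothetical, and the index ranges are 1-based and inclusive, as in Eq. (23).

```python
def tower_node_count(levels):
    """Eq. (22): number of nodes of a tower graph with the given number of levels."""
    return levels * (levels + 1) // 2


def feature_range(node_index, per_node=1536):
    """Eq. (23): 1-based inclusive range occupied by the features of node i."""
    start = (node_index - 1) * per_node + 1
    end = node_index * per_node
    return start, end


n_nodes = tower_node_count(4)                              # ten nodes for four levels
ranges = [feature_range(i) for i in range(1, n_nodes + 1)]
total_features = ranges[-1][1]                             # length of the concatenated vector
```

With four levels and 1536 features per node, the concatenated feature vector has 10 × 1536 = 15,360 entries.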

Step 3: Normalize features by using min–max normalization.

$$X=\frac{features-min(features)}{\mathrm{max}\left(features\right)-min(features)}$$
(24)

Step 3 is needed to apply the RFINCA feature selection method effectively, because RFINCA uses distance based feature selectors.
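A minimal Python sketch of the min–max normalization in Eq. (24) is given below. It assumes that the minimum and maximum are taken separately for each feature (column) over all observations, and that a constant feature is mapped to zero to avoid division by zero; neither detail is stated in the text. The function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(features):
    """Scale every feature column of a list of observation rows to [0, 1]."""
    columns = list(zip(*features))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    normalized = []
    for row in features:
        normalized.append([
            (value - low) / (high - low) if high > low else 0.0  # constant column -> 0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


X = min_max_normalize([[1, 10, 5], [3, 20, 5], [2, 40, 5]])
```

After this step all features share the same scale, so no single feature dominates the distances used by ReliefF, NCA and 1-NN.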

Step 4: Apply RFINCA to X. In this work, RFINCA selects the 142 most discriminative features.

Step 5: Forward the 142 selected features to the DNN classifier. The DNN classifier is used with tenfold CV.

We also used an artificial neural network (ANN) classifier to test the proposed tower graph based method. The ANN contains an input layer, one hidden layer and an output layer. The hidden layer has 50 neurons. Scaled conjugate gradient backpropagation is used to update its weights and biases, and the tangent sigmoid function is utilized as the activation function.

A DNN is a special type of artificial neural network with more than one hidden layer. The DNN used here is a backpropagation network with two hidden layers of 75 and 50 neurons, respectively.

3.3 Experimental results

The MIT-BIH ECG dataset, which has 17 classes (Mark and Moody 1988), was used to obtain numerical results for the proposed approach. The experiments were set up in the MATLAB 2019b programming environment. The tower graph decomposition method, feature extraction with HLABP and RFINCA based feature selection were implemented in MATLAB. In the classification phase, ANN and DNN were utilized, with the extracted and selected features as their inputs. The accuracies of the ANN and DNN were calculated as 95.70% and 97.10%, respectively. The confusion matrices of the ANN and DNN classifiers are shown in Fig. 5.

Fig. 5 Confusion matrices of the proposed tower graph and HLABP based ECG signal classification

As seen from Fig. 5, the proposed tower graph-based ECG signal classification framework achieved 97.10% maximum classification accuracy with DNN.

3.4 Discussions

In this paper, tower graph decomposition is applied to raw ECG signals, and ten nodes are created by using four-level tower graph decomposition. We proposed a novel feature extraction method called HLABP. HLABP extracts features from each node, and 15,360 features were obtained in total. The RFINCA method selects the 142 most discriminative features, which were utilized as input of the ANN and DNN. As the results show, the proposed tower graph based method achieved 97.10% classification accuracy on the 17-class ECG dataset. With DNN, we achieved 100.0% classification accuracy for 10 classes (3rd, 5th, 6th, 8th, 11th, 12th, 14th, 15th, 16th, 17th). The worst DNN classification accuracy, which belongs to the 13th class, was calculated as 81.82%. ANN achieved perfect classification for six classes (5th, 6th, 12th, 14th, 16th and 17th). As with DNN, the worst ANN classification accuracy was obtained for the 13th class, and it was calculated as 54.54%. The ANN and DNN classification accuracies for each class are listed in Table 2.

Table 2 The obtained classification accuracy of each class by using ANN and DNN

Table 2 demonstrates that DNN achieved a 1.4% higher accuracy score than the ANN classifier. ANN achieved higher classification rates than DNN only for the 2nd and 4th classes. These high classification accuracies indicate the separable attributes of the extracted features. Boxplot analysis is used to demonstrate the statistical attributes of these features schematically.

Figure 6 reveals the separable attributes of the generated features according to classes. The boxplots have variable shapes (see Fig. 6a–q), and this variability shows the separable attributes of the extracted features.

Fig. 6 Boxplots of the features generated according to classes

To show the success of the proposed framework, a comparison with the state-of-the-art is made in Table 3.

Table 3 Comparison results for 17 classes MIT-BIH dataset

We also compared our method with existing methods. These methods used the MIT-BIH ECG dataset with deep learning and optimization based methods (Pławiak and Acharya 2020). The method of Tuncer et al. (2019) used hexadecimal pattern based classification. The main aim of the proposed method is to achieve higher classification performance. Therefore, we used tower graph decomposition, HLABP and RFINCA together to achieve higher recognition performance. We tested our features by using a conventional ANN and a DNN and compared them. As can be seen from Table 3, the proposed tower graph based method achieved 0.7% and 2.1% higher classification accuracy than the best of the other state-of-the-art ECG signal classification methods (Tuncer et al. 2019) by using ANN and DNN, respectively. According to the results achieved in the recognition of ECG signals, the following points should be emphasized:

  • In this paper, we presented a novel decomposition method by using the widely known pooling techniques and graph theory together. We showed effectiveness of the proposed tower graph-based decomposition technique with the combination of the deep neural network.

  • A novel hand-crafted feature extraction method called HLABP was presented in this work. HLABP extracts 1536 features from a signal, and it can be used for other biomedical signals to implement a signal analysis approach. This feature generator extracts salient features with low computational complexity, since the time burden of the feature generation function is \(O(n)\).

  • The fundamental problem of ReliefF and NCA is that they cannot select the optimum number of features automatically. Therefore, we implemented the RFINCA feature selection technique to solve this problem.

  • HLABP and RFINCA based feature generation and selection methods are very effective to create features of the ECG signals. This efficiency is also denoted in Fig. 6.

  • No metaheuristic optimization method was used in this work to increase the classification accuracy of the proposed method. The proposed tower graph based cognitive feature extraction and selection approach achieved high classification accuracy by using a conventional classifier (ANN). Moreover, we also used DNN to achieve a higher classification accuracy. ANN and DNN achieved 95.7% and 97.1% accuracies, respectively (see Table 2 and Fig. 5).

  • The proposed tower graph, HLABP, RFINCA and DNN based framework outperformed the state-of-the-art (see Table 3).

Limitations:

  • A small sized dataset was used for tests. In the near future, new generation pattern based deep models can be presented and tested on bigger ECG corpora.

4 Conclusions and future directions

The main aim of this work is to implement a highly accurate multilevel and cognitive ECG signal classification framework. In this framework, we implemented two novel components for ECG signal recognition, which are tower graph decomposition and HLABP feature extraction. The tower graph decomposition method combines pooling methods (minimum, maximum and average) with graph theory. As we know from the literature, pooling methods have been commonly used with deep learning methods, but they are not good routers. To increase their routing capability, the tower graph decomposition technique is proposed for ECG signal classification. Tower graph decomposition was applied to the raw ECG signals, and nodes were generated subsequently. HLABP extracted 1536 features from each node, and 15,360 features were obtained in total by concatenation. In the feature selection phase, the hybrid and iterative RFINCA feature selection approach was used, and the 142 most discriminative and informative features were obtained. These features were utilized as input of the ANN and DNN, and 95.70% and 97.10% classification accuracies were achieved, respectively. The proposed tower graph based method was also compared with the state-of-the-art methods, and it outperformed all of them.

Future directions about the proposed framework are given as follows:

  • Tower graph decomposition method can be used in deep networks to create layers.

  • An intelligent ECG based decision support system can be developed by using the proposed tower graph-based heart arrhythmia detection.

  • HLABP is an effective feature extractor and it can be used for solving other signal processing problems.

  • RFINCA can be used in other machine learning methods. Especially, feature selection problem of the deep networks can be solved by using the proposed RFINCA.

  • This framework was tested on an ECG dataset that contains 1000 ECG signals with 17 categories, and the proposed model attained high classification accuracy. Thus, this framework can be used in clinical trials. The model can be trained on a big ECG signal dataset, and a new automated heartbeat classification application can be developed. A snapshot of the intended ECG classification framework is shown in Fig. 7.

Fig. 7 Clinical application of the presented framework