Introduction

Electrocardiogram (ECG) records a significant amount of relevant information, such as heart health status, heart rate variability, psycho-physiologic status, and so forth. It often serves as one of the main bases for diagnosing cardiac diseases [1]. Long-term and dynamic monitoring of early heart diseases or sudden heart attacks can capture transient, non-sustained, abnormal ECG changes that are critical for diagnosing heart diseases, evaluating therapeutic effects, and saving lives [2]. Therefore, long-term routine ECG monitoring is a vital way for diagnosing and controlling heart diseases [3]. Since wearable ECG monitoring systems enable long-term, dynamic, unobtrusive, and unrestrictive ECG monitoring to use gel-free fabric dry electrodes, they are suitable for ubiquitous ambulatory ECG monitoring. In particular, wearable single-lead ECG monitoring systems, e.g. chest straps, vests, and E-bra, have attracted widespread attention due to their simple structures and comfort of wear [4,5,6].

The major problem with wearable ECG monitoring systems is motion artifacts (MA), which significantly affect system stability and reliability [7, 38,39,40,41]. Therefore, MA reduction is one of the most challenging problems in filtering and processing physiological signals, when portable or wearable devices mainly collect the signals. The main difficulty in MA reduction is that the amplitude of it is often much larger than that of the ECG signal. Hence, the spectrum of MA overlaps most of the spectrum of the ECG signal. For example, the frequency of MA caused by the daily body movement is usually within the range of 0.1–7 Hz, which has a deleterious effect on the ECG signal [8, 9]. Especially, in the case of strenuous exercise, the frequency of MA generated by people can be as high as even 20 Hz. The spectrum of MA almost fully overlaps with that of the ECG signal, and the amplitude of MA is several times or even several orders of magnitude larger than that of the ECG signal, which causes the ECG signal to be completely not separable in MA [10]. Another difficulty in MA reduction is that it consists of signals from multiple unknown sources, which makes it difficult to remove the MA from the ECG signal via the traditional filtering methods.

Current researches on algorithms for removing MA in wearable ECG monitoring systems mainly include adaptive noise cancellation (ANC) [11,12,13,14,15,16] and traditional blind source separation [independent component analysis (ICA) and principal component analysis (PCA)] [17,18,19,20,21]. The abovementioned algorithms are suitable for multi-channel observation data, which requires more sensors and their conditioning circuits thus resulting in additional power-consuming, more complex structure, and uncomfortable wearing experience. Therefore, they are not suitable for the single-channel observation data collected by single-lead ECG monitoring systems, e.g. chest strap, E-bra, and vest. The single-channel measurement signals collected by wearable ECG monitoring systems can be categorized as two independent source signals, namely ECG signals and MA since they are generated from different physical processes, which hence are called statistically independent. Our task in this paper is to un-mix contribution sources for having a closer look at the signals of interest. In the case of single-channel observation data, the problem can be effectively resolved by decomposing the given signals into different source signals through single-channel blind source separation (SCBSS) techniques.

At present, there are mainly three types of methods for single-channel blind source separation: sparse decomposition-based method, filtering decomposition-based method, and virtual multi-channel method [33,34,35,36,37]. The virtual multi-channel approach has better performance among them. When combined with the classical ICA algorithm, it directly transforms the single-channel mixed signals into multi-channel ones without relying on the prior probability characteristics of the modulation mode of the signals. The key step of the method is to design a virtual multi-channel approach to map the single-channel blind source signals to a suitable multi-dimensional space and to separate effectively the signals in the multi-dimensional space under the condition that spectrum-overlapped, non-linear, non-stationary and other factors are coupled to each other. Davies et al. [22] proposed a single channel-independent component analysis (SCICA), which obtained multiple channels by delaying the epileptic EEG signal and the maternal ECG signal. They then used an ICA algorithm to extract epileptic signals and fetal ECG. Their work proved the feasibility of the method. However, it was still impossible to separate the maternal and fetal ECG signals effectively since they have an overlapping frequency. Lin et al. [23] and Hong et al. [24] utilized wavelet-ICA (WICA) technology to extract the mechanical fault characteristics of rolling bearing. The results showed that the method could effectively extract the fault characteristics of the single-channel vibration signals. Bogdan et al. [25] proposed a single-channel signal decomposition method, namely, the EEMD-ICA, which combined ensemble empirical mode decomposition (EEMD) and the ICA. The separation performance of the EEMD-ICA, the SCICA, and the WICA was compared. The results showed that the EEMD-ICA outperformed the other two methods, especially in the case of high noise-to-signal ratios (NSR).

The performance of the EEMD-ICA was also validated using two practical applications for single-channel EEG and EMG, respectively. Guo et al. [26] proposed an improved EEMD-PCA-ICA decomposition method, which leveraged the correlation between the mixed-signal and the separated signal as well as PCA to reduce the component of the EEMD decomposition and then conducted an ICA processing. The signal separated by their method had a high correlation with the source signal, including a lower relative root mean square error (RRMSE). Their approach could also extract the sEMG signal from mixed ECG and EMG signals. From the above analysis, we can see that although both wavelet-ICA and EEMD-ICA are applicable to separate frequency-overlapping signals, the wavelet-ICA is not adaptive and severely affected by human factors. The EEMD-PCA-ICA combines the advantages of the above algorithms, with features such as adaptive, dimensionality reduction, and automatic. Since the algorithms mentioned above belong to the SIMO (single input multiple output), they still need manually selected signals for extracting a particular feature or a specific source from the source signal.

In this paper, we propose an algorithm named CEEMDAN-IMFx-PCA-CICA for MA reduction in ambulatory ECG signals using a single-channel blind source separation technique based on the existing algorithms and the prior knowledge of ECG periodicity, QRS complex and principle feature information distribution. We start with decomposing the empirical mode of the complete ensemble using the adaptive noise (CEEMDAN) algorithm to transform the mixed signals into intrinsic mode function (IMF) containing different source signal features. Therefore, a new multi-dimensional signal is formed. By using the correlation between IMFx (IMF component with the most ECG features) and each IMF, PCA is then applied to reduce the dimension of each IMF. Hence, the blind separation of the source ECG signals is realized by using the constrained independent component analysis (CICA) algorithm with IMFx as the constraint reference component. Compared with the single-channel blind source separation algorithms mentioned above, our algorithm can achieve adaptive decomposition, adaptive dimensionality reduction, and fully automatic extraction of ECG signals without artificial selection. To verify the performance of this algorithm, a simulation experiment was conducted to extract ECG signals from single-channel mixed signals with ECG and MA signals. The results indicate that the proposed algorithm outperforms the CEEMDAN-CICA (without dimension reduction), the CEEMDAN-PCA-CICA (with PCA dimension reduction) and improved CEEMDAN-PCA-CICA (using the correlation between mixed signals and each IMF component, and PCA dimension reduction), denote it as the CEEMDAN-Mix-PCA-CICA), especially in the case of high NSR. Besides, the separated ECG signal has a higher correlation with the source ECG signal and a lower RRMSE. The algorithm is also utilized to extract ECG signals from single-channel mixed signals collected by an ECG monitoring chest straps worn on volunteers. The results showed that our algorithm could separate ECG signals more efficiently when compared with the other algorithms.

The proposed algorithm

Our algorithm first performs the CEEMDAN processing on the input single-channel mixed signal to attain the multi-channel IMF. Then, it extracts the principal components of the multi-channel IMF by using the PCA algorithm combined with correlation. After the dimension reduction, the out signal is processed by the CICA to extract the desired source signal. Here is the pipeline of the proposed method.

Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN)

In 1998, Huang proposed a new method for analyzing non-linear and non-stationary time series [27]. The technique could adaptively decompose signals step by step using IMF components of data series with different data characteristic scales and was called empirical mode decomposition (EMD). The IMF must satisfy two conditions: (i) the extremes (maxima and minima) and the number of zero-crossings must be equal or differ at most by one; (ii) the local mean (the mean of the upper and lower envelopes) must be zero. However, the EMD still had many problems such as mode mixing, residual noise in modes, and the diversity in the number of modes caused by different realizations after adding noise. To solve the problems, Torres et al. [28] proposed a variation of the EEMD, called the CEEMDAN (complete ensemble empirical mode decomposition with adaptive noise). The general idea of the algorithm is as follows:

If \(E_{k}\)(·) is an operator that generates the kth mode through the EMD, and \(w^{\left( i \right)} \left( t \right)\) is a realization of the white noise with zero mean unit variance.

Step 1: To add a realization \(\beta_{0} w^{\left( i \right)}\)(t) of white noise to the target signal x(t) to be decomposed, i.e.

$$ x^{\left( i \right)} \left( t \right) = x\left( t \right) + \beta_{0} w^{\left( i \right)} \left( t \right),i = 1, \ldots , \, I $$
(1)

For each i = 1,…, I, according to the EMD decomposition formula (1), the first mode \({\text{IMF}}_{1}^{i}\) of each realization of adding noise to the signal is obtained respectively, and the ensemble first mode is obtained by averaging.

$$ \widetilde{{{\text{IMF}}_{{1}} }} = \frac{1}{I}\mathop \sum \limits_{i = 1}^{I} {\text{IMF}}_{1}^{\left( i \right)} . $$
(2)

Independent noise realizations occur, the only first residual is obtained:

$$ r_{1} \left( t \right) = \, x\left( t \right) - \widetilde{{{\text{IMF}}_{1} }}. $$
(3)

Step 2: To decompose using the EMD and calculate the ensemble second mode, i.e.

$$ \widetilde{{{\text{IMF}}_{{2}} }} = \frac{1}{I}\mathop \sum \limits_{i = 1}^{I} E_{1} \left( {r_{1} \left( t \right) + \beta_{1} E_{1} \left( {w^{\left( i \right)} \left( t \right)} \right)} \right). $$
(4)

Step 3: To calculate the kth mode and kth residual according to the following formulas:

$$ r_{k} \left( t \right) = r_{k - 1} \left( t \right) - \widetilde{{{\text{IMF}}_{k} }},k \, = \, 2, \ldots , \, K, $$
(5)
$$ \widetilde{{{\text{IMF}}_{k} }} = \frac{1}{I}\mathop \sum \limits_{i = 1}^{I} E_{1} \left( {r_{k - 1} \left( t \right) + \beta_{k - 1} E_{k - 1} \left( {w^{\left( i \right)} \left( t \right)} \right)} \right),k \, = \, 2, \ldots , \, K. $$
(6)

Step 4: To calculate the (k + 1)th mode at last:

$$ \widetilde{{{\text{IMF}}_{k + 1} }} = \frac{1}{I}\mathop \sum \limits_{i = 1}^{I} E_{1} \left( {r_{k} \left( t \right) + \beta_{k} E_{k} \left( {w^{\left( i \right)} \left( t \right)} \right)} \right). $$
(7)

Step 5: To iterate from step 3 to step 4 until the obtained residual cannot be further decomposed by the EMD. That is, either it satisfies the IMF condition or its local extremum is less than 3.

Step 6: To satisfy the ultimate residual:

$$ r_{k}^{ } \left( t \right) = \, x\left( t \right) \, - \mathop \sum \limits_{k = 1}^{K} \widetilde{{{\text{IMF}}_{k} }}. $$
(8)

The ultimate mode number is determined only by the data and the stopping criterion. The coefficient \(\beta_{k} = \varepsilon_{k} {\text{std}}\left( {r_{k} \left( t \right)} \right)\) allows the selection of the SNR at each stage.

Principal component analysis (PCA)

PCA utilizes linear transformation to obtain a new set of features with an equal number of the original features of data. The former part of these features contains the primary feature information of the original data. By doing this, we only take the former part of the features to remain the primary information of the original features, realizing the goal of reducing the number of features as well as reducing the dimension [29]. The general idea of the algorithm is as follows:

Let X = [\(x_{1}\), \(x_{2}\)…,\(x_{m}\)]T be an m × n matrix, where \(x_{i} { }\) is zero-centered, i.e. E (\(x_{i}\)) = 0, i = 1, 2,…,m. where E (∙) is expectation operation, and T is transposition operation.

Step 1: To construct Eq. (9) from X:

$$ R_{x} = E(XX^{{\text{T}}} ),\;R_{x} V = V\Lambda , $$
(9)

where \(R_{x}\) is the autocorrelation matrix of m variables; V is the eigenvector of the m × m matrix \(R_{x}\), and its column vector is the orthogonal normalized eigenvector of \(R_{x}\), Λ is eigenvector diagonal matrix, i.e. Λ = diag(\(\lambda_{1}\), \(\lambda_{2} , \ldots , \lambda_{m}\)), where \(\lambda_{i}\) (i = 1,2,…,m) is the element of the ith diagonal line.

Step 2: To sort \(\lambda_{i}\) in descending order, the amount of signal information is computed by

$$ \mu = \frac{{\lambda_{1} + \lambda_{1} + \ldots + \lambda_{p} }}{{\lambda_{1} + \lambda_{2} + \ldots + \lambda_{m} }}, $$
(10)

where the denominator is the sum of all eigenvalues, and the molecule is the sum of P eigenvalues from 1 to P.

Step 3: To obtain

$$ Y = W^{T} X, $$
(11)

where w = [\(w_{1}\), \(w_{2}\),…,\(w_{p}\)], and \(w_{i}\) denotes the corresponding eigenvector of \(\lambda_{i}\)(i = 1,2,…,m); Y is the matrix after PCA dimension reduction.

Constrained independent component analysis (CICA)

Lu et al. proposed a constrained independent component analysis (CICA) [30, 31], which is similar to the fast-ICA algorithm. Based on the principle of non-Gaussian maximization, the fixed point iteration theory was used to find the maximum non-Gaussian of source signals, and the Newton iteration method was conducted to batch a large number of sampling points of observation variables. Using the prior information of the source signal as additional constraints and maximizing negative entropy as an object function, an independent component similar to the feature of a reference vector was separated from the observed signal. The general idea of the algorithm is as follows:

Step 1: To extract the constrained reference vector r = \(\left( {r_{1} ,r_{2} , \ldots ,r_{n} } \right)^{{\text{T}}}\) with source signal features, n denotes sampling length;

Step 2: To maximize Eq. 12 by assuming that y = \(\omega^{{\text{T}}} X\), where w is a weight vector

$$ {\mathbf{J}}_{{\mathbf{G}}} \left( \omega \right) = \rho \left[ {E\left\{ {G\left( y \right)} \right\} - E\left\{ {G\left( v \right)} \right\}} \right]^{2} = \rho \left[ {E\left\{ {G\left( {\omega^{{\text{T}}} X} \right)} \right\} - E\left\{ {G\left( v \right)} \right\}} \right]^{2} , $$
(12)

where ρ is a positive constant, and v is a Gaussian random variable with a zero-mean and unit variance; G is a non-square non-linear function. In practice, the following function is often used.

  1. (i)

    When the source signals are super-Gaussian and sub-Gaussian

    $$ G_{1} \left( y \right) = \frac{1}{{a_{1} }}\log_{2} \cosh \left( {a_{1} y} \right), \, 1 \le a_{1} \le 2. $$
    (13)
  2. (ii)

    When all the source signals are super-Gaussian or require high robustness

    $$ G_{2} \left( y \right) = - \frac{1}{{a_{2} }}\exp \left( {\frac{{ - a_{2} y^{2} }}{2}} \right),\;a_{1} \approx 1. $$
    (14)
  3. (iii)

    When source signals are all sub-Gaussian

    $$ G_{3} \left( y \right) = \frac{{y^{4} }}{4}. $$
    (15)

Step 3: To transform the objective function into:

$$ \left\{ {\begin{array}{*{20}c} {\max } & {{\mathbf{J}}_{{\mathbf{G}}} \left( \omega \right) = \rho \left[ {E\left\{ {G\left( y \right)} \right\} - E\left\{ {G\left( v \right)} \right\}} \right]^{2} } \\ {{\text{s.t.}}} & {h\left( \omega \right) = {\text{E}}\left\{ {y^{2} } \right\} - 1 = 0} \\ \end{array} } \right.. $$
(16)

The iterative formula of the CICA algorithm is obtained by solving the optimal solution of the objective function by the Newton method:

$$ \omega_{k + 1} = \omega_{k} - \eta R_{xx}^{ - 1} L_{{\omega_{k} }}^{^{\prime}} /\delta \left( {\omega_{k} } \right), $$
(17)

where k is the number of iterations; η is learning rate, \(R_{xx}\) is the covariance matrix of the observation matrix X, \(L_{\omega }^{^{\prime}}\) is the first derivative of Lagrange function formula to ω:

$$ L_{\omega }^{^{\prime}} = \rho E\left\{ {XG_{y}^{^{\prime}} \left( y \right)} \right\} - \frac{1}{2}\mu E\left\{ {Xg_{y}^{^{\prime}} \left( \omega \right)} \right\} - \mu E\left\{ {Xy} \right\}, $$
(18)
$$ \delta \left( \omega \right) = \rho E\left\{ {XG_{yy}^{^{\prime\prime}} \left( y \right)} \right\} - \frac{1}{2}\mu E\left\{ {Xg_{yy}^{^{\prime\prime}} \left( \omega \right)} \right\} - \lambda , $$
(19)

where \(G_{y}^{^{\prime}} \left( y \right)\) and \(G_{yy}^{{^{\prime\prime}}} \left( y \right)\) are respectively the first and second derivatives of the non-quadratic function G(y) to y, \(g_{y}^{^{\prime}} \left( \omega \right)\) and \(g_{yy}^{{^{\prime\prime}}} \left( \omega \right)\) are respectively the first and second derivatives of the non-quadratic function g(ω) to y, g(ω) = ε(y, r) − ζ, where ζ is threshold value and the constraints commonly use mean square error, that is:

$$ \varepsilon \left( {y, \, r} \right) = E\left\{ {\left( {y - r} \right)^{2} } \right\}. $$
(20)

The updated Lagrange factor formula is:

$$ \mu_{k + 1} = \max \left\{ {0, \mu_{k} + \gamma g\left( {\omega_{k} } \right)} \right\}, $$
(21)
$$ \lambda_{k + 1} = \lambda_{k} + \gamma h\left( {\omega_{k} } \right). $$
(22)

Step 4: To obtain the independent component

$$ y = \omega^{{\text{T}}} X. $$
(23)

Algorithm of the proposed method

The implementation of the CEEMDAN-IMFx-PCA-CICA algorithm is as follows:

Step 1: To initialize the quantity of noise and add an independently randomly distributed zero-mean additive white noise to the single-channel mixed signal.

Step 2: To apply the CEEMDAN to obtain a set of IMFs.

Step 3: To perform spectrum estimation on the IMF components, take the IMF component whose frequency range is closest to the QSR group of the source ECG signal and the main feature distribution, and denote it as IMFx.

Step 4: To calculate the correlation between the final IMF and the IMFx by Eq. (24), and take the IMF component whose correlation threshold is greater than a certain threshold:

$$ \rho_{{xz_{j} }} = \frac{{{\text{cov}} \left( {{\text{IMFx}} \cdot {\text{IMF}}_{j} } \right)}}{{\sqrt {D\left( {{\text{IMFx}}} \right)} \sqrt {D\left( {{\text{IMF}}_{j} } \right)} }}. $$
(24)

Step 5: To compose the input matrix of the retained IMF components from step (3), and PCA dimension reduction is carried out. Then, more than 90% of the information is obtained by Eq. (10).

Step 6: To take the reference constraint vector IMFx, and perform the CICA algorithm on the dimension-reduced IMF matrix. Then, the desired separated signal is obtained.

Performance evaluation of the standard single-channel blind source algorithm

For two signals, denoted by a(t) and b(t), the mixed method is presented as follows:

$$ x\left( t \right) = \, a\left( t \right) + \lambda b\left( t \right), $$
(25)

where λ is a real coefficient, and x(t) is a mixed signal.

(1) Correlation coefficient

The inter-signal correlation coefficient is used for evaluating the effect of the single-channel blind source separation method. The correlation coefficient between the recovered signal Y and source signal S is defined by:

$$ \rho_{YS} = \frac{{\cos \left( {Y, S} \right)}}{{\sqrt {D\left( Y \right)} \sqrt {D\left( S \right)} }} = \frac{{\mathop \sum \nolimits_{i = 1}^{N} \left( {Y_{i} - \overline{Y}} \right)\left( {S_{i} - \overline{S}} \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{N} \left( {Y_{i} - \overline{Y}} \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = i}^{N} \left( {S_{i} - \overline{S}} \right)^{2} } }}, $$
(26)

where \(\rho_{YS}\) is a correlation coefficient, 0 ≤ \(\left| {\rho_{YS} } \right|\)≤ 1, and when \(\left| {\rho_{YS} } \right|\)≥ 0.8, it is then called highly correlated, N is the number of the sampling points of the signal.

(2) Noise to signal ratio (NSR)

A measurement method for two mixed signals is defined as the noise and signal ratio (NSR); the formula is given as follows [26]:

$$ {\text{NSR}} = \frac{{{\text{RMS}}\left( {\lambda b\left( t \right)} \right)}}{{{\text{RMS}}\left( {a\left( t \right)} \right)}}, $$
(27)

where RMS(∙) is the root-mean-square calculation.

(3) Relative root mean square error (RRMSE)

Relative root mean square error is used for judging the effect of recovered signals given as follows [26]:

$$ {\text{RRMSE}} = \frac{{{\text{RMS}}\left( {a\left( t \right) - \hat{a}\left( t \right)} \right)}}{{{\text{RMS}}\left( {a\left( t \right)} \right)}} \times {1}00\% , $$
(28)

where RMS(∙) is the root-mean-square calculation.

Experiments

To evaluate the performance of the proposed method called the CEEMDAN-IMFx-PCA-CICA in removing the MA noise from the mobile ECG single-channel signal, we performed the following two experiments. In the first experiment, ECG signals were extracted from mixed signals with MA. The second experiment was conducted to extract clean ECG signal from the mixed signals, which were collected by an ECG strap monitoring system recording volunteers' daily activities, like walking and running in place. The programs were written in MATLAB 2017, where the EMD function uses the EMD toolbox (https://perso.ens-lyon.fr/patrick.flandrin/emd.html).

Hybrid simulation experiment of ECG signal and motion artifacts

To simulate the dynamic characteristics of MA, the pure ECG signal, a(t) in real life, and the MA, b(t) are linearly mixed according to Eq. (21) with different NSR values to form single-channel data as observed data. Then, the single-channel data is separated by blind source separation. Pure ECG signal comes from MIT-BIH (https://www.physionet.org/cgi-bin/atm/ATM) database records. MA comes from data recorded by our wearable chest straps while volunteers mimicking daily activities shown in Fig. 1. One channel consists of two fabric-made working electrodes for ECG signal acquisition; the other channel consists of an improved fabric electrode (as an auxiliary electrode) and a fiber resistance across the two electrodes to collect MA signals [15, 32]. The MA shown in Fig. 3 was from a volunteer simulating running. The sampling frequencies of both signals were 360 Hz. Hence, both signals were 10 s in length. The main frequency range of the clinical standard ECG signal is 0.05–100 Hz, and the main frequency range of MA of ambulatory ECG is 0–20 Hz. Therefore, they partly overlap each other in frequency. To facilitate comparison, the MA was first linearly transformed into ECG signals of the same order of magnitude by Min–Max normalization.

Fig. 1
figure 1

The steps of the proposed method to extract the source signal

The MA and the pure ECG signals were linearly mixed (NSR = 0.2 and NSR = 5 respectively) to attain the single-channel observation data which served as the input signals for the CEEMDAN. As shown in Fig. 2, where the first line is pure ECG signals; the second line is the MA collected via the fabric dry electrodes as the volunteer wearing the chest strap is running; the third line is the mixed signals, and the fourth line is the separated ECG signals. From the third line, it can be found that when the NSR is 0.2, the complete ECG waveform (Fig. 2a) can be seen from the mixed signal. However, when the NSR is 5, the ECG signal is completely submerged in MA noise, and the ECG waveform is almost invisible (Fig. 2b).

Fig. 2
figure 2

Measurement system diagram. a Wearable chest strap. b Measurement system

The observed mixed signals were decomposed using the CEEMDAN to attain the multi-channel IMF, where the variance of white noise was 0.1, and the set size was 100. Afterward, the power spectrum of each IMF component was evaluated by a modified covariance method. The mixed-signal of NSR = 0.2 was decomposed by the CEEMDAN to attain 12 channels of data. Figure 3a demonstrates the mixed signal (IMF 1–11 is the obtained component and res is the margin). Figure 3b illustrates the spectrum curve of the corresponding component. We can see that the primary information of the ECG signal is distributed in IMF 1–6. Mainly, the primary information of the QRS complex is distributed in IMF 2–5. The corresponding frequency is mainly in the 5–40 Hz. As shown in Fig. 4a, the mixed-signal (NSR = 5) is decomposed by the CEEMDAN, and the 9-channel data are obtained. Figure 4b shows the spectrum curve of the corresponding component. It can be seen that the main information is distributed in IMF 1–5. Notably, the main information of the QRS complex is distributed in IMF 2–3. The corresponding frequency is mainly in the 5–40 Hz.

Fig. 3
figure 3

Source signal, mixed signal and separated ECG (a NSR  =  0.2, b NSR  =  5)

Fig. 4
figure 4

The IMF component and spectrum estimation obtained by CEEMDAN (NSR = 0.2)

After the CEEMDAN decomposition, each IMF component kept parts of information of the mixed-signal. However, how much data of the ECG signal would be kept was determined by correlation analysis. The components with very low correlation with the ECG signal were removed. Based on the prior knowledge of the source ECG signal (the approximate periodicity of the salient feature QSR complex (frequency range: 5–15 Hz) and the main energy distribution characteristics (frequency range: 1–50 Hz), frequency range: 15–25 Hz here) and the features of the IMF component (single characteristic time scale), the correlation between IMFx and IMF components was selected. The correlation between mixed signals and IMF components was compared. Figure 5 demonstrates the correlation between IMF components and mixed signals calculated by Eq. (22). Seen in Figs. 3a and 5a that the main components of the ECG signal can be retained to the greatest extent by taking the components of IMF whose correlation with mixed signals is greater than 0.2, and the dimension can be reduced from 13 to 9. However, from Figs. 4b and 5b, when NSR = 5, it is impossible to both reduce the dimension and retain the main component of the ECG signal. From Figs. 3, 4 and 5c, d, when NSR = 0.2 and NSR = 5, the main components of ECG signal can be retained to the greatest extent if the correlation between IMF components and IMFx components is greater than 0.01, and the dimension can be respectively reduced from 13 to 9 and from 11 to 5. To sum up, for mixed signals with low NSR (NSR = 0.2) in both time and frequency domain, the correlation between mixed signals and IMF components can be used to reduce dimensionality, while for mixed signals with high NSR (NSR = 5) the correlation cannot be used to reduce dimensionality. However, both low and high NSRs can be reduced by the correlation between IMF components and IMFx components (such as component IMF3) whose frequency range is closest to the QRS frequency range.

Fig. 5
figure 5

The IMF component and spectrum estimation obtained by CEEMDAN (NSR = 5)

After the first dimension reduction, the residual IMF components were then processed by traditional PCA. This process could save time, improve the decomposition rate, and reduce the iteration times of the CICA. The amount of information is calculated through Eq. (10), and the value is 99%. When NSR = 0.2, the dimension decreased from 12 to 7, and when NSR = 5, the dimension decreased from 9 to 4. After the PCA processing, part of the correlation was removed, but the correlation still exists in the high-order case. The signal was completely independent after the CICA processing. Finally, the useful source signal ECG was obtained as shown in Fig. 2. The correlation of IMFx components is used to reduce dimension. The results indicate that when NSR = 0.2, the ECG signal can be recovered very well, and the correlation with the source ECG signal reaches 0.9849, and the relative mean square error is 0.1728. When NSR = 5, the ECG signal, which was submerged by the noise, can also be recovered. The correlation is 0.6094, and the relative mean square error is 0.8325. Although some details are lost, the QRS complex can still be identified, which can be used in heart rate recognition of wearable products, HRV, and other applications.

To explore further, the performance of extracting ECG signals (removing MA) from the mixed signals was compared among the CEEMDAN-IMFx-PCA-CICA and the other three algorithms. In the case of NSR = 0.2–5 and step size = 0.2, 25 sets of blind source separation simulation experiments for ECG signal and MA signal were carried out by using the four algorithms respectively shown in Figs. 6 and 7. The correlation between IMF component and IMFx component is more than 0.01, the correlation between IMF component and a mixed-signal is more than 0.2, and the information content after PCA processing is more than 99%.

Fig. 6
figure 6

Correlation coefficient [NSR = 0.2 (a, c), NSR = 5 (b, d), the correlation between IMF and mixed signal (a, b), the correlation between IMF and IMFx (c, d)]

Fig. 7
figure 7

The relative error and correlation of ECG signal under different NSR conditions (a relative error, b correlation)

The simulation results in Fig. 6 show that with the increase of NSR, the four algorithms generally conform to the following rules: the correlation between the source ECG signal and the separated ECG signal decreases gradually, and the relative mean square error RRMSE increases slowly. When the NSR is lower (NSR < 1), the separation performance of the four algorithms is similar, i.e. the trend and magnitude of correlation and relative error with NSR are almost equal since when the NSR of the MA signal and ECG signal is smaller (NSR < 1 used). Hence, the mixed signal is dominated by the ECG signal, and the overall waveform is the ECG waveform. In this case, whether using the mixed signal or using the correlation between the IMFx component and each IMF component to analyze, those components with a high correlation with the ECG can be retained, and components with low correlation are removed. Similarly, the PCA maintains the main features of the original data, so the four algorithms can keep the primary information of the ECG signal, and have similar separation and recovery performance. When NSR is higher (NSR > 2), the CEEMDAN-IMFx-PCA-CICA algorithm has better recovery performance than the other three, and has a higher correlation and lower relative error. The recovery performance of the CEEMDAN-CICA algorithm is worse than that of the CEEMDAN-IMFx-PCA-CICA but better than that of the CEEMDAN-Mix-PCA-CICA and the CEEMDAN-PCA-CICA. The CEEMDAN-Mix-PCA-CICA and the CEMDAN-PCA-CICA have the worst recovery, with lower correlation and higher relative error since when the NSR is higher (NSR > 2), the main component of the mixed signal is that the MA signal is dominant, and the MA signal waveform represents the overall waveform. In this case, if the correlation between the combined signal and each IMF component is used, only those components that have a higher correlation with the MA can be retained, and the components that have a lower correlation with the MA and are highly correlated with the ECG are removed. However, by using the correlation between IMFx and each IMF component, it is possible to retain those components that are highly correlated with ECG and to remove the components that are correlated lowly with ECG though highly correlated with MA. PCA maintains information about the main features of the original data and removes redundant and non-primary features. When NSR is between 1 and 2, the stability of the CEEMDAN-Mix-PCA-CICA and the CEMDAN-PCA-CICA becomes worse since the main components of mixed signals become uncertain under this condition. Then, the correlation between mixed signals and IMF components is used to select which components containing the primary information of ECG signals remain uncertain. Similarly, after PCA processing, the correlation between separated ECG signal and source ECG signal fluctuates greatly, and RRMSE changes greatly shown in Fig. 6. The CEEMDAN-CICA has a more unsatisfactory recovery performance than does the CEEMDAN-IMFx-PCA-CICA since some of the IMF components containing MA are separated from ECG signals during the process of the ICA. It is also possible that the decomposition needs to have more IMF components and more iterations, which could lead to non-convergence.

The CEEMDAN-IMFx-PCA-CICA leads to more dimension reduction that vary about 3–7, and the range of variation is significant that is shown in Fig. 7 (the red line) since the algorithm can automatically adjust the IMF component containing the main information of the ECG to participate in the ICA separation according to the NSR change. Then, the number of dimension reduction for the CEEMDAN-Mix-PCA-CICA and the CEEMDAN-PCA-CICA also reaches 5–6, and the CEEMDAN-CICA do not reduce dimension. After dimension reduction, the number of iterations of the CICA can be reduced, thus saving the running time and increasing stability. The performance of the four algorithms is provided in the discussion section, where NSR = 1. IMF is the number of modal components decomposed by the CEEMDAN; the IC is the number of components before the CICA processing, and the column title called reduced is the number of reduced dimensions. Except for the CEEMDAN-CICA, there is no dimensionality reduction, and the numbers of dimension reduction for the other three algorithms are close. The CEEMDAN-IMFx-PCA-CICA has the least number of iterations. On the other hand, the CEEMDAN-CICA has iterated most times. Both CEEMDAN-Mix-PCA-CICA and CEEMDAN-PCA-CICA have similar iterations and the same dimension reduction.

Real application of CEEMDAN-IMFx-PCA-CICA in ambulatory ECG signals

The real case study aimed to verify that the performance of the proposed algorithm in a real application to remove MA from mobile ECG signal, especially in difficult exercise situations (e.g. running), was obtained where the ECG signal source was contaminated by the high amplitude and high frequency of MA. Our algorithm would be applied to the ECG signals contaminated by MA due to different daily activities. It could demonstrate how our algorithm could extract ECG signals from contaminated ECG signals. We also compared our algorithm with the other three algorithms mentioned above. The single-channel observation signals were obtained from five male volunteers who wore our wearable fabric chest straps (Fig. 2) and imitated daily exercise (walking in place, bending down from standing, enlarging bosom, running in place) during the data collection period. For per person per action, 70 s of data were collected at the sampling rate of 2000 Hz by MP150 (Biopac Systems INC, USA). The first 10 s were stationary data and the last 60 s were data of repeated actions, e.g. walking in place. In this study, five healthy male volunteers whose ages were 20–25 were recruited as subjects. They had a BMI from 16.8 to 25.6 kg m−2.

The respective correlations of the IMF components decomposed from the collected 10-s data using the CEEMDAN with the mixed signals and IMFx are shown in Fig. 8. The correlation of IMF and IMFx components is higher than 0.01; the correlation of each IMF component and the mixed signal is greater than 0.1, and the amount of information after PCA processing is larger than 99%. As shown in Fig. 8, the correlation between IMFx and IMF components can be used to reduce dimension and retain the main information of ECG signals when walking in place and bending down from standing.

Fig. 8
figure 8

The number of dimension reduction of 4 different algorithms under different NSR

However, the correlation between IMFx and IMF components in other movements such as enlarging bosom and running in place can minimize dimension and retain the main information of ECG.

Seen in Figs. 9 and 10 that the proposed algorithm can extract the ECG signal effectively from the contaminated single-channel ECG signal collected in the four different actions. Besides, if the amplitude of the MA is low, after the processing with the CEEMDAN-IMFx-PCA-CICA, MA can be well removed from the mixed signals without distorting the shape and amplitude of the original ECG signal. If the amplitude of the MA is higher, the ECG signal is almost completely submerged in MA, but the complete QRS complex can still be well extracted. Thus, it can be used in heart rate and ECG monitoring. The CEEMDAN-Mix-PCA-CICA can retrieve the ECG signal effectively in a single-channel ECG signal contaminated by MA due to walking (the third line of Fig. 9) and bending down (the sixth line of Fig. 9). However, it fails to extract the ECG signal from the actions of enlarging bosom (the third line of Fig. 10) and running (the sixth line of Fig. 10). In the case of running in place, the QRS complex even cannot be observed.

Fig. 9
figure 9

Correlation coefficient (a walking in place, b bend down (from standing), c enlarge bosom, d running in place)the correlation between IMF and mixed signal (ad upper), the correlation between IMF and IMFx (ad down)

Fig. 10
figure 10

Single-channel mixed signal and separated ECG signal (walking in place, bend down from standing)

Fig. 11
figure 11

Single-channel mixed signal and separated ECG signal (enlarge bosom, running in place)

The improved SNR (ISNR) of the ECG signal is separated employing the four algorithms. It can be seen that that of the ECG extracted by the four algorithms is close to the SNR of ECG obtained by the four algorithms, while the SNR of the ECG extracted by the two separation algorithms is close to that of ECG extracted by the two algorithms when walking in place and bending down. The detailed numerical results are provided in the discussion section. When enlarging bosom and running are conducted, the SNR of ECG extracted by the CEEMDAN-IMFx-PCA-CICA algorithm is higher than that derived by the other three algorithms, which is consistent with the conclusions of the simulation experiment. The CEEMDAN-IMFx-PCA-CICA performs better than the other three algorithms when NSR is higher. While in the case of dealing with lower NSR (mild motion), the four algorithms have similar recovery performance. Noted that, although we only present one volunteer's data in this paper, the other four volunteers' data have identical conclusions.

Discussion

When the simulation study is a concern, the NSR equal to 0.2 for mixed signals leads to the complete ECG waveform. However, when the NSR is 5, the ECG signal is completely submerged in MA noise. Hence, the ECG waveform is almost invisible. The correlation between mixed signals and IMF components can be used to reduce dimensionality, though the correlation cannot be employed to reduce dimensionality for mixed signals with high NSR = 5. However, both low and high NSRs can be reduced by the correlation between IMF components and IMFx components.

After the PCA processing, part of the correlation was removed, but the correlation still exists in the high-order case. The signal was completely independent after the CICA processing. The useful source signal ECG finally was obtained. The correlation of IMFx components was used to reduce dimension. The results indicated that when NSR = 0.2, the ECG signal could be recovered very well, and the correlation with the source ECG signal reached 0.9849 resulting in the relative mean square error, 0.1728. When NSR = 5, the ECG signal, which was submerged by the noise, could also be recovered. However, the correlation was 0.6094, and the relative mean square error was 0.8325.

When the increment of NSR occurred, the four algorithms generally conformed to the following implications: (1) the correlation between the source ECG signal and the separated ECG signal decreased gradually, and the relative mean square error (RRMSE) increased slowly. When the NSR was low (NSR < 1), the separation performance of the four algorithms was similar. Hence, the mixed-signal was dominated by the ECG signal, and the overall waveform was the ECG waveform. (2) The PCA maintained the main features of the original data, so all the four algorithms could keep the primary information of the ECG signal, and had similar separation and recovery performance. (3) When NSR was between 1 and 2, the stability of the CEEMDAN-Mix-PCA-CICA and CEMDAN-PCA-CICA became worse since the main components of mixed signals became uncertain under this condition. (4) When NSR was high (NSR > 2), the proposed algorithm called the CEEMDAN-IMFx-PCA-CICA had better recovery performance than the other three, and had a higher correlation and lower relative error. The CEEMDAN-IMFx-PCA-CICA led to dimension reduction more than did the other methods, which varied about 3–7, and the range of variation was significant since the algorithm could automatically adjust the IMF component containing the main information of the ECG to participate in the ICA separation according to the NSR change.

Table 1 exhibits the performance of the four algorithms when NSR = 1. The proposed algorithm called the CEEMDAN-IMFx-PCA-CICA has the least number of iterations. The CEEMDAN-Mix-PCA-CICA and the CEEMDAN-PCA-CICA have similar iterations and the same dimension reduction.

Table 1 Performance comparison of four algorithms (NSR = 1)

When the real case study is a concern, it is found that the CEEMDAN-Mix-PCA-CICA can retrieve the ECG signal effectively in a single-channel ECG signal contaminated by MA, which is called walking and bending down. However, it fails to extract the ECG signal from the actions of enlarging bosom and running. In the case of running in place, the QRS complex even cannot be observed. When enlarging bosom and running are conducted, the SNR of ECG extracted by the CEEMDAN-IMFx-PCA-CICA algorithm is higher than that derived by the other three algorithms, which is consistent with the conclusions of the simulation experiment. The CEEMDAN-IMFx-PCA-CICA performs better than the other three algorithms when NSR is higher. While in the case of dealing with lower NSR, the four algorithms have similar recovery performance. Table 2 denotes the results of each method concerning four actions.

Table 2 Improved SNR of four algorithms (dB)

Conclusions

In this paper, we introduce a new method called the CEEMDAN-IMFx-PCA-CICA that combines the CEEMDAN with an improved PCA and CICIA. By conducting the CEEMDAN-IMFx-PCA-CICA, the ECG signal can be successfully extracted from single-channel mixed source signals. The results of the real case indicate that the performance of the proposed algorithm is better than the other three mentioned algorithms when NRS is high. One of the advantages of conducting the proposed algorithm is the iteration times of CICIA that are reduced. Besides, the source signal is recovered better. Moreover, the separated ECG signals have a higher correlation and lower RRMSE with the source ECG signal.

As a result, our algorithm, called the CEEMDAN-IMFx-PCA-CICA, can achieve adaptive decomposition, adaptive dimensionality reduction, and fully automatic extraction of ECG signals without artificial selection, performing better than the other three algorithms when NSR is high which is represented by some motion artifacts such as walking, running, enlarging bosom and bending down. Moreover, the proposed method is better than the other methods extracting ECG signals when the actions of enlarging bosom and bending down are a concern, which is superior to the nearest rival.

On the other hand, the four algorithms have similar recovery performance when NSR is low. Even though their performances are the same, the proposed algorithm is better than others when the iteration number is a concern. Noted that some limitations should be mentioned. The two threshold parameters called correlation values and the information amount of PCA in the algorithm are easily influenced by the choice of practitioners, which means that the human factor plays a vital role.

As future researches, the statistical research will be carried out to determine the appropriate correlation value and PCA information value so that the algorithm could reconstruct the original signal more accurately, and would have a better spectrum separation mode and lower calculation cost. Besides, the problem related to human factors affecting the algorithm could be treated by mathematical or statistical methods utilizing subjective approaches better.