1 Introduction

Throughout the world, researchers and engineers are developing intelligent robotic systems for manufacturing, service, and medicine and health care. The application of intelligent robotics systems can increase productivity, safety, and quality of life for people. However, the design and the production of these systems require knowledge of sensors and actuators. Moreover, the intelligent robotic systems required algorithms and methods to process acquired data from sensors and to control operating actuators.

Intelligent robots are designed, in most cases, to replace humans in dangerous and repetitive jobs. For example, many modern industrial robots are exploited in heavy lifting tasks, precision drilling, packing, and sealing.

Service robots have been designed and developed for applications in agriculture, house cleaning, fire fighting, mining, surveillance, military uses, and undersea exploration. Services provided by state-of-the-art robots have one of the greatest potential application of intelligent robotics in the future.

Intelligent robotic systems designed for health care are used in surgery, rehabilitation, and assistance in daily living. For example, medical robots have the potential to increase the quality of medicine and health care enabling human surgeons to treat individual patients with improved efficacy and greater safety. On the other hand, robotic systems for rehabilitation and for helping deal with physical disabilities will become more and more important. The robots can be used to assist with physical therapy, as part of smart prosthetic devices and help disabled people in daily living [1, 2].

One of the main elements of intelligent robotics systems is sensors. Sensors are devices whose purpose is to detect changes in the environment and send the acquired data to the processing unit. The sensing devices may operate under different load conditions and environmental conditions (temperature, humidity). All of these may result in noisy measurements. Noise is a signal distortion hindering the process of data acquisition. Therefore, the processing of sensor data is one of the most important components of intelligent robotics systems [3]. The major element of the signal processing is denoising. Denoising is the process of suppressing the noise from data to the desired extent. From numerous types of noise prevailing in data acquired from electronic devices Additive White Gaussian Noise (AWGN), quantization noise, and bias instability are the most common [4].

There exist several different denoising approaches originating from various disciplines like spectral and multi-resolution analysis, partial differential equations, and statistics. Based on the type of denoising algorithms, they can broadly be referred to as time-domain filtering, transform-domain filtering, and dictionary learning methods.

Time-domain methods include local and non-local filters. These filters exploit the similarities between statistics of different regions in the signal. The main difference between local and non-local approaches is the size of the surrounding regions.

A filter is considered to be local if the filter support for denoising a sample is restricted by time distance. Local methods are effective in terms of time complexity. The popular local filters designed for noise reduction are Gaussian filter [5], Wiener filter [6], Median filter [7], and Savitzky-Golay filter [8].

On the other hand, one of the examples of the non-local algorithm is the Nonlocal Means method (NLM) which is implemented both 1D and 2D signals [9], [10]. The NLM estimation exploits the non-local similarities in the signal to denoise the signal. One of the drawbacks of the NLM denoising scheme suffers from the “rare-patch” effect. It can lead to some morphological dissimilarity between the original and the NLM denoised signals.

In contrast to approaches designed in the time domain, the transform domain methods are based on the property of sparsity.

Sparse property means that the signal can be represented by a fewer number of the non-zero coefficient. One of the examples of approach is the denoising algorithms based on the wavelet transform.

The denoising methods based on the wavelet [11, 12] usually transform the signal into multiple sub-bands at different resolutions and scales. Low-frequency noise exists in larger frequency coefficients and high-frequency noise dominates a lower frequency coefficient. Unfortunately, the wavelet-based methods are sensitive to periodic noise and it is difficult to select the exact value of threshold [13].

Sparse representation of the signal has been widely researched in recent years. The algorithm based on sparse representation tends to offer denoising effects, therefore it is a powerful tool. The basic assumption behind sparsity-based denoising is that the signal can be sparsely represented in a transform domain but the noise cannot. If this assumption holds it is possible to find a threshold in the transform domain that suppresses the dense but small noise and keeps the sparse high-valued signal components.

The other approach is the Empirical Mode Decomposition (EMD) [14]. Unlike the wavelet-based approach, EMD has the advantage of not relying on the predefined basis functions. In the EMD method analyzed signal is decomposed into the set of oscillatory components (Intrinsic Mode Functions, IMF). Empirical Mode Decomposition method is widely used for analyzing non-stationary signals. However, there are many problems to use EMD. One of them is the problem of mode mixing, i.e. the existence of different time scales in a single IMF. Another is that Empirical Mode Decomposition still has not to sound mathematical theory.

Variational Mode Decomposition (VMD) is an adaptive and non-recursive signal analysis method [15]. Compared to EMD, the VMD approach has a strong mathematical background. This helps to improve the accuracy of signal separation and noise robustness [16]. Moreover, Variational Mode Decomposition effectively eliminates the effect of mode mixing during the decomposition process [17]. The set of determined, based on the VMD algorithm, components are also named IMF.

VMD algorithm has been used in many fields. For example, in [18] the authors reported the application of Variational Mode Decomposition in forecasting day-ahead energy prices. While the authors of [19] proposed VMD in the analysis of seismic signals. The discussed method is also used for the enhancement of gyroscopic signals [20].

More recently, in [21] the authors combine the Variational Mode Decomposition method and the LSTM (long short-term memory) network to solve the problem of forecasting the non-ferrous metal price.

In the paper, a sparse signal processing is applied to design a new denoising algorithm based on Variational Mode Decomposition. The general idea of the proposed approach is to apply the Split Augmented Lagrangian Shrinkage Algorithm algorithm to each mode extracted from the signal with the use of the VMD algorithm.

1.1 Contribution

The main contribution of this paper is a new algorithm for signal denoising based on Variational Mode Decomposition and the Split Augmented Lagrangian Shrinkage Algorithm algorithms.

We also compared our proposal with, the Non-local means algorithm, Median filter, Savitzky-Golay filter, and two wavelet-based denoising methods. Verification of the proposed methods was performed based on synthetic signals with the different levels of degradations as well as on the real-world signals acquired from the three-axis gyroscope.

2 Basic principles

2.1 Variational Mode Decomposition

VMD is a multi-resolution analysis method used to extract modes (sub-signals) and their central frequencies from a non-stationary signal.

Each sub-signals extracted from original signal y(t) by applying VMD is given by

$$ u_{k}(t) = a_{k}\cos(\phi_{k}(t)) $$
(1)

where k = 1,2,…,K, K represents the predefined number of the modes, the phase ϕk(t) is a non-decreasing function (\(\phi _{k}^{\prime }(t) \geq 0\)), the envelope is non-negative (\(a_{k}^{\prime }(t) \geq 0\)), and both the envelope ak(t) and the instantaneous frequency \(\omega _{k}(t) = \phi _{k}^{\prime }(t)\) vary much slower than the phase ϕk(t) [15].

When the decomposition is completed, the original signal y(n) is represented by

$$ y(t) = \sum\limits_{k=1}^{K} u_{k}(t). $$
(2)

In the VMD method, the modes are extracted iteratively based on a variational model that aims to minimize the sum of bandwidths of all modes \(\{u_{k}(t)\}_{k =1}^{K}\). The extracted modes have specific sparsity properties while reproducing the input signal. It is assumed that each sub-signal to be mostly compact around a centre frequency [15].

The associated constrained variational optimization problem and the corresponding objective function to calculate modes and their central frequencies is as follows

$$ \begin{aligned} {\{u^{\ast}_{k},\omega^{\ast}_{k}\}} = &\min_{\{u_{k}\},\{\omega_{k}\}}\left\{{\sum\limits_{k=1}^{K}\left\|{\frac{\partial }{\partial t}\left[{\left( {\delta(t)+\frac{j}{\pi t}}\right)\ast u_{k}(t)}\right]\exp^{-j\omega_{k}t}}\right\|_{2}^{2}}\right\}\\ & \quad\quad \text{s.t.} \quad \sum\limits_{k=1}^{K}u_{k}(t)=y(t) \end{aligned} $$
(3)

where t is the time index, δ(t) is the Dirac distribution, uk is shorthand notations for k-th mode from the set of all sub-signals, i.e. uK = {u1, u2,…,uK} and ωk stands for k-th centre frequencies of k-th mode, i.e. ωK = {ω1, ω2,…,ωK}. The symbol ∗ denotes convolution. y(t) represents the signal to be decomposed.

By introducing the Lagrangian multiplier λ and the second penalty factor α (data-fidelity constraint parameter), the constrained variational problem (3) can be transformed into its unconstrained variational equivalent

$$ \begin{array}{ll} L(u_{k},\omega_{k},\lambda)= &\alpha\sum\limits_{k=1}^{K}{\frac{\partial}{\partial t}\left[{\left( {\delta (t)+\frac {j}{\pi t}}\right)\ast u_{k}(t)}\right] \exp^{-j\omega_{k}t}}_{2}^{2}\\&+\underbrace{\left\|{y(t)-\sum\limits_{k=1}^{K} u_{k}(t)}\right\|_{2}^{2}}_{(a)} + \underbrace{\left\langle \lambda (t),y(t)-\sum\limits_{k=1}^{K} u_{k}(t)\right\rangle}_{(b)} \end{array} $$
(4)

The function L() is augmented Lagrange function where (a) and (b) enforce the constraint by imposing a quadratic penalty and incorporating a Lagrange multiplier respectively.

The quadratic penalty provides reconstruction fidelity, typically in the presence of additive i.i.d. Gaussian noise. In turn, the Lagrangian multiplier is used to enforce constraints strictly [15].

In order to solve the unconstrained optimization problem (4) it can be split into two different sub-problems, i.e, minimization of uk(t) and ωk(t). Such separation of the formulated problem (4) is represented as

$$ \begin{array}{@{}rcl@{}} u_{k}^{(l+1)} & \leftarrow \mathop{\arg \min_{u_{k}}} L\left( u_{i<k}^{(l+1)},{u_{i\geqslant k}^{(l)}},{\omega_{i}^{(l)}},\lambda^{(l)}\right) \end{array} $$
(5a)
$$ \begin{array}{@{}rcl@{}} \omega_{k}^{(l+1)} & \leftarrow \mathop{\arg \min_{\omega_{k}}} L\left( {u_{i}^{(l+1)}},\omega_{i<k}^{(l+1)},{\omega_{i\geqslant k}^{(l)}},\lambda^{(l)}\right) \end{array} $$
(5b)

where k = 1,2,…,K, l means the l-th iteration. To solve the optimization problems (5a) and (5b) ADMM (Alternating Direction Method of Multipliers) approach is applied. To this end, the complete optimization problem (4) is splitted into sub-problems (5a and 5b). It is convenient to handle since it allows to minimize cost function iteratively for a single parameter of interest rather than optimizing cost function for all optimization variables simultaneously.

Based on continuous iteration updated, all modes and their corresponded centre frequencies are determined from [15]

$$ \begin{array}{@{}rcl@{}} \displaystyle \hat{u}_{k}^{(l+1)}(\omega)&=&\frac{\hat{y}(\omega)-{\sum}_{i\neq k} {\hat{u}}_{i}^{(l+1)}(\omega)+\frac{{\hat{\lambda}}(\omega)}{2}}{1+2\alpha\left( \omega -\omega_{k}\right)^{2}} \end{array} $$
(6a)
$$ \begin{array}{@{}rcl@{}} \displaystyle\omega_{k}^{(l+1)} &=& \frac{{\int}_{0}^{\infty}\omega\left\vert\hat{u}_{k}(\omega)\right\vert^{2}d\omega}{{\int}_{0}^{\infty}\left\vert\hat{u}_{k}(\omega)\right\vert^{2}d\omega}. \end{array} $$
(6b)

The formula (6a) resulting in the update of k-th mode in the frequency domain. While (6b) is used to get an update of k-th mode’s frequency as the estimation of their center of gravity.

The modes and their corresponding frequencies in the spectral domain are the fundamental components of the VMD algorithm.

The extracted modes obtained through the VMD algorithm have largely disjoint spectral support and are therefore automatically quasi-orthogonal. For more details on the Variational Mode Decomposition algorithm, we refer readers to [15].

2.2 Dictionary adaptation

Dictionary adaptation is a new approach to the enhancement of signal decomposition. The introduced method is closely related to dictionary learning [22], [23] however there are some important differences between them. In a dictionary learning, a typical set-up starts with a training set, which is a collection of training data. Afterward, the training data is used to learn the dictionary begins with the initial set of its elements. The elements of the dictionary are named atoms and they can be any waveform. The learning process should determine the dictionary of atoms that can synthesize the analyzed signal.

In the dictionary adaptation [24] we only have one signal, not a training set. In our case, the atoms of the dictionary are given in the form of \(a\cos \limits (\phi (t))\) (1). As we will demonstrate later, it is possible to get a reasonable adaptation of the dictionary based on the initial set of atoms through optimization algorithms and with only one measurement of the signal.

A dictionary is a collection of parametrized waveforms. In the paper the dictionary is determined based on VMD algorithm:

$$ \mathcal{D} = \{u_{1}(t), u_{2}(t), \ldots, u_{K}(t)\} $$
(7)

where uk(t) are defined such that they are amplitude- and frequency-modulated signals (see eq. 1).

Instead of a VMD-based dictionary, it is possible to consider other dictionaries. One of the widely used is the Fourier-based dictionary which is a collection of sinusoidal waveforms. Another example, is the wavelet dictionary composed of translations and dilations of the basic mother wavelet.

Generally, in the dictionary adaptation approach, we can use any decomposition method that gives a sparse decomposition of signal [25]. Therefore, the signal can be decomposed into a set of sparse sub-signals depending on the choice of the dictionary that was used in the extraction procedure. Essentially, a more redundant dictionary tends to give better adaptivity to the signal, which implies better sparsity of the decomposition. However, if the dictionary is not a basis, the decomposition is not unique.

When the dictionary is orthogonal, the method works perfectly. If the dictionary is not orthogonal, the situation is less clear. To improve the performance of dictionary adaptation methods, many other algorithms can be used. In the paper, we propose the Split Augmented Lagrangian Shrinkage Algorithm-based approach to improving the dictionary determined with the use of Variational Mode Decomposition.

2.3 Basis Pursuit Denoising

Basis pursuit (BP) finds signal representation in an overcomplete dictionary applying optimization methods. Overcomplete dictionary means that that there are in general many representations of the signal. Optimization methods in basis pursuit formulation are used to minimize the 1 norm of coefficients occurring in the current representation. However, because of the non-differentiability of the 1 norm, this optimization principle leads to sparse decompositions.

BP problem can be adapted to the case of the noisy (measured) signal. In this case we assume the following discrete-time model of the signal

$$ y(n) = s(n) + \xi(n), \quad n = 1,2,\ldots,N $$
(8)

where s(n) is the noise-free (clean) signal and ξ(n) denotes the additive white Gaussian noise. In this setting, s(n) is unknown, while y(n) is known.

Without loss of generality we can assume that the components of (8) are represented as vectors in \(\mathbb {R}^{N}\), i.e. \(\mathbf {y} \in \mathbb {R}^{N}\), \(\mathbf {s} \in \mathbb {R}^{N}\), and \(\boldsymbol {\xi } \in \mathbb {R}^{N}\).

Let us assume that the unknown vector s is represented as

$$ \mathbf{s} = \mathbf{B}\mathbf{x} $$
(9)

where B is the matrix representation of a convolution operator, while x represents the vector of unknown coefficients [26].

In the case of noisy signal, BP problem can be reformulated to Basis Pursuit Denoising (BPD). The classic formulation of BPD to determine unknown x has the form

$$ \mathbf{x}^{\ast} = \arg \max_{\mathbf{s}}\left\|{\mathbf{B}\mathbf{x} - \mathbf{y}}\right\|_{2} + \lambda_{BPD}\left\|{\mathbf{x}}\right\|_{1} $$
(10)

where y is the measured signal and λBPD stands for the regularization parameter. The parameter λBPD allows to get either a very sparse (high-value) or a dense (low-value) solution.

The classic solution of the problem (10) has the form [27]

$$ {\textbf{x}}^{\ast} = \text{soft}(\textbf{y},\lambda_{BPD}) = \left\{ \begin{array}{ll} \text{sgn}\left( \textbf{y}\right)\left( \left|\textbf{y}\right| - \lambda_{BPD}\right) & \quad \left|\textbf{y}\right| \!>\! \lambda_{BPD}\\ 0 & \quad \left|\textbf{y}\right| \!\leq\! \lambda_{BPD}. \end{array} \right. $$
(11)

The function soft(y,λBPD) is the proximal operator called the soft thresholding operator.

An alternative solution of the problem (10) is an iterative optimization algorithm called SALSA (Split Augmented Lagrangian Shrinkage Algorithm). SALSA is a variable splitting method that transforms the unconstrained problem into a constrained variable problem. Then, the new problem is solved with the use of augmented Lagrangian method [28]. Variable splitting is a procedure that introduces a new variable, e.g. r. Based on the auxiliary variable r the unconstrained optimization problem (10) can be converted into constrained problem

$$ \begin{aligned} \{\mathbf{x}^{\ast}, \mathbf{r}^{\ast}\} = \arg&\max_{\mathbf{x}, \mathbf{r}} \left\|{\mathbf{B}\mathbf{x} - \mathbf{y}}\right\|_{2} + \lambda_{BPD}\left\|{\mathbf{r}}\right\|_{1}\\ & \text{s.t.} \quad \mathbf{r} - \mathbf{x} = \mathbf{0}. \end{aligned} $$
(12)

The rationale behind applying variable splitting is that it may be easier to solve the constrained problem (12) than it is to solve its unconstrained equivalent (12).

To solve BPD problem (10), which is transformed into (12), SALSA algorithm can be applied. In the case of overcomplete Parseval transform the algorithm has the following form [26], [29]

$$ \begin{array}{@{}rcl@{}} &&\mathbf{v} \leftarrow \text{soft}(\mathbf{x} + \mathbf{d}, \boldsymbol{\lambda}_{BPD}/\mu) - \mathbf{d} \end{array} $$
(13a)
$$ \begin{array}{@{}rcl@{}} &&\mathbf{d} \leftarrow \frac{1}{\mu} \mathbf{B}^{H}\left( \mathbf{y} - \mathbf{B}\mathbf{v}\right) \end{array} $$
(13b)
$$ \begin{array}{@{}rcl@{}} &&\mathbf{x} \leftarrow \mathbf{d} + \mathbf{v} \end{array} $$
(13c)

where BH denotes the complex conjugate (Hermitian) transpose of B, μ is a positive parameter that balances the penalization.

It is worth emphasizing that SALSA is a universal approach, as it allows a general regularizer, not just the 1 norm. Moreover, the SALSA algorithm is fast however the algorithm is fast as a whole if a fast implementation for B and BH is available [29].

3 VMD-based signal denoising with thresholding operator

Recently, in the paper [16] the authors proposed denoising algorithm combining Variational Mode Decomposition with hard and soft thresholding operators.

The hard thresholding operator for VMD algorithm (originally proposed by Donoho [30]) has the form

$$ \mathbf{u}_{k}^{\ast}=\text{hard}(\hat{\mathbf{u}}_{k},\lambda_{hard,k})= \left\{ \begin{array}{ll} \hat{\mathbf{u}}_{k} & \left|\hat{\mathbf{u}}_{k}\right| > \lambda_{hard,k}\\ 0 & \left|\hat{\mathbf{u}}_{k}\right| \leq \lambda_{hard,k} \end{array} \right. $$
(14)

while soft threshold is defined as

$$ \mathbf{u}_{k}^{\ast}=\text{soft}(\hat{\mathbf{u}}_{k},\lambda_{soft,k})= \left\{ \begin{array}{ll} \text{sgn}\left( \hat{\mathbf{u}}_{k}\right)\left( \left|\hat{\mathbf{u}}_{k}\right| - \lambda_{soft,k}\right) & \left|\hat{\mathbf{u}}_{k}\right| > \lambda_{soft,k}\\ 0 & \left|\hat{\mathbf{u}}_{k}\right| \leq \lambda_{soft,k} \end{array} \right. $$
(15)

where k = 1,2,…,K is the k-th mode, λhard,k and λsoft,k denote the hard and soft threshold for the k-th sub-signal respectively. Modes \(\hat {{u}}_{k}\) and their modifications \({u}_{k}^{\ast }\) are represented as vectors in \(\mathbb {R}^{N}\), i.e. \(\hat {{u}}_{k} \in \mathbb {R}^{N}\), and \({u}_{k}^{\ast } \in \mathbb {R}^{N}\).

In hard thresholding, each value of k-th IMF is compared against the threshold value and lower value is replaced by zero. However, the hard threshold leads to abrupt changes in processed signals and it generates artifacts in the estimated signal.

On the other hand, in the soft thresholding approach each value of the IMFs, which are larger than the threshold value, are modified by subtraction with the threshold value.

The soft threshold operator-based signal denoising tends to smooth the signal.

4 The proposed approach

The signal denoising is one of the fundamental and classical tasks in signal processing. Due to the imperfections present in the measurements, the obtained signals are deteriorated by unwanted components. Thus, the main challenge of signal denoising is to preserve and enhance the desirable features of the collected signals.

In the proposed method, we assume the observation discrete-time model in the form of (8).

4.1 Problem formulation

The problem considered in the paper is to design an algorithm that attenuates the noise ξ in given measurements y and keeps the signal s as intact as possible, i.e., minimize difference \(\boldsymbol {\varepsilon } = \hat {\textbf {s}} - \textbf {s}\) (\(\hat {\textbf {s}}\) stands for estimate of s).

4.2 Sparse optimization-based signal denoising

We now apply the approaches described in the previous section (2) to the formulated problem of signal denoising (4.1).

As we mentioned in Section 3, both hard and soft thresholding operators introduce to the estimated signal some artifacts that deteriorate estimation outcomes there is still a need to improve the performance of VMD-based denoising algorithms. For example, in our previous work, we proposed the method to attenuate the noise which combines VMD with Total Variation Denoising [31].

Inspired by the BPD formulation we propose a SALSA-based algorithm to modify modes extracted from measurements y(n) with the use of Variational Mode Decomposition. The general idea is as following: modes extracted initially by applying the VMD method are iteratively improved by the SALSA algorithm.

When the fixed number of iterations is reached, the denoised signal is estimated based on the improved modes applying (2).

To improve the initial estimation of modes extracted by applying VMD we propose the reformulation of BPD problem in the form

$$ \begin{aligned} \mathbf{u}_{k}^{\ast} = \arg \max_{\mathbf{u}_{k}}\left\|{\mathcal{T}^{-1}(\mathbf{u}_{k}) - \mathbf{y}}\right\|_{2} + \lambda_{BPD}\left\|{\mathbf{u}_{k}}\right\|_{1} \end{aligned} $$
(16)

where k = 1,2,…,K, \(\mathcal {T}^{-1}\) represents recomposition of the signal, i.e., \(\mathbf {y} = \mathcal {T}^{-1}(\mathbf {u}_{k})\) (see eq. 2). We named \(\mathcal {T}^{-1}\) as inverse operator. In turn, variable uk represents the k-th mode acquired from VMD, i.e., \(\mathbf {u}_{k} = \mathcal {T}(\mathbf {y})\). It means that \(\mathcal {T}\) denotes VMD-based signal decomposition and we named it forward operator.

By applying variable splitting we define the constrained optimization problem based on (12)

$$ \begin{aligned} \{\mathbf{u}_{k}^{\ast}, \mathbf{r}_{k}^{\ast}\} = \arg&\max_{\mathbf{u}_{k}, \mathbf{r}_{k}}\left\|{\mathcal{T}^{-1}(\mathbf{u}_{k}) - \mathbf{y}}\right\|_{2} + \lambda_{BPD}\left\|{\mathbf{u}_{k}}\right\|_{1}\\ & \text{s.t.} \quad \mathbf{r}_{k} - \mathbf{u}_{k} = \mathbf{0}. \end{aligned} $$
(17)

In order to solve (17) we propose SALSA based algorithm presented in the form of pseudo code as Algorithm (1).

figure a

Finally, estimated modes \(\mathbf {u}_{k}^{\ast }\) are used to estimate signal from the measurements by applying

$$ \begin{aligned} \hat{\mathbf{y}} = \sum\limits_{k = 1}^{K} \mathbf{u}_{k}^{\ast}. \end{aligned} $$
(18)

The formula (18) is used to recompose original signal.

The flow chart of the proposed methods is presented in Fig. 1. In the first step the processed signal y(t) is decomposed into the set of K modes (denoted as u1, u2,…,uK) with the use of VMD-based algorithm described in Section 2.1. Subsequently, each sub-signals is modified by SALSA-based method (see Algorithm 1). The last stage is reconstruction phase, i.e., modified sub-signals (\(\mathbf {u}_{1}^{\ast }, \mathbf {u}_{2}^{\ast }, \ldots , \mathbf {u}_{K}^{\ast }\)) applied to estimate denoised signal with the use of formula (18).

Fig. 1
figure 1

The flow chart of the SALSA-based algorithm to improve VMD-based sub-signals

4.3 Parameters settings

The value of parameter λBPD,k for each mode can be determined with the use of universal formula [32]:

$$ \lambda_{BPD,k} = \mathbf{E}_{k}\sqrt{2\ln N} $$
(19)

where N is the number of data points, k = 1,2,…,K, while Ek is determined with

$$ \begin{array}{@{}rcl@{}} \mathbf{E}_{1} &=&\frac{\text{median}(|\hat{\mathbf{u}}_{1}|)}{0.6745} \end{array} $$
(20a)
$$ \begin{array}{@{}rcl@{}} \mathbf{E}_{k} &=& \frac{\mathbf{E}_{1}^{2}}{\gamma}\rho^{-k}, \quad k = 2,3,\ldots,K \end{array} $$
(20b)

where values of ρ and γ have to be estimated. In [33] the authors proposed γ = 0.719 and ρ = 2.01.

The second parameter (μ) is positive. Typically, its value is initially low, but as the solution converges, its value is progressively increased [34].

The performance of the proposed algorithm also depends on the predetermined parameters of the VMD method. One of the most important parameters to be fixed is the number of modes K. It is important since an inaccurate number of modes will affect the efficiency of denoising [35]. Predetermining the number of modes is still considered as an open problem for the VMD method [15]. The number of modes in most cases is selected based on experience [15]. However, in [35] the authors present preliminary studies in which selection of mode number is based on Detrended Fluctuation Analysis. In our work parameter, K was determined by trial and error method.

5 Experiments

In order to quantitatively evaluate the denoising performance of the proposed method we performed experiments based on both synthetic and real-world signals.

5.1 Results on synthetic signals

In this study we test the performance of the proposed method based on simulated signals (i.e. noise-corrupted synthetic signals). We consider Doppler (see Fig. 2), bumps (Fig. 3), blocks (Fig. 4), and heavisine (Fig. 5) signals of length 4096 samples. In [30] these functions have been chosen to represent a large variety of signals. In our investigations they are generated by the Wavelab function MakeSignal [36].

Fig. 2
figure 2

The clear Doppler signal

Fig. 3
figure 3

The clear bumps signal

Fig. 4
figure 4

The clear blocks signal

Fig. 5
figure 5

The clear heavisine signal

All generated signals, used in our studies, are corrupted by noise. The signals have different SNR (0, 5, 10, and 15 [dB]). Whereas the noise has various short and long dependencies. To simulate different short and long dependencies, we added to the synthetic signals the fractional Gaussian noise (fGn) with Hurst exponent equals to 0.2, 0.5, and 0.8. For illustration purposes, in Fig. 6, we display the Doppler signal with SNR = 10 [dB]. In this case, the fractional Gaussian noise has the value of Hurst exponent equals 0.8. In turn, Fig. 7 we present the bumps signal with SNR = 10 [dB]. The value of Hurst exponent is also equaled 0.8. The last synthetic signals, used in presented studies, are blocks and heavisine signals (Fig. 8 and 9). The example blocks and heavisine signals also have SNR = 10 [dB] with Hurst exponent equals 0.8.

Fig. 6
figure 6

The noised Doppler signal (SNR = 10 [dB], H = 0.8)

Fig. 7
figure 7

The noised bumps signal (SNR = 10 [dB], H = 0.8)

Fig. 8
figure 8

The noised blocks signal (SNR = 10 [dB], H = 0.8)

Fig. 9
figure 9

The noised heavisine signal (SNR = 10 [dB], H = 0.8)

In this part of our investigations to measure the performance of the presented algorithm we applied the signal-to-noise-ratio

$$ \begin{aligned} SNR = 10 \log_{10}\left( \frac{{\sum}_{n=1}^{N}s(n)}{{\sum}_{n=1}^{N}\left( s(n) - \hat{y}(n)\right)^{2}}\right) \end{aligned} $$
(22)

where N is the number of data points, s(n) is the noise-free (original) signal and \(\hat {y}(n)\) stands for denoised signal.

SNR is a typical measure of the quantitative performance of denoising algorithms. The SNR can be used to measure the quality of the original signal and the signal after noise reduction. The greater SNR is, the better the denoising performance of the method is achieved.

The proposed approach is compared with, among others, two types of VMD-based denoising methods, i.e. hard and soft thresholding. Variational Mode Decomposition-based signal denoising with the use of hard thresholding we named VMD-HT, while the algorithm based on soft thresholding is denoted VMD-ST. The algorithms were described in detail, e.g. in [16].

To perform in-depth analysis, additional reference methods were used. In the presented studies we applied the Non-local means algorithm, Median filter, Savitzky-Golay filter, and two wavelet-based denoising methods. In the further part of the works Non-local means algorithm, we denoted as NLM. In our studies, we used its implementation taken from [37]. NLM algorithm’s parameters, i.e., the bandwidth, the patch half-width, and the neighbourhood half-width are determined empirically [10].

The implementation of the Savitzky-Golay filter (SG filter for short), is taken from Matlab’s function sgolayfilt. While Median filter is Matlab’s function medfilt1.

In turn, wavelet-based approaches based on hard and soft thresholding. An approach based on hard thresholding is denoted as DWT-HT, while an algorithm based on soft thresholding is represented by DWT-ST. In our studies, wavelet-based approaches (DWT-HT and DWT-ST) utilized Matlab’s function wden.

To verify the effectiveness of the proposed method, we carried out 30 trials for noisy synthetic signals (i.e., Doppler, bumps, blocks, and heavisine).

5.1.1 Experimental results

The results of conducted experiments on synthetic data are presented in Tables 1234567891011, and 12. We tested Doppler, bumps, blocks, and heavisine signals with SNR varying from 0 to 15 [dB]. The noise was additive and was generated with the Hurst exponent equals 0.2, 0.5, and 0.8.

As it can be seen, the proposed method is always better than other methods, including VMD-related methods (VMD-HT and VMD-ST), only for bumps signal. The best results are obtained for all pre-defined SNR levels as well as for varying values of Hurst exponents. In this case, the proposed method predominates over other methods.

Table 1 The denoising results for the Doppler signal, fractional Gaussian noise H = 0.2
Table 2 The denoising results for the Doppler signal, fractional Gaussian noise H = 0.5
Table 3 The denoising results for the Doppler signal, fractional Gaussian noise H = 0.8
Table 4 The denoising results for the bumps signal, fractional Gaussian noise H = 0.2
Table 5 The denoising results for the bumps signal, fractional Gaussian noise H = 0.5
Table 6 The denoising results for the bumps signal, fractional Gaussian noise H = 0.8
Table 7 The denoising results for the blocks signal, fractional Gaussian noise H = 0.2
Table 8 The denoising results for the blocks signal, fractional Gaussian noise H = 0.5
Table 9 The denoising results for the blocks signal, fractional Gaussian noise H = 0.8
Table 10 The denoising results for the heavisine signal, fractional Gaussian noise H = 0.2
Table 11 The denoising results for the heavisine signal, fractional Gaussian noise H = 0.5
Table 12 The denoising results for the heavisine signal, fractional Gaussian noise H = 0.8

For Doppler and blocks signals the results are not straight forward. For example, analyzing the results of denoising Doppler signal with Hurst exponent equals 0.2 we can see that VMD-related methods give the best results. However, for the proposed method we obtain the best result only for SNR = 0 [dB]. For SNR equals 5, 10 15 [dB] the best outcomes are obtained from VMD-ST and VMD-HT. In the case of Hurst exponent equals 0.5 the best results are acquired from SG Filter (for SNR equals 0 and 5 [dB]). The other successful algorithm for this case are VMD-HT (for SNR = 10 [dB]) and DWT-HT when SNR = 15 [dB]. When Hurst exponent is equal 0.8 the proposed algorithm gives the best result only for SNR = 0 [dB]. The other methods performing better than others are VMD-HT (for SNR equals 5 [dB]) and NLM for SNR equals 10 and 15 [dB].

The results obtained for heavisine are better than for Doppler and blocks, and slightly worse than for bumps signal. In this case, for H = 0.5 and different levels of signal degradation the proposed algorithm is always better than other methods. In turn, when H = 0.2 and SNR is equal to 15 [dB] the best method is Non-local mean. In all other cases, the proposed approach has an advantage over reference methods. In the case of H = 0.8 and SNR equals 10 and 15 [dB] the best results we obtained for VMD-ST and Median filter respectively. For the low-quality heavisine synthetic signals (with SNR equals 0 and 5 [dB]) the proposed algorithm once more gives the best outcomes.

In the case of blocks signal, we also observe that for different values of Hurst exponent and SNR various methods give the best results. VMD related algorithms (VMD-HT and VMD-ST) are successful when Hurst exponent is equal 0.2, 0.5 and 0.8 when SNR is lower (i.e., 0 or 5 [dB]). For higher values of SNR better results are acquired from NLM, Median filter, and SG filter (only when Hurst exponent H = 0.2 and SNR = 0 [dB]). The proposed method for blocks signal in all cases gives weaker results than other methods used in our studies.

For validation of the proposed algorithm to signal denoising, we used four test signals which have the following properties. Doppler signal serves as an example of a high-frequency signal. In turn, bumps express signals with local high variation. Blocks signal represents piece-wise signals. Finally, heavisine imitates a combination of sinusoidal functions.

Based on our investigation tabularized in Tables 112 we can see that the proposed algorithm is suited for signals having properties of bumps signals. It must be emphasized that the proposed algorithm has the best performance for different levels of signal degradation.

Our proposal can be also used with the signals having attributes similar to the heavisine signal. However, the proposed algorithm is recommended only for low-quality signals (with low values of SNR).

5.2 Results on real-world signals

5.2.1 Signals details

The gyroscope is a popular device uses to measure the angular velocity of a moving object. It is a kind of sensor widely used in, e.g., human motion tracking and detection due to its small size, low cost, long lifespan, and no moving parts. Gyroscope sensor has some disadvantages related to its vulnerability to inferences such as temperature, vibration, and pressure [32]. These phenomenons result in different noise effects that degrading the accuracy of the data and limiting its applications.

In general, the gyroscopic signal includes quantization error (QE), angle random walk (ARW), bias instability (BI), rate random walk (RRW), rate ramp (RR) [4]. Quantization error is related to the digital nature of today’s measurements. Since analog signals of the real world have to be discretized during the measurement process. It leads to the deviation of the quantized output signal from the analog input signal. In turn, angle random walk, in gyroscopic measurements, is connected with angle or attitude measurements. The origins of angle random walk noise can be traced back to the spontaneous emission of photons [4]. The source of bias instability is the electronics that are susceptible to random flickering. It is low-frequency noise referred to as bias fluctuations in the measurements. Rate random walk is a random process of uncertain origin. One of the explanations is a limiting case of an exponentially correlated noise with a very long correlation time [38]. While rate ramp is a very low-frequency drift in the sensor’s measurements. It should be treated as deterministic error for long, but finite, time intervals, much like the way constant bias is treated as a deterministic error [4].

The problem of denoising the gyroscopic signal has been considered in many papers. For example, in work [39] the wavelet transform was applied to suppress noises in gyroscopic data. In turn, Neural Networks and Kalman filter were used, for example, in [40] and [41] respectively. While, in [42], [32], [43] Empirical Mode Decomposition was used to remove unwanted components of the gyroscopic measurements.

To verify the feasibility and effectiveness of the proposed signal denoising algorithms, practical experimental data were collected from the three-axis gyroscope. The experiments were performed in static conditions. It means that during the tests the gyroscope is kept stationary. The measurements were collected by the MPU-6050 [44] unit under the gradually change temperature from 3 to 41 C with the sample frequency equal to 40 [Hz] (Fig. 10). The angular rate was recorded for x-, y-, and z-axis. The dataset was taken from [45].

Fig. 10
figure 10

The original signals from gyroscope

The results of the gyroscopic signal denoising are analyzed quantitatively with the use of the Allan Variance (AV). The Allan Variance is a method of representing the root means square (RMS) random-drift error as a function of correlation time. It is a time-domain-analysis technique originally developed to study the frequency stability of precision oscillators. Because of the close analogies to inertial sensors, this method has been adapted to the random-drift characterization of a variety of devices [46].

AV characterizes the noise in sensor signals (e.g. from gyroscope) by quantifying the variance observed in measurements across various correlation times. Hence it helps to analyze the various noise types based on their typical correlation times. For example, the quantization error is located in high-frequency bandwidth and thus has the shortest correlation time among all noise types previously mentioned. In turn, the rate ramp is low-frequency noise and thus has a large correlation time.

The Allan Variance can be described by a log-log curve called the Allan Variance curve (AVC). The properties of various noises such as QE, ARW, BI, RRW, RR is reflected in the curve.

In our study, we utilize the described above property of the Allan Variance to analyze the effectiveness of denoising algorithms. To this end we determine noise parameters (for QE, ARW, BI, RRW, RR) and, based on these parameters, the Allan Variance curve [4] for original and enhancement measurements. When the value of one of the calculated noise parameters decreased, after applying the denoising algorithm, it means that the respective noise was reduced [47], [48]. A similar analysis can be conducted for AVC, however, in this case, we have to analyze the respective part of the curve. When AVC declines to some degree for the denoised signal it means that the respective noise decrease [49], [38].

5.2.2 Experimental results

A quantitative comparison of the results obtained by the Allan Variance is pictured in Figs. 1112 and 13 and tabulated in Tables 1314 and 15. From the Allan Variance plots of the gyroscopic signal, we can observe three different scales − 1/2, 0, and + 1/2. It indicates that the measurements contain the angle random walk, the bias instability, the rate random walk, as well as, quantization and rate ramp noise [38]. In our analysis, we take into account all of these noise terms and their influence upon the precision of gyroscope. All components of the gyroscopic signals are determined based on the code accompanying the paper [4].

Fig. 11
figure 11

The Allan Variance plot of original and denoised gyroscopic signal for x-axis

Fig. 12
figure 12

The Allan Variance plot of original and denoised gyroscopic signal for y-axis

Fig. 13
figure 13

The Allan Variance plot of original and denoised gyroscopic signal for z-axis

Table 13 The Allan Variance results of x-axis gyroscope denoising specified by five noise terms
Table 14 The Allan Variance results of y-axis gyroscope denoising specified by five noise terms
Table 15 The Allan Variance results of z-axis gyroscope denoising specified by five noise terms

The Allan Variance plots show that all of applied methods are able to remove the inertial sensor related noise. The results of comprehensive experiments are presented in Figs. 1113 (AV plots) and tabulated in Tables 1315 (numerical value of QE, ARW, BI, RRW, RR).

Based on the presented results we can see that the proposed method in all cases (for x-, y-, z-axis of a gyroscope) gives better results. Comparing obtained outcomes with other methods we can see that our method outperforms other algorithms an order of magnitude.

5.3 Time complexity

The performance of the denoising algorithm is one of its aspects. The other is the time complexity which is important in practical application.

In this section, we analyzed the estimated time complexity of the proposed algorithm. The obtained results are compared with the results of a similar analysis performed for the reference methods.

The most demanding part of the proposed method is the process of signal decomposition. Analyzing the proposed algorithm (1), we can expect higher time complexity. It is related to the nature of the SALSA method. SALSA algorithm is fast when we have the fast implementation of decomposition and recomposition algorithm. In our case, it is the VMD method. Recomposition is fast however decomposition algorithm is complex and may affect the final complexity of the SALSA based algorithm.

We have performed a series of experiments to verify the time complexity of the specified signal denoising methods. The test signal was the bumps signal with the length ranging from 256 to 4096, where H = 0.5 and SNR = 10 [dB].

The experiments were performed with the following computer: Intel®; CoreTM i7-7700 HQ @2.80 [GHz] and 32.00 GB RAM memory running Windows 10. The execution times is shown in Table 16.

Table 16 Relationship between the sample number and the excution time (The bumps signal, fractional Gaussian Noise H = 0.5, SNR = 10 [dB])

The lowest time complexity (i.e. from 0,0006 ± 0,0014 seconds for the signal of length 256 to 0,0008 ± 0,0016 seconds for the signal of length 4096) we obtain form Median Filter. However, as we compare the results for the synthetic signals (see Tables 1 – 12) the performance of the algorithm is average. In turn, when we compare the performance of the Median Filter on denoising the raw gyroscopic data we can see that the results are weak.

The low time complexity we can also observe for SG Filter, DWT-HT, and DWT-ST. Moreover, the time of computation increases slowly when the number of signal samples increases. NLM has always low time complexity, however, it is higher than the previously analyzed method. Moreover, the time of computation increases importantly faster comparing to the Median Filter, SG Filter, DWT-HT, and DWT-ST.

VMD-related methods demand more computations than other methods investigated in our research. On the one hand, the time of computation is higher in comparison to other reference methods. Furthermore, the times increase faster when the number of samples increases.

In this list, the proposed method has a higher complexity time in comparison to other methods, including VMD-HT and VMD-ST. It is not surprising. The VMD decomposition algorithm has high time complexity. Moreover, the SALSA algorithm is an iterative method. In each iteration, we have to decompose the current signal. Thus the final time complexity depends on VMD time complexity and the number of iteration in the SALSA method.

The results clearly show that it is possible to design a very efficient denoising algorithm by combining the VMD algorithm with the SALSA approach, however at the cost of time complexity.

The obtained outcomes also indicate that one of the future directions of our works should be focused on the reduction of the time complexity of the proposed algorithm. It could be accomplished by applying faster implementation of the Variational Mode Decomposition algorithm.

6 Conclusions

In the paper, we proposed the new algorithm applying Split Augmented Lagrangian Shrinkage Algorithm to Variational Mode Decomposition-based signal denoising. The method is an alternative to existing approaches combining the VMD algorithm with hard and soft thresholding. Our investigations show that the results for the raw gyroscopic signal are one order of magnitude better than the results obtained from methods based on VMD combined with hard and soft thresholding. The proposed algorithm predominates over reference methods in removing unwanted components from raw gyroscopic signals.

The results for synthetic signals (Doppler, blocks, bumps, and heavisine) are not explicit. For the bumps signals, the proposed algorithm gives the best results for different values of signal-to-noise ratios and noise with different short and long dependencies. While for the Doppler and blocks synthetic signals the other methods give better results.

In turn, the outcomes for heavisine are not straightforward. On the one hand, when H = 0.5 the proposed method predominate over each reference method used in our studies. On the other hand, when H is equal to 0.2 and 0.8 and SNR is high the proposed approach gives worse results. While, when SNR is low, i.e. equals 0 or 5 [dB] our algorithm predominate over reference methods.

The weak point of the presented in the paper algorithm is its time complexity. Since the algorithm is based on the SALSA method the time of computation strongly depends on the number of iterations. The crucial element of the algorithm is the VMD method. It is possible to reduce the time of computations when we applied the fast implementation of the Variational Mode Decomposition method. Thus, the fast implementation of VMD can be one of the directions for further works.

The time complexity of the algorithm is a weak point of the presented algorithm. Since Split Augmented Lagrangian Shrinkage Algorithm is an iterative optimization method the time of computation strongly depends on the number of iterations. Thus, the possible area of application the proposed in the work denoising algorithm relating to the problem in which time complexity is not a crucial factor, i.e., off-line processing of measurement signals.