1 Introduction

In conventional digital signal processing, the sampling frequency needs to be at least twice the maximal signal frequency in order to allow exact signal reconstruction. In the compressive sensing (CS) framework, by contrast, the signal can be exactly recovered even when the number of available samples is considerably below the conventional requirements [1,2,3,4,5]. Although certain conditions should be fulfilled for an efficient CS scenario, many signals in real applications turn out to be well suited to CS. The first requirement is signal sparsity, which is generally achieved using a suitable transform domain representation [4,5,6]. The second is incoherence between the measurement process and the sparsity basis; in other words, a small set of linear measurements needs to be acquired in a random manner [4]. Under these conditions, signal reconstruction is treated as the problem of solving an underdetermined system of linear equations. It is important to emphasize that CS has been efficiently combined with different time-frequency approaches, either to define improved CS methods based on time-frequency dictionaries or to define improved time-frequency representations based on CS reconstruction principles [7]. A reliable instantaneous frequency estimation from a small set of available samples can be achieved by applying signal reconstruction algorithms to the samples of the local autocorrelation function [8]. Therefore, CS-based reconstruction methods bring several benefits to time-frequency analysis.

Solving the problem of signal reconstruction from a small set of available samples can be a complex and demanding task from both software and hardware perspectives. The existing software solutions [9,10,11,12,13,14] are mainly based on different iterative convex optimization techniques or their greedy iterative counterparts. These software implementations are generally unsuitable for real-time applications because the algorithms require a large number of iterations to converge. Moreover, the intrinsic parallelism of these algorithms is difficult to exploit on a single-processor computing system [15]. Therefore, to cope efficiently with the signal reconstruction problem in compressive sensing, a dedicated parallel hardware implementation of the reconstruction algorithm is needed. The hardware realization can be done using analog, digital, or combined analog and digital devices.

In this paper, we propose an analog hardware architecture for the gradient-based signal reconstruction algorithm [14, 16]. The chosen algorithm performs well for different types of signals. It belongs to the group of convex optimization methods, prominent for their accuracy, and is computationally less demanding than the other methods in this category. It outperforms the standard algorithms used for ℓ1-norm-based minimization problems in both calculation time and accuracy [16]. In contrast to greedy methods, the gradient-based algorithm does not require the signal to be strictly sparse in a certain transform domain, rendering it suitable for real-world applications. Furthermore, it is well known that the computational burden of a digital hardware implementation can be intensive, leading to long execution times [17,18,19]. Hence, opting for an analog implementation may bring a considerable advantage in terms of complexity and, consequently, execution time.

The paper is organized as follows. Theoretical background on compressive sensing and the gradient-based signal reconstruction algorithm is given in Section 2. Section 3 presents the proposed hardware solution, whose performance is assessed in Section 4. The concluding remarks are given in Section 5.

2 Compressive sensing—theory and reconstruction method

The compressive sensing framework emerged from the efforts to decrease the amount of data sensed in modern applications [5], and consequently to decrease the need for resources such as storage capacity, number of sensors, and power consumption. The set of measured data in the traditional approach is usually referred to as the full dataset, while the data in CS scenarios are called the incomplete dataset. In the traditional approach, the data samples are acquired uniformly (equidistantly), while in CS, the data should be sampled randomly to achieve a high incoherence of the dataset. If the measurement process is modeled by a certain measurement matrix Φ, then the measured dataset y of length M can be described as [2]:

$$ \mathbf{y}=\boldsymbol{\Phi}\mathbf{x} $$
(1)

where x represents the full dataset of length N, and the measurement matrix is of size M × N, with M ≪ N. In practical scenarios, the CS dataset y and the measurement matrix Φ are known, and the goal is to reconstruct the full dataset x of length N. However, the system of equations in (1) is underdetermined since M ≪ N. Thus, to tackle the problem, one resorts to the assumption that the signal x is sparse when represented in a certain transform basis:

$$ \mathbf{X}=\mathbf{W}\mathbf{x} $$
(2)

where X is the transform domain representation of the full signal dataset and W is the transform matrix, i.e.,

$$ \mathbf{W}=\left[\begin{array}{cccc} {W}_1(1) & {W}_2(1) & \dots & {W}_N(1) \\ {W}_1(2) & {W}_2(2) & \dots & {W}_N(2) \\ \vdots & \vdots & \ddots & \vdots \\ {W}_1(N) & {W}_2(N) & \dots & {W}_N(N) \end{array}\right] $$
(3)

where Wi are the basis functions. Depending on the signal, the basis functions may belong to the discrete Fourier transform (DFT), discrete Hermite transform (DHT), discrete wavelet transform (DWT), discrete cosine transform (DCT), etc. For the sake of simplicity, we can write W as the coefficients’ matrix:

$$ \mathbf{W}=\left[\begin{array}{cccc} {w}_{11} & {w}_{21} & \dots & {w}_{N1} \\ {w}_{12} & {w}_{22} & \dots & {w}_{N2} \\ \vdots & \vdots & \ddots & \vdots \\ {w}_{1N} & {w}_{2N} & \dots & {w}_{NN} \end{array}\right] $$
(4)

Sparsity means that the vector X has only K out of N nonzero (or significant) elements, with K ≪ N and K < M. The system of linear equations then becomes:

$$ \mathbf{y}=\boldsymbol{\Phi}{\mathbf{W}}^{-1}\mathbf{X} $$
(5)

The reconstructed signal X can be obtained as the sparsest solution of the following optimization problem [7]:

$$ \operatorname{minimize}\ {\left\Vert \mathbf{X}\right\Vert}_1\quad \mathrm{subject}\ \mathrm{to}\quad \mathbf{y}=\boldsymbol{\Phi}{\mathbf{W}}^{-1}\mathbf{X} $$
(6)

where ||X||1 denotes the ℓ1-norm. Generally, the ℓ0-norm is used to measure signal sparsity, but in practical implementations, ℓ0-norm minimization leads to an NP-hard combinatorial problem; hence, the ℓ1-norm is used instead.

There are a few types of algorithms that can be applied to solve this problem; the most commonly used are convex optimization algorithms and greedy algorithms [9, 10]. In general, greedy solutions are simpler but less reliable. Convex optimization algorithms are more reliable and accurate and have guaranteed convergence, but their solutions come at the cost of computational complexity and a large number of nested iterations [4, 9]. In this paper, we consider the gradient-based convex optimization algorithm as a representative of the convex optimization group that allows simpler implementation compared with other common solutions.
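
For orientation, problem (6) can be recast as a linear program and handed to a generic solver; this is the conventional software route whose complexity motivates the hardware architecture proposed here, not the gradient-based method itself. The following minimal sketch assumes a DCT sparsity basis and random sample selection as the measurement process; the dimensions N and M mirror the example in Section 4.1, while the sparsity level K is an illustrative choice:

```python
import numpy as np
from scipy.fft import dct
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M, K = 64, 26, 4          # N, M as in Section 4.1; sparsity K is illustrative

# Sparse transform-domain vector X with K nonzero coefficients
X = np.zeros(N)
X[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)

W = dct(np.eye(N), axis=0, norm="ortho")  # orthonormal DCT matrix, so W^{-1} = W.T
x = W.T @ X                               # full dataset, x = W^{-1} X
keep = np.sort(rng.choice(N, size=M, replace=False))
y = x[keep]                               # random measurements, y = Phi x

# Basis pursuit as an LP: min 1'(u + v)  s.t.  A(u - v) = y,  u, v >= 0
A = W.T[keep, :]                          # A = Phi W^{-1}
res = linprog(c=np.ones(2 * N), A_eq=np.hstack([A, -A]), b_eq=y)
X_hat = res.x[:N] - res.x[N:]             # recovered transform-domain vector
print(np.max(np.abs(X_hat - X)))          # near zero for an exactly sparse X
```

The equality constraint A(u − v) = y encodes y = ΦW⁻¹X with X = u − v and u, v ≥ 0, which is the standard linear-programming form of the ℓ1 minimization in (6).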

2.1 Gradient-based signal reconstruction algorithm

Assume that the positions of the missing samples are defined by the set Nm = {n1, n2, ..., nN − M}. The measurement vector is extended to the length of the full dataset by embedding zeros at the positions of the missing samples, i.e., y(n) = 0 for n ∈ Nm. The gradient-based approach starts from some initial values of the unavailable samples (the initial state), which are changed through the iterations so as to constantly improve the concentration in the sparsity domain. In general, the approach does not require the signal to be strictly sparse in a certain transform domain, which is an advantage over other methods. In particular, the missing samples in the signal domain can initially be set to zero. In each iteration, the missing samples are changed by +∆ and by −∆. Then, for both changes, the concentration is measured as the ℓ1-norm of the transform domain vectors X+ (for the +∆ change) and X− (for the −∆ change), while the gradient is determined using their difference. Finally, the gradient is used to update the values of the missing samples. Here, it is important to note that each sample is observed separately, and a single iteration is terminated when all samples are processed.
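
In symbols, the gradient estimate at a missing-sample position ni takes the form of a finite difference of the two sparsity measures (the normalization by 2Δ is one common convention; the exact scaling used in the modified algorithm below may differ):

$$ g\left({n}_i\right)=\frac{{\left\Vert {\mathbf{X}}^{+}\right\Vert}_1-{\left\Vert {\mathbf{X}}^{-}\right\Vert}_1}{2\Delta} $$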

In the sequel, we provide a modified version of the algorithm, adapted for an efficient hardware implementation by avoiding complex functions and calculations.

figure a

The algorithm steps are grouped into blocks according to the parallelization principle: the steps within the same block can be implemented in parallel, while the steps of the succeeding block are performed only after the current block has been processed. When the algorithm comes close to the minimum of the sparsity measure, the gradient direction changes by almost 180 degrees, meaning that the step ∆ needs to be decreased (line 18). The precision of the result of this iterative algorithm is estimated from the change of the result in the last iteration.
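
Since the exact pseudocode is given in figure a, the following NumPy sketch is only a software model of the same loop, written under stated assumptions: the update y(i+1) = y(i) − g(i) described in Section 3, a threshold βREF near zero for detecting the gradient reversal, the Δ/3 step reduction, and a mean-square-change stopping rule. The parameter names (beta_ref, tr_ref) and the inner iteration guard are illustrative, not part of the original algorithm.

```python
import numpy as np

def gradient_reconstruction(y, e, W, beta_ref=0.0, tr_ref=1e-10):
    """Software model of the modified gradient-based algorithm.

    y -- length-N signal with zeros embedded at the missing positions (line 0)
    e -- boolean mask, True at the available-sample positions (line 0)
    W -- N x N sparsifying transform matrix (e.g., Hermite or Fourier)
    """
    y = y.astype(float).copy()
    missing = ~e
    delta = np.max(np.abs(y))              # initial gradient step (line 2)
    g_prev = np.zeros_like(y)
    while True:
        yp = y.copy()                      # snapshot for the current delta (Block 1)
        for _ in range(1000):              # guard; beta flips long before this
            g = np.zeros_like(y)
            for i in np.where(missing)[0]: # gradient at each missing position
                y_plus, y_minus = y.copy(), y.copy()
                y_plus[i] += delta         # +delta change (lines 4-11, Block 2)
                y_minus[i] -= delta        # -delta change
                g[i] = (np.sum(np.abs(W @ y_plus))
                        - np.sum(np.abs(W @ y_minus))) / (2 * delta)
            y = y - g                      # update missing samples (line 12, Block 3)
            norm = np.linalg.norm(g) * np.linalg.norm(g_prev)
            beta = float(g @ g_prev) / norm if norm > 0 else 1.0  # line 13
            g_prev = g
            if beta < beta_ref:            # near-180-degree reversal (line 14)
                break
        tr = np.mean((y[missing] - yp[missing]) ** 2)  # stopping measure (line 19)
        if tr < tr_ref:
            return y                       # stopping criterion met
        delta /= 3.0                       # refine the step (line 18, Block 4)
```

For the example of Section 4.1, e would be the 64-sample availability mask and W the Hermite transform matrix.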

3 Analog hardware implementation of modified gradient-based algorithm

From the modified version of the algorithm, we can identify the following steps that need to be implemented within the analog hardware architecture:

  1. a)

    Set the signal samples at the input of the analog architecture (line 0)

  2. b)

    Set the digital signal identifying the positions of available and missing samples at the input of the architecture: positions of available samples are marked by value “1” (high voltage) and positions of missing samples by value “0” (low voltage) (line 0 in the modified algorithm).

  3. c)

    Set the value of the gradient step Δ as in line 2 of the modified algorithm

  4. d)

    Update the values of the gradient g for the input samples (lines 4 to 11 in the modified algorithm)

  5. e)

    Update the values of the missing samples and β (lines 12 and 13 in the modified algorithm)

  6. f)

    Check the condition for changing Δ (line 14 of the modified algorithm):

  • If the condition is not satisfied, repeat steps d) and e) (lines 4 to 13 in the modified algorithm)

  • If the condition is satisfied, Δ is decreased, for example, to Δ/3 (line 18 in the modified algorithm). It is noteworthy that in order to achieve high precision, the step Δ should be decreased when approaching the zone of stationary oscillations.

  7. g)

    In parallel with changing the step Δ, check the stopping criterion. The stopping criterion Tr is defined as the mean value of the changes applied to the missing samples using the previous value of the gradient step Δ (line 19 in the modified algorithm). When the applied changes are smaller than the predefined minimum, the updated value of Δ has no significant influence on the signal quality, and thus the procedure should be stopped. In that case, the current values of the input signal represent the reconstruction result. If the stopping criterion is not met, the procedure is repeated with the new updated value of Δ.

The block scheme for the proposed analog architecture is shown in Fig. 1.

Fig. 1
figure 1

The block scheme of the proposed analog architecture

The architecture is composed of the following segments:

  • Input array holding the analog samples y,

  • Input array e holding the indicators of available and missing samples positions,

  • Set Δ correction circuit,

  • Four architecture blocks: Block 1, Block 2, Block 3, and Block 4.

The proposed architecture blocks (Block 1, Block 2, Block 3, and Block 4) are driven by the output signals of the oscillator, namely OSC_1, OSC_2, OSC_3, and OSC_4, respectively. The sequence of active states of the oscillator’s signals controls the activation order of the four architecture blocks. Before the START signal, all outputs are set to the low voltage level. After the START signal is activated, the oscillator’s output signals are as shown in Fig. 2.

Fig. 2
figure 2

Oscillator output signals (OSC_1, OSC_2, OSC_3, and OSC_4); T1, T2, T3, and T4 are the durations of the active states of the OSC_1, OSC_2, OSC_3, and OSC_4 signals, respectively

Prior to activating the START signal, the values of the input signal y are loaded into the sample and hold amplifiers, and the e values are set. During the START impulse, the initial value of Δ is set at the output of the set Δ correction circuit, whose electrical diagram is shown in Fig. 3. Let us briefly discuss this circuit. By activating the START signal, the capacitor C is charged to approximately the maximal absolute value of the input signal samples. This is the initial value of the Δ correction.

Fig. 3
figure 3

Set Δ correction circuit

Further updates of Δ are controlled by the Correc signal: when Correc is high, Δ is changed to Δ/3 (as in line 18 of the modified algorithm).

The architecture Block 1 is used to temporarily store the values of y within the array yp (for the current value of Δ), Fig. 4. Block 1 is active when OSC_1 (the T1 impulse) is high and either the START or the Correc signal is active. At the beginning, during the active state of the START signal and the first T1 impulse, the initial values of the array y are loaded into the array yp. Further updates of yp are controlled by the active state of the OSC_1 signal together with the active state of the Correc signal. It is important to note that the yp values are updated each time a new correction value Δ is set to Δ/3. Therefore, yp represents the version of the reconstructed signal at the moment of applying a new Δ value. It is later used to calculate Tr (line 19 in the modified algorithm) as the mean square error between the signal y before and the signal y after applying the current Δ correction value.
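
Written out under this description (an assumption on our part; the exact expression is given in line 19 of the modified algorithm), Tr is the mean square change over the N − M missing positions:

$$ Tr=\frac{1}{N-M}\sum_{n\in {N}_m}{\left(y(n)-{y}_p(n)\right)}^2 $$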

Fig. 4
figure 4

Block 1: sample and hold circuits controlled by OSC_1, START, and Correc signals used to update yp array

The architecture Block 2 is active when OSC_2 (the T2 impulse) is high, and it is used to update the values of the gradients g (for each missing sample position), as shown in Fig. 5. The current values of y are placed in y′ (for the ith iteration, y′ = y(i) in the algorithm); that is, the values of the signal samples y are loaded into the sample and hold amplifiers y′. The architecture Block 2 then allows parallel computation of all gradient samples in g. A circuit used to calculate a single gradient sample is shown in Fig. 6. It consists of amplifiers representing the transformation coefficients wij of the transformation matrix W (defined in (3) and (4)), adders, and absolute value circuits. The calculation of a gradient sample gi requires N(N + 1) + 1 amplifiers, 2(N + 1) adders with N inputs, 2N absolute value circuits, 1 two-input adder, 1 inverter circuit, and 2 switches. Note that the amplifiers drawn with the dashed line (right side of Fig. 6) are not meant to be implemented in hardware; they are shown in Fig. 6 only to provide a clearer circuit illustration, since the necessary values are already available at the outputs of the appropriate amplifiers on the left side of Fig. 6. The switches in Fig. 6 are controlled by the signal e containing the indicators of available and missing samples. The switch SW1 needs to be closed at the positions of missing samples and open at the positions of available samples (the gradients are calculated only for the missing sample positions), while the situation with SW2 is the opposite. Hence, at the position of an available sample, the signal e(i) keeps the switch SW1 open, while the signal \( \overline{e(i)} \) closes the switch SW2. At the position of a missing sample, the signal e(i) keeps the switch SW1 closed, while the signal \( \overline{e(i)} \) opens the switch SW2.
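
To give a feeling for the scale of Block 2, the following snippet simply evaluates the component counts stated above for the signal length N = 64 used in the example of Section 4.1:

```python
N = 64                           # signal length from the example in Section 4.1
amplifiers = N * (N + 1) + 1     # 4161 amplifiers per gradient sample
adders = 2 * (N + 1)             # 130 N-input adders per gradient sample
abs_circuits = 2 * N             # 128 absolute value circuits per gradient sample
print(amplifiers, adders, abs_circuits)
```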

Fig. 5
figure 5

Block 2 calculates and updates the vector of gradient samples (gradients)

Fig. 6
figure 6

Circuit for the calculation of gradient sample at position j (gj)

The architecture Block 3 is active when the OSC_3 signal (the T3 impulse) is high, and it is used to obtain the updated signal values y using the gradient g and the current signal values y′ from Block 2 (y(i+1) = y(i) − g(i) in the algorithm). This part of Block 3 is shown in Fig. 7. Another part of the architecture Block 3 is responsible for updating the value of β (Fig. 8). The value of β is fed to the input of the circuit that checks the condition for changing the step Δ.

Fig. 7
figure 7

Circuits for updating the input signal values y

Fig. 8
figure 8

A circuit for updating the parameter β

The circuit in Fig. 8 consists of analog voltage adders, multipliers, squaring and square-rooting circuits, division (multiplication) circuits, and sample and hold amplifiers. It can be seen that the calculation of β requires 3 adders with N inputs, N + 1 multipliers, 2N squaring circuits, two square-rooting circuits, and a division circuit. Note that the array denoted by g′ holds the gradient values from the previous iteration; the values in the g′ array are updated in Block 4.
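
This component count is consistent with β being the normalized inner product of the current and previous gradient vectors, i.e., under this reading (an assumption; the exact expression is given in line 13 of the modified algorithm):

$$ \beta =\frac{\sum_{n=1}^{N}g(n)\,{g}^{\prime }(n)}{\sqrt{\sum_{n=1}^{N}{g}^2(n)}\cdot \sqrt{\sum_{n=1}^{N}{\left({g}^{\prime }(n)\right)}^2}} $$

With this definition, β approaches −1 when the gradient reverses direction, so the condition β < βREF (checked in Block 4) detects the near-180-degree turn discussed in Section 2.1.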

The architecture Block 4 is active when the OSC_4 signal (the T4 impulse) is high, and it is used to load the current values from g into g′. The circuits in Block 4 are further used to check the parameter β and activate the Correc signal (Fig. 9a), to update the value of Tr (Fig. 9b), and to check whether the stopping criterion is met (Fig. 9c). The circuit in Fig. 9b consists of N + 2 adders, multipliers, N inverters, 2N squaring analog voltage circuits, and 1 division circuit. The circuit in Fig. 9a compares β with βREF, and if β < βREF, the signal Correc is activated as a trigger to update the value of Δ. The active state of the Correc signal enables checking of the stopping criterion (Fig. 9c). If the stopping criterion is met, the STOP signal is activated; this deactivates all oscillator outputs, and consequently all switches in the hardware open. If the stopping criterion is not met, the RcCc time constant keeps the Correc signal at the high voltage level for the next T1 period, enabling the update of the sample and hold amplifiers denoted by yp.

Fig. 9
figure 9

Circuits in Block 4: a checking the condition β < βREF (if the condition is satisfied, the Correc signal is activated), b updating the threshold value Tr, c checking the stopping criterion Tr < Tref

The required components per block are listed in Table 1 (only the main and most numerous components are included, such as amplifiers, adders, and multipliers).

Table 1 The number of the most represented circuits required

4 Results and discussion

The simulation of the proposed hardware architecture is done using PSpice (OrCAD 16.6). For the realization of the multipliers, division circuits, and squaring and square-rooting circuits, the analog device AD734 is used; this circuit is a precise four-quadrant high-speed analog multiplier able to perform all the abovementioned functions. For the realization of the amplifiers (inverters), adders, and absolute value circuits, we employed the ultra-high-speed operational amplifier LT1191. For the analog voltage comparison, we employed the LT1394 comparator [20] (with a settling time of 7 ns); this circuit provides complementary outputs and allows zero-crossing detection. The AD783 sample and hold circuits are used for storing the voltage values. Furthermore, the simulation uses the MAX4645 analog switches [21] and 1N4448 diodes [22].

The simulation has shown that the required durations of the oscillator outputs are T1 ≈ 320 ns, T2 ≈ 740 ns, T3 ≈ 650 ns, and T4 ≈ 610 ns. The time between the impulses (Tp) is required to turn off the analog switches used in the proposed hardware; the turn-off time of the MAX4645 switches is up to Tp = 40 ns. Hence, the minimal duration of one algorithm iteration is T1 + T2 + T3 + T4 + 4Tp = 2.48 μs ≈ 2.5 μs. The total calculation time for the reconstruction of an input signal depends on the number of iterations. The proposed solution allows up to 400 iterations within 1 ms, and it is worth noting that most signals in real applications can be accurately reconstructed within a few hundred iterations [16].
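
The iteration budget follows directly from these durations; the short check below restates the arithmetic (all values in nanoseconds):

```python
T1, T2, T3, T4, Tp = 320, 740, 650, 610, 40     # ns, from the PSpice simulation
iteration_ns = T1 + T2 + T3 + T4 + 4 * Tp       # 2480 ns = 2.48 us per iteration
print(iteration_ns, 1_000_000 // iteration_ns)  # -> 2480, 403 iterations per ms
```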

Another advantage of the proposed hardware solution is its robustness to hardware imperfections. The components used generate a very small error. The main source of error is the AD734 circuit; however, it contributes a typical error of 0.1% of the full scale, or 0.25% over the full temperature range from −40 to +85 °C. Our simulations have shown that the maximal introduced error causes only a few additional iterations to achieve high-precision reconstruction performance, which is negligible in terms of time and processing costs.

4.1 Example

In order to illustrate the signal reconstruction results, let us observe an experimental synthetic signal that is sparse in the Hermite transform domain. The original signal contains 64 samples, but only 26 samples, at random positions within the signal, are available, while the remaining 38 samples are missing and should be reconstructed. The available samples are shown in Fig. 10a. The original signal with the missing samples marked in red is shown in Fig. 10b. The reconstructed signal is shown in Fig. 10c, and the sparse Hermite transform after the reconstruction in Fig. 10d. The achieved mean square error is of the order of 10−3, which can be considered negligible for the observed signal. Thus, the reconstruction result is highly accurate.

Fig. 10
figure 10

a 26 (out of 64) available samples of synthetic signal, b original (full length) signal with marked available samples, c reconstructed (full length) signal, d Hermite transform of the reconstructed signal

In real-world applications, the same concept can be applied to the reconstruction of QRS complexes in ECG signals (as shown in Fig. 11) or ultra-wideband (UWB) signals in communications, which are also sparse in the Hermite transform basis. For other communication signals (e.g., FHDSS), the Fourier transform basis would be more appropriate for achieving a sparse signal representation. Finally, without loss of generality, this approach can be applied in the same way with any other transform basis, as long as it allows a sparse representation of the specific observed signal. Therefore, the proposed hardware implementation is amenable to various practical applications, for instance in communications, radar and remote sensing, biomedicine, etc.

Fig. 11
figure 11

a 20 available samples (out of 50) of QRS signal, b original QRS signal (full-length signal consisting of 50 samples) with available samples marked in red, c reconstructed (full length) QRS signal, d Hermite transform of the reconstructed signal

5 Conclusion

In this work, we presented an analog hardware architecture for the gradient-based sparse signal reconstruction algorithm. The algorithm is suitable for different types of signals and for real-time implementation, which opens up a wide range of applications, including time-frequency signal analysis and instantaneous frequency estimation. The proposed analog design allows fast processing and a low-complexity realization, executing a significant number of algorithm iterations in short time intervals (on the order of 1 ms), which is highly satisfactory for real-time applications.