
Journal of Complexity

Volume 59, August 2020, 101469

Iterative hard thresholding for compressed data separation

https://doi.org/10.1016/j.jco.2020.101469

Abstract

We study the problem of reconstructing a signal's distinct subcomponents, which are approximately sparse in morphologically different dictionaries, from a small number of linear measurements. We propose an iterative hard thresholding algorithm adapted to dictionaries. We show that, under the usual assumptions that the measurement matrix satisfies a restricted isometry property adapted to the composed dictionary and that the dictionaries satisfy a mutual coherence condition, the algorithm approximately reconstructs the distinct subcomponents after a fixed number of iterations.

Introduction

The basic insight of compressed sensing is that sparse signals can be reconstructed via efficient algorithms from a small number of linear measurements.

In standard compressed sensing, one observes $(A, y)$ with $$y = Af + e,$$ where $y \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$ with $m \ll n$, $f \in \mathbb{R}^n$ is an (approximately) $s$-sparse signal of interest, and $e \in \mathbb{R}^m$ is a vector of measurement errors. The goal is to reconstruct the sparse signal $f$ from the measurement matrix $A$ and the measurement vector $y$ via an efficient algorithm. Constrained $\ell_1$-minimization has been shown to be very effective for these problems. See, e.g., Candès and Tao [9], [11], Donoho [15], and Donoho, Elad, and Temlyakov [16].
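For concreteness, the measurement model can be instantiated numerically as follows; this is a minimal sketch, where the dimensions $(m, n, s)$ and the noise level are hypothetical values chosen only for illustration.

```python
# Toy instantiation of the standard model y = A f + e (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 64, 256, 5                            # m << n

f = np.zeros(n)                                 # s-sparse signal of interest
support = rng.choice(n, size=s, replace=False)
f[support] = rng.standard_normal(s)

A = rng.standard_normal((m, n)) / np.sqrt(m)    # subgaussian ensemble, variance 1/m
e = 0.01 * rng.standard_normal(m)               # measurement error
y = A @ f + e                                   # observed measurements
```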

Though constrained $\ell_1$-minimization can be solved efficiently using iterative algorithms from convex optimization, it may incur substantial computational expense in large-scale applications [18], [33]. Thus it is desirable to use alternative iterative methods that are not based on optimization, such as orthogonal matching pursuit (OMP) [14], [30], stagewise OMP [19], regularized OMP [29], compressive sampling matching pursuit (CoSaMP) [28], iterative hard thresholding (IHT) [4], subspace pursuit [13], and many other variants [5], [21]. Recovery results based on a restricted isometry property (RIP) [9] condition $\delta_{cs} \le \theta$ have been well developed for these algorithms; see [4], [5], [7], [13], [21], [28], [34], [35]. The RIP condition $\delta_{cs} \le \theta$ is satisfied with high probability by several random matrix ensembles, such as subgaussian ensembles and random partial Fourier transforms [3], [10], [31], provided one chooses $m \gtrsim s \log^\beta(n) / \theta^2$. Consequently, with high probability, the aforementioned algorithms can approximately recover every $s$-sparse vector with small or zero error from $O(s \log^\beta(n))$ random measurements.
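As a point of reference for the dictionary-adapted algorithm proposed later, standard IHT [4] iterates a gradient step followed by hard thresholding. The following is a minimal sketch under the assumption of unit step size (practical variants adapt the step size); `n_iter` is a hypothetical default.

```python
# Standard IHT sketch: x^{t+1} = H_s( x^t + A^T (y - A x^t) ),
# where H_s keeps the s largest-magnitude entries and zeros the rest.
import numpy as np

def hard_threshold(x, s):
    """Return the best s-term approximation x_[s] of x."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -s)[-s:]   # indices of s largest |x_j|
    out[idx] = x[idx]
    return out

def iht(A, y, s, n_iter=100):
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + A.T @ (y - A @ x), s)
    return x
```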

The above techniques apply to signals that are sparse in some orthonormal basis. In practical examples, however, many signals of interest are not sparse in an orthonormal basis. Often, sparsity is expressed not in terms of an orthonormal basis but in terms of an overcomplete (and possibly coherent) dictionary: the signal $f$ is expressed as $f = Dx$, where $D \in \mathbb{R}^{n \times d}$ ($d \ge n$) is a redundant dictionary and $x$ is (approximately) sparse; see, e.g., [8], [12] and references therein.

The $\ell_1$-analysis approach is one of the most effective approaches for solving these problems. Recovery results based on the D-RIP (RIP adapted to the dictionary $D$), analogous to those in the classical setting, have been developed for this approach [1], [8], [23], [27], [32]. Specifically, when $D$ is a tight frame, Candès et al. [8] showed that if the measurement matrix satisfies the D-RIP condition $\delta_{2s} \le 0.08$, the solution $\hat f$ of the analysis Basis Pursuit (ABP) $$\arg\min_{\tilde f \in \mathbb{R}^n} \|D^* \tilde f\|_1 \quad \text{s.t.} \quad \|A \tilde f - y\|_2 \le \epsilon$$ obeys the error bound $$\|\hat f - f\|_2 \le c_0 \frac{\sigma_s(D^* f)}{\sqrt{s}} + c_1 \epsilon,$$ provided that $\|e\|_2 \le \epsilon$. Here, $\sigma_s(x)$ denotes the best $s$-term approximation error in the $\ell_1$-norm: $$\sigma_s(x) = \min_{\|u\|_0 \le s} \|u - x\|_1.$$ Under the assumption that the measurement matrix $A$ satisfies the D-RIP condition $\delta_{3s} \le 1/2$, Lin and Li [23] showed that the solution $\hat f$ of the analysis Dantzig selector (ADS) $$\arg\min_{\tilde f \in \mathbb{R}^n} \|D^* \tilde f\|_1 \quad \text{s.t.} \quad \|D^* A^* (A \tilde f - y)\|_\infty \le \lambda$$ recovers the signal with the error bound $$\|\hat f - f\|_2 \le c_0 \frac{\sigma_s(D^* f)}{\sqrt{s}} + c_1 \sqrt{s}\, \lambda,$$ provided that $\|D^* A^* e\|_\infty \le \lambda$. Recall that the D-RIP [8] is defined as follows.

Definition 1

The measurement matrix $A$ satisfies the restricted isometry property adapted to $D \in \mathbb{R}^{n \times d}$ (abbreviated as D-RIP) of order $s$ with constant $\delta \in (0,1)$ if, for all $s$-sparse vectors $z \in \mathbb{R}^d$, $$(1-\delta)\|Dz\|_2^2 \le \|ADz\|_2^2 \le (1+\delta)\|Dz\|_2^2. \tag{1.1}$$ The restricted isometry constant adapted to $D$ (abbreviated as D-RIC) of order $s$ is the smallest number $\delta$ such that (1.1) holds for all $s$-sparse vectors $z \in \mathbb{R}^d$; it is denoted by $\delta_s$.
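The D-RIC of Definition 1 cannot be computed efficiently in general, since it involves a maximization over all supports of size $s$. A Monte Carlo sketch such as the following can only estimate a lower bound on $\delta_s$ by sampling random $s$-sparse vectors; `drip_lower_bound` is a hypothetical helper for intuition, not a certification procedure.

```python
# Estimate a lower bound on the D-RIC delta_s by random sampling:
# max over trials of | ||A D z||_2^2 / ||D z||_2^2 - 1 |.
import numpy as np

def drip_lower_bound(A, D, s, n_trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    d = D.shape[1]
    worst = 0.0
    for _ in range(n_trials):
        z = np.zeros(d)
        supp = rng.choice(d, size=s, replace=False)   # random s-sparse support
        z[supp] = rng.standard_normal(s)
        Dz = D @ z
        denom = np.dot(Dz, Dz)
        if denom > 0:
            ratio = np.dot(A @ Dz, A @ Dz) / denom
            worst = max(worst, abs(ratio - 1.0))
    return worst
```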

As noted in [8], the D-RIP condition $\delta_{cs} \le \theta$ is satisfied with high probability by $m \times n$ matrices populated by i.i.d. subgaussian entries with variance $m^{-1}$, provided $m \ge c(\delta)\, s \ln(ed/s)$. It is also fulfilled with high probability by random partial Fourier matrices after sign randomization of their columns [22].

In signal processing and statistics, it is common to assume that the noise vector is Gaussian, $e \sim N(0, \sigma^2 I_m)$. Gaussian noise is essentially bounded (e.g., [11], [23]), so all of the results mentioned above for the ABP and the ADS apply directly to Gaussian noise. In this case, the ABP and the ADS provide very similar guarantees, but there are circumstances in which the ADS is preferable, since the ADS yields a bound that is adaptive to the unknown sparsity level of the object signal and thus provides a stronger guarantee when $s$ is small [23] (this was also noted by Candès and Tao [11] for classical compressed sensing).

Most recently, Foucart [20] studied IHT adapted to a dictionary and showed that, under a D-RIP condition, IHT provides the same theoretical guarantees as the ABP for an $\ell_2$-bounded noise. In this paper, as a byproduct of our analysis, we show that IHT also enjoys the same theoretical guarantees as the ADS, yielding error bounds for Gaussian noise that are adaptive to the unknown sparsity level of the object signal and hence preferable to those from [20].

The main goal of this paper is to study the problem of compressed data separation, i.e., the reconstruction of a signal's distinct sparse components from compressed measurements. Here the signal is composed of two different components, $f = f_1 + f_2$. More specifically, we observe the data $y \in \mathbb{R}^m$ following the linear measurement model $$y = A(f_1 + f_2) + e. \tag{1.2}$$ The goal is to reconstruct the unknown constituents $f_1$ and $f_2$ from the measurement vector $y$ and the measurement matrix $A$. Refer to [25] and references therein for further details on compressed data separation. As in [17], [25], we assume that the components $f_1$ and $f_2$ are sparse or approximately sparse in terms of two different tight frames $D_1 \in \mathbb{R}^{n \times d_1}$ and $D_2 \in \mathbb{R}^{n \times d_2}$, respectively, i.e., $f_1 = D_1 x_1$ and $f_2 = D_2 x_2$ for some (approximately) sparse vectors $x_1 \in \mathbb{R}^{d_1}$ and $x_2 \in \mathbb{R}^{d_2}$. The morphological difference between the components $f_1$ and $f_2$ is measured in terms of the mutual coherence, defined as follows.

Definition 2

The mutual coherence between two dictionaries $D_1$ and $D_2$ is defined as $$\mu = \max_{i,j} \left| \langle [D_1]_i, [D_2]_j \rangle \right|,$$ where $[D_1]_i$ and $[D_2]_j$ are the $i$th column of $D_1$ and the $j$th column of $D_2$, respectively.
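Definition 2 is straightforward to evaluate numerically; the sketch below assumes the columns of $D_1$ and $D_2$ are unit-normalized. For instance, with $D_1$ the identity basis and $D_2$ an orthonormal Fourier basis of $\mathbb{R}^n$, every inner product has magnitude $1/\sqrt{n}$, so $\mu = 1/\sqrt{n}$.

```python
# Mutual coherence of Definition 2 (assumes unit-norm columns).
import numpy as np

def mutual_coherence(D1, D2):
    """mu = max_{i,j} |<[D1]_i, [D2]_j>| over all pairs of columns."""
    return np.max(np.abs(D1.T @ D2))
```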

The $\ell_1$-split analysis approach was proposed in [2], [8]; it finds $f_1$ and $f_2$ via the $\ell_1$-minimization $$\arg\min_{\tilde f_1, \tilde f_2} \|D_1^* \tilde f_1\|_1 + \|D_2^* \tilde f_2\|_1 \quad \text{s.t.} \quad \|A(\tilde f_1 + \tilde f_2) - y\|_2 \le \epsilon.$$ The $\ell_1$-split analysis [24], [25] can approximately reconstruct the two components $f_1$ and $f_2$ with the error bound $$\|\hat f_1 - f_1\|_2 + \|\hat f_2 - f_2\|_2 \le C_0 \frac{\sigma_{s_1}(D_1^* f_1) + \sigma_{s_2}(D_2^* f_2)}{\sqrt{s_1 + s_2}} + C_1 \epsilon,$$ provided that the measurement matrix satisfies a $\Phi$-RIP condition $\delta_{cs} \le C$, where $s = s_1 + s_2$ and $\Phi = [D_1 | D_2]$, and the dictionaries $D_1$ and $D_2$ satisfy a mutual coherence condition $\mu s \le c$.
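For comparison with the iterative algorithm proposed below, the $\ell_1$-split analysis program can be prototyped with a generic convex solver. The paper does not prescribe a solver; the third-party package cvxpy is assumed here purely for illustration.

```python
# Prototype of the l1-split analysis program via a generic convex solver.
import cvxpy as cp
import numpy as np

def l1_split_analysis(A, D1, D2, y, eps):
    n = A.shape[1]
    f1, f2 = cp.Variable(n), cp.Variable(n)
    # minimize ||D1* f1||_1 + ||D2* f2||_1  s.t.  ||A(f1+f2) - y||_2 <= eps
    objective = cp.Minimize(cp.norm1(D1.T @ f1) + cp.norm1(D2.T @ f2))
    constraints = [cp.norm(A @ (f1 + f2) - y, 2) <= eps]
    cp.Problem(objective, constraints).solve()
    return f1.value, f2.value
```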

In this paper, we propose an IHT-type algorithm for compressed data separation. We show that under essentially the same conditions as for $\ell_1$-split analysis, the algorithm after a finite number of iterations obeys the error bound $$\|f_1^t - f_1\|_2 + \|f_2^t - f_2\|_2 \le C_0 \frac{\sigma_{s_1}(D_1^* f_1) + \sigma_{s_2}(D_2^* f_2)}{\sqrt{s_1 + s_2}} + C_1 \Delta,$$ where $$\Delta = \begin{cases} 2\sqrt{s_1 + s_2}\, \lambda, & \text{if } \|\Phi^* A^* e\|_\infty \le \lambda, \\ 2\sqrt{1 + \delta_{4s}}\, \epsilon, & \text{if } \|e\|_2 \le \epsilon. \end{cases}$$ As a corollary, our results show that if $e$ is Gaussian noise, $e \sim N(0, \sigma^2 I_m)$, then with high probability $$\|f_1^t - f_1\|_2 + \|f_2^t - f_2\|_2 \le C_0 \frac{\sigma_{s_1}(D_1^* f_1) + \sigma_{s_2}(D_2^* f_2)}{\sqrt{s_1 + s_2}} + C_1 \sigma \sqrt{(s_1 + s_2)\log(d_1 + d_2)}.$$ The derived error bound is adaptive to the unknown sparsity level of the object components, and thus IHT provides a stronger guarantee (when $s_1 + s_2$ is small) than that from [25] for the $\ell_1$-split analysis.

Rewriting the measurement model (1.2) as $$y = A(D_1 x_1 + D_2 x_2) + e,$$ one could naturally use standard compressed sensing techniques to first obtain an estimate $[\hat x_1; \hat x_2]$ of the sparse coefficient vectors via $\ell_1$-minimization or standard IHT, and then reconstruct the signal's components by the synthesis operation $\hat f_j = D_j \hat x_j$. Recovery guarantees for such approaches may be obtained under a condition on the coherence of $[AD_1 | AD_2]$, or under a condition on the coherence of $AD_j$ together with the mutual coherence between $D_1$ and $D_2$. However, as noted in [8], it is very hard for $AD_j$ to satisfy the coherence condition when $D_j$ is highly correlated ($j = 1, 2$). This differs from our setting, where we impose no incoherence property on the dictionaries $D_j$ themselves. We restrict this work to the setting of real-valued signals; as in [20], our results can be extended to complex-valued signals.

For a vector $v \in \mathbb{R}^d$, $\|v\|_0$ is the number of nonzero entries of $v$. For any $q \in (0, \infty)$, denote $\|v\|_q = \big( \sum_{j=1}^d |v_j|^q \big)^{1/q}$ and $\|v\|_\infty = \max_j |v_j|$. For $d \in \mathbb{N}$, we write $[d]$ to mean $\{1, 2, \ldots, d\}$. Given an index set $T \subseteq [d]$ and a matrix $D \in \mathbb{R}^{n \times d}$, $T^c$ is the complement of $T$ in $[d]$, and $D_T$ (or $[D]_T$) is the submatrix of $D$ formed from the columns of $D$ indexed by $T$. We write $D^*$ for the conjugate transpose of a matrix $D$, $D_T^*$ for $(D_T)^*$, and $\|D\|$ for the spectral norm of $D$. For a vector $x \in \mathbb{R}^d$, $x_{[s]}$ denotes the vector consisting of the $s$ largest entries of $x$ in magnitude. $C > 0$ (or $c$, $c_1$) denotes a universal constant that might differ at each occurrence.


Main results

To reconstruct $f_1$ and $f_2$ based on $(y, A)$ with model (1.2), we propose the following IHT algorithm adapted to the dictionaries $D_1$ and $D_2$.

Algorithm 1

Let $\eta > 0$, $s \in \mathbb{N}$, and $f_1^0 = f_2^0 = 0$. For $t = 0, \ldots, T-1$:
(a) $g_1^t = A^* \big( A(f_1^t + f_2^t) - y \big)$;
(b) $\bar f_1^{t+1} = f_1^t - \eta g_1^t$, $\quad \bar f_2^{t+1} = f_2^t - \eta g_1^t$;
(c) $(z_1^{t+1}; z_2^{t+1}) = \arg\min_{\|z_1\|_0 + \|z_2\|_0 \le 2s} \|z_1 - D_1^* \bar f_1^{t+1}\|_2^2 + \|z_2 - D_2^* \bar f_2^{t+1}\|_2^2$;
(d) $f_1^{t+1} = D_1 z_1^{t+1}$, $\quad f_2^{t+1} = D_2 z_2^{t+1}$.
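Since the objective in step (c) is separable, its minimizer is obtained by jointly hard thresholding the concatenated analysis coefficients $[D_1^* \bar f_1^{t+1}; D_2^* \bar f_2^{t+1}]$, keeping the $2s$ largest entries overall. A direct NumPy transcription of Algorithm 1 follows; this is a sketch, not the authors' code, with the step size `eta` and iteration count `T` left as hypothetical user inputs and no stopping rule implemented.

```python
# NumPy sketch of Algorithm 1 (IHT adapted to dictionaries D1, D2).
import numpy as np

def iht_separation(A, D1, D2, y, s, eta=1.0, T=100):
    n = A.shape[1]
    d1 = D1.shape[1]
    f1, f2 = np.zeros(n), np.zeros(n)
    for _ in range(T):
        g = A.T @ (A @ (f1 + f2) - y)                 # (a) shared gradient
        f1_bar, f2_bar = f1 - eta * g, f2 - eta * g   # (b) gradient step
        # (c) joint hard thresholding of [D1* f1_bar ; D2* f2_bar]:
        z = np.concatenate([D1.T @ f1_bar, D2.T @ f2_bar])
        zt = np.zeros_like(z)
        idx = np.argpartition(np.abs(z), -2 * s)[-2 * s:]
        zt[idx] = z[idx]
        f1, f2 = D1 @ zt[:d1], D2 @ zt[d1:]           # (d) synthesis
    return f1, f2
```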

Algorithm 1 can be viewed as a projected gradient descent algorithm for solving the following constrained least-squares problem: $$\arg\min_{\|D_1^* h_1\|_0 + \|D_2^* h_2\|_0 \le 2s} \|A(h_1 + h_2) - y\|_2^2.$$

Proofs

We begin with some notation. Let $d = d_1 + d_2$, $s = s_1 + s_2$, and $$E = [I_n | I_n], \quad \Phi = [D_1 | D_2], \quad \Psi = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}, \quad h = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}.$$ Then (1.2) can be rewritten as $$y = AEh + e.$$ Under the assumptions that $f_1$ and $f_2$ are approximately sparse in terms of $D_1$ and $D_2$, respectively, it is easy to show that $h$ is approximately sparse in terms of $\Psi$. Our aim is to reconstruct $h$. In Algorithm 1, we also
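The block objects above can be assembled explicitly as below (an illustrative sketch, using SciPy only for the block-diagonal construction); one can check numerically that $Eh = f_1 + f_2$, so $y = AEh + e$ agrees with (1.2).

```python
# Explicit assembly of E = [I_n | I_n], Phi = [D1 | D2],
# Psi = blockdiag(D1, D2), and h = [f1; f2].
import numpy as np
from scipy.linalg import block_diag

def proof_blocks(D1, D2, f1, f2):
    n = D1.shape[0]
    E = np.hstack([np.eye(n), np.eye(n)])
    Phi = np.hstack([D1, D2])
    Psi = block_diag(D1, D2)
    h = np.concatenate([f1, f2])
    assert np.allclose(E @ h, f1 + f2)   # E h = f1 + f2
    return E, Phi, Psi, h
```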

Acknowledgments

This work is partially supported by the NSF of China under grant numbers 11901518, 11531013, 11525104, 11971427, the NSAF of China under grant number U1630116, and the Fundamental Research Funds for the Central Universities under grant number 2019QN81010. The authors would like to thank the referees for their valuable comments and Miss Huiping Li for proofreading the manuscript.

References (35)

  • T.T. Cai et al., On recovery of sparse signals via l1 minimization, IEEE Trans. Inform. Theory (2009)
  • T.T. Cai et al., Sparse representation of a polytope and recovery of sparse signals and low-rank matrices, IEEE Trans. Inform. Theory (2013)
  • E.J. Candès et al., Decoding by linear programming, IEEE Trans. Inform. Theory (2005)
  • E.J. Candès et al., Near-optimal signal recovery from random projections: Universal encoding strategies?, IEEE Trans. Inform. Theory (2006)
  • E.J. Candès et al., The Dantzig selector: Statistical estimation when p is much larger than n, Ann. Statist. (2007)
  • S.S. Chen et al., Atomic decomposition by basis pursuit, SIAM Rev. (2001)
  • W. Dai et al., Subspace pursuit for compressive sensing signal reconstruction, IEEE Trans. Inform. Theory (2009)
Communicated by D.-X. Zhou.
