
Journal of Complexity

Volume 59, August 2020, 101469

Iterative hard thresholding for compressed data separation

https://doi.org/10.1016/j.jco.2020.101469

Abstract

We study the problem of reconstructing a signal's distinct subcomponents, which are approximately sparse in morphologically different dictionaries, from a small number of linear measurements. We propose an iterative hard thresholding algorithm adapted to dictionaries. We show that, under the usual assumptions that the measurement matrix satisfies a restricted isometry property adapted to the composed dictionary and that the dictionaries satisfy a mutual coherence condition, the algorithm approximately reconstructs the distinct subcomponents after a fixed number of iterations.

Introduction

The basic insight of compressed sensing is that sparse signals can be reconstructed via efficient algorithms from a small number of linear measurements.

In standard compressed sensing, one observes $(A, y)$ with $$y = Af + e,$$ where $y \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$ with $m \ll n$, $f \in \mathbb{R}^n$ is an (approximately) $s$-sparse signal of interest, and $e \in \mathbb{R}^m$ is a vector of measurement errors. The goal is to reconstruct the sparse signal $f$ from the measurement matrix $A$ and the measurement vector $y$ via an efficient algorithm. Constrained $\ell_1$-minimization has been shown to be very effective for these problems. See, e.g., Candès and Tao [9], [11], Donoho [15], and Donoho, Elad, and Temlyakov [16].
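For concreteness, the measurement model can be instantiated numerically as follows; this is a minimal sketch, where the dimensions $(m, n, s)$ and the noise level are hypothetical values chosen only for illustration.

```python
# Toy instantiation of the standard model y = A f + e (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 64, 256, 5                            # m << n

f = np.zeros(n)                                 # s-sparse signal of interest
support = rng.choice(n, size=s, replace=False)
f[support] = rng.standard_normal(s)

A = rng.standard_normal((m, n)) / np.sqrt(m)    # subgaussian ensemble, variance 1/m
e = 0.01 * rng.standard_normal(m)               # measurement error
y = A @ f + e                                   # observed measurements
```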

Though constrained $\ell_1$-minimization can be solved efficiently using iterative algorithms from convex optimization, it may incur substantial computational expense in large-scale applications [18], [33]. Thus it is desirable to use alternative iterative methods that are not based on optimization, such as orthogonal matching pursuit (OMP) [14], [30], stagewise OMP [19], regularized OMP [29], compressive sampling matching pursuit (CoSaMP) [28], iterative hard thresholding (IHT) [4], subspace pursuit [13], and many other variants [5], [21]. Recovery results based on a restricted isometry property (RIP) [9] condition $\delta_{cs} \le \theta$ have been well developed for these algorithms; see [4], [5], [7], [13], [21], [28], [34], [35]. The RIP condition $\delta_{cs} \le \theta$ is satisfied with high probability by several random matrix ensembles, such as subgaussian ensembles and random partial Fourier transforms [3], [10], [31], provided one chooses $m \gtrsim s \log^\beta(n) / \theta^2$. Consequently, with high probability, the aforementioned algorithms can approximately recover every $s$-sparse vector with small or zero error from $O(s \log^\beta(n))$ random measurements.
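As a point of reference for the dictionary-adapted algorithm proposed later, standard IHT [4] iterates a gradient step followed by hard thresholding. The following is a minimal sketch under the assumption of unit step size (practical variants adapt the step size); `n_iter` is a hypothetical default.

```python
# Standard IHT sketch: x^{t+1} = H_s( x^t + A^T (y - A x^t) ),
# where H_s keeps the s largest-magnitude entries and zeros the rest.
import numpy as np

def hard_threshold(x, s):
    """Return the best s-term approximation x_[s] of x."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -s)[-s:]   # indices of s largest |x_j|
    out[idx] = x[idx]
    return out

def iht(A, y, s, n_iter=100):
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + A.T @ (y - A @ x), s)
    return x
```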

The above techniques apply to signals that are sparse in some orthonormal basis. In practical examples, however, many signals of interest are not sparse in an orthonormal basis. Often, sparsity is expressed not in terms of an orthonormal basis but in terms of an overcomplete (and possibly coherent) dictionary: the signal $f$ is expressed as $f = Dx$, where $D \in \mathbb{R}^{n \times d}$ ($d \ge n$) is a redundant dictionary and $x$ is (approximately) sparse; see, e.g., [8], [12] and references therein.

The $\ell_1$-analysis approach is one of the most effective approaches for solving these problems. Recovery results based on the D-RIP (RIP adapted to the dictionary $D$), analogous to those in the classical setting, have been developed for this approach [1], [8], [23], [27], [32]. Specifically, when $D$ is a tight frame, Candès et al. [8] showed that if the measurement matrix satisfies the D-RIP condition $\delta_{2s} \le 0.08$, the solution $\hat f$ of the analysis Basis Pursuit (ABP) $$\arg\min_{\tilde f \in \mathbb{R}^n} \|D^* \tilde f\|_1 \quad \text{s.t.} \quad \|A \tilde f - y\|_2 \le \epsilon$$ obeys the error bound $$\|\hat f - f\|_2 \le c_0 \frac{\sigma_s(D^* f)}{\sqrt{s}} + c_1 \epsilon,$$ provided that $\|e\|_2 \le \epsilon$. Here, $\sigma_s(x)$ denotes the best $s$-term approximation error in the $\ell_1$-norm: $$\sigma_s(x) = \min_{\|u\|_0 \le s} \|u - x\|_1.$$ Under the assumption that the measurement matrix $A$ satisfies the D-RIP condition $\delta_{3s} \le 1/2$, Lin and Li [23] showed that the solution $\hat f$ of the analysis Dantzig selector (ADS) $$\arg\min_{\tilde f \in \mathbb{R}^n} \|D^* \tilde f\|_1 \quad \text{s.t.} \quad \|D^* A^* (A \tilde f - y)\|_\infty \le \lambda$$ recovers the signal with the error bound $$\|\hat f - f\|_2 \le c_0 \frac{\sigma_s(D^* f)}{\sqrt{s}} + c_1 \sqrt{s}\, \lambda,$$ provided that $\|D^* A^* e\|_\infty \le \lambda$. Recall that the D-RIP [8] is defined as follows.

Definition 1

The measurement matrix $A$ satisfies the restricted isometry property adapted to $D \in \mathbb{R}^{n \times d}$ (abbreviated as D-RIP) of order $s$ with constant $\delta \in (0,1)$ if, for all $s$-sparse vectors $z \in \mathbb{R}^d$, $$(1-\delta)\|Dz\|_2^2 \le \|ADz\|_2^2 \le (1+\delta)\|Dz\|_2^2. \tag{1.1}$$ The restricted isometry constant adapted to $D$ (abbreviated as D-RIC) of order $s$ is the smallest number $\delta$ such that (1.1) holds for all $s$-sparse vectors $z \in \mathbb{R}^d$; it is denoted by $\delta_s$.
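The D-RIC of Definition 1 cannot be computed efficiently in general, since it involves a maximization over all supports of size $s$. A Monte Carlo sketch such as the following can only estimate a lower bound on $\delta_s$ by sampling random $s$-sparse vectors; `drip_lower_bound` is a hypothetical helper for intuition, not a certification procedure.

```python
# Estimate a lower bound on the D-RIC delta_s by random sampling:
# max over trials of | ||A D z||_2^2 / ||D z||_2^2 - 1 |.
import numpy as np

def drip_lower_bound(A, D, s, n_trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    d = D.shape[1]
    worst = 0.0
    for _ in range(n_trials):
        z = np.zeros(d)
        supp = rng.choice(d, size=s, replace=False)   # random s-sparse support
        z[supp] = rng.standard_normal(s)
        Dz = D @ z
        denom = np.dot(Dz, Dz)
        if denom > 0:
            ratio = np.dot(A @ Dz, A @ Dz) / denom
            worst = max(worst, abs(ratio - 1.0))
    return worst
```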

As noted in [8], the D-RIP condition $\delta_{cs} \le \theta$ is satisfied with high probability by $m \times n$ matrices populated by i.i.d. subgaussian entries with variance $m^{-1}$, provided $m \ge c(\delta)\, s \ln(ed/s)$. It is also fulfilled with high probability by random partial Fourier matrices after sign randomization of their columns [22].

In signal processing and statistics, it is common to assume that the noise vector is Gaussian, $e \sim N(0, \sigma^2 I_m)$. Gaussian noise is essentially bounded (e.g., [11], [23]), so all of the results mentioned above for the ABP and the ADS apply directly to Gaussian noise. In this case, the ABP and the ADS provide very similar guarantees, but there are circumstances in which the ADS is preferable, since the ADS yields a bound that is adaptive to the unknown sparsity level of the object signal and thus provides a stronger guarantee when $s$ is small [23] (this was also noted by Candès and Tao [11] for classical compressed sensing).

Most recently, Foucart [20] studied IHT adapted to a dictionary and showed that, under a D-RIP condition, IHT provides the same theoretical guarantees as the ABP for an $\ell_2$-bounded noise. In this paper, as a byproduct of our analysis, we show that IHT also enjoys the same theoretical guarantees as the ADS, yielding error bounds for Gaussian noise that are adaptive to the unknown sparsity level of the object signal and hence preferable to those from [20].

The main goal of this paper is to study the problem of compressed data separation, i.e., the reconstruction of a signal's distinct sparse components from compressed measurements. Here the signal is composed of two different components, $f = f_1 + f_2$. More specifically, we observe the data $y \in \mathbb{R}^m$ following the linear measurement model $$y = A(f_1 + f_2) + e. \tag{1.2}$$ The goal is to reconstruct the unknown constituents $f_1$ and $f_2$ from the measurement vector $y$ and the measurement matrix $A$. Refer to [25] and references therein for further details on compressed data separation. As in [17], [25], we assume that the components $f_1$ and $f_2$ are sparse or approximately sparse in terms of two different tight frames $D_1 \in \mathbb{R}^{n \times d_1}$ and $D_2 \in \mathbb{R}^{n \times d_2}$, respectively, i.e., $f_1 = D_1 x_1$ and $f_2 = D_2 x_2$ for some (approximately) sparse vectors $x_1 \in \mathbb{R}^{d_1}$ and $x_2 \in \mathbb{R}^{d_2}$. The morphological difference between the components $f_1$ and $f_2$ is measured in terms of the mutual coherence, defined as follows.

Definition 2

The mutual coherence between two dictionaries $D_1$ and $D_2$ is defined as $$\mu = \max_{i,j} \left| \langle [D_1]_i, [D_2]_j \rangle \right|,$$ where $[D_1]_i$ and $[D_2]_j$ are the $i$th column of $D_1$ and the $j$th column of $D_2$, respectively.
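Definition 2 is straightforward to evaluate numerically; the sketch below assumes the columns of $D_1$ and $D_2$ are unit-normalized. For instance, with $D_1$ the identity basis and $D_2$ an orthonormal Fourier basis of $\mathbb{R}^n$, every inner product has magnitude $1/\sqrt{n}$, so $\mu = 1/\sqrt{n}$.

```python
# Mutual coherence of Definition 2 (assumes unit-norm columns).
import numpy as np

def mutual_coherence(D1, D2):
    """mu = max_{i,j} |<[D1]_i, [D2]_j>| over all pairs of columns."""
    return np.max(np.abs(D1.T @ D2))
```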

The $\ell_1$-split analysis approach was proposed in [2], [8]; it finds $f_1$ and $f_2$ via the $\ell_1$-minimization $$\arg\min_{\tilde f_1, \tilde f_2} \|D_1^* \tilde f_1\|_1 + \|D_2^* \tilde f_2\|_1 \quad \text{s.t.} \quad \|A(\tilde f_1 + \tilde f_2) - y\|_2 \le \epsilon.$$ The $\ell_1$-split analysis [24], [25] can approximately reconstruct the two components $f_1$ and $f_2$ with the error bound $$\|\hat f_1 - f_1\|_2 + \|\hat f_2 - f_2\|_2 \le C_0 \frac{\sigma_{s_1}(D_1^* f_1) + \sigma_{s_2}(D_2^* f_2)}{\sqrt{s_1 + s_2}} + C_1 \epsilon,$$ provided that the measurement matrix satisfies a $\Phi$-RIP condition $\delta_{cs} \le C$, where $s = s_1 + s_2$ and $\Phi = [D_1 | D_2]$, and the dictionaries $D_1$ and $D_2$ satisfy a mutual coherence condition $\mu s \le c$.
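For comparison with the iterative algorithm proposed below, the $\ell_1$-split analysis program can be prototyped with a generic convex solver. The paper does not prescribe a solver; the third-party package cvxpy is assumed here purely for illustration.

```python
# Prototype of the l1-split analysis program via a generic convex solver.
import cvxpy as cp
import numpy as np

def l1_split_analysis(A, D1, D2, y, eps):
    n = A.shape[1]
    f1, f2 = cp.Variable(n), cp.Variable(n)
    # minimize ||D1* f1||_1 + ||D2* f2||_1  s.t.  ||A(f1+f2) - y||_2 <= eps
    objective = cp.Minimize(cp.norm1(D1.T @ f1) + cp.norm1(D2.T @ f2))
    constraints = [cp.norm(A @ (f1 + f2) - y, 2) <= eps]
    cp.Problem(objective, constraints).solve()
    return f1.value, f2.value
```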

In this paper, we propose an IHT-type algorithm for compressed data separation. We show that under essentially the same conditions as for $\ell_1$-split analysis, the algorithm after a finite number of iterations obeys the error bound $$\|f_1^t - f_1\|_2 + \|f_2^t - f_2\|_2 \le C_0 \frac{\sigma_{s_1}(D_1^* f_1) + \sigma_{s_2}(D_2^* f_2)}{\sqrt{s_1 + s_2}} + C_1 \Delta,$$ where $$\Delta = \begin{cases} 2\sqrt{s_1 + s_2}\, \lambda, & \text{if } \|\Phi^* A^* e\|_\infty \le \lambda, \\ 2\sqrt{1 + \delta_{4s}}\, \epsilon, & \text{if } \|e\|_2 \le \epsilon. \end{cases}$$ As a corollary, our results show that if $e$ is Gaussian noise, $e \sim N(0, \sigma^2 I_m)$, then with high probability $$\|f_1^t - f_1\|_2 + \|f_2^t - f_2\|_2 \le C_0 \frac{\sigma_{s_1}(D_1^* f_1) + \sigma_{s_2}(D_2^* f_2)}{\sqrt{s_1 + s_2}} + C_1 \sigma \sqrt{(s_1 + s_2)\log(d_1 + d_2)}.$$ The derived error bound is adaptive to the unknown sparsity level of the object components, and thus IHT provides a stronger guarantee (when $s_1 + s_2$ is small) than that from [25] for the $\ell_1$-split analysis.

Rewriting the measurement model (1.2) as $$y = A(D_1 x_1 + D_2 x_2) + e,$$ one could naturally use standard compressed sensing techniques to first obtain an estimate $[\hat x_1; \hat x_2]$ of the sparse coefficient vectors via $\ell_1$-minimization or standard IHT, and then reconstruct the signal's components by the synthesis operation $\hat f_j = D_j \hat x_j$. Recovery guarantees for such approaches may be obtained under a condition on the coherence of $[AD_1 | AD_2]$, or under a condition on the coherence of $AD_j$ together with the mutual coherence between $D_1$ and $D_2$. However, as noted in [8], it is very hard for $AD_j$ to satisfy the coherence condition when $D_j$ is highly correlated ($j = 1, 2$). This differs from our setting, where we impose no incoherence property on the dictionaries $D_j$ themselves. We restrict this work to the setting of real-valued signals; as in [20], our results can be extended to complex-valued signals.

For a vector $v \in \mathbb{R}^d$, $\|v\|_0$ is the number of nonzero entries of $v$. For any $q \in (0, \infty)$, denote $\|v\|_q = \big( \sum_{j=1}^d |v_j|^q \big)^{1/q}$ and $\|v\|_\infty = \max_j |v_j|$. For $d \in \mathbb{N}$, we write $[d]$ to mean $\{1, 2, \ldots, d\}$. Given an index set $T \subseteq [d]$ and a matrix $D \in \mathbb{R}^{n \times d}$, $T^c$ is the complement of $T$ in $[d]$, and $D_T$ (or $[D]_T$) is the submatrix of $D$ formed from the columns of $D$ indexed by $T$. We write $D^*$ for the conjugate transpose of a matrix $D$, $D_T^*$ for $(D_T)^*$, and $\|D\|$ for the spectral norm of $D$. For a vector $x \in \mathbb{R}^d$, $x_{[s]}$ denotes the vector consisting of the $s$ largest entries of $x$ in magnitude. $C > 0$ (or $c$, $c_1$) denotes a universal constant that might differ at each occurrence.


Main results

To reconstruct $f_1$ and $f_2$ based on $(y, A)$ with model (1.2), we propose the following IHT algorithm adapted to the dictionaries $D_1$ and $D_2$.

Algorithm 1

Let $\eta > 0$, $s \in \mathbb{N}$, and $f_1^0 = f_2^0 = 0$. For $t = 0, \ldots, T-1$:
(a) $g_1^t = A^* \big( A(f_1^t + f_2^t) - y \big)$;
(b) $\bar f_1^{t+1} = f_1^t - \eta g_1^t$, $\quad \bar f_2^{t+1} = f_2^t - \eta g_1^t$;
(c) $(z_1^{t+1}; z_2^{t+1}) = \arg\min_{\|z_1\|_0 + \|z_2\|_0 \le 2s} \|z_1 - D_1^* \bar f_1^{t+1}\|_2^2 + \|z_2 - D_2^* \bar f_2^{t+1}\|_2^2$;
(d) $f_1^{t+1} = D_1 z_1^{t+1}$, $\quad f_2^{t+1} = D_2 z_2^{t+1}$.
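Since the objective in step (c) is separable, its minimizer is obtained by jointly hard thresholding the concatenated analysis coefficients $[D_1^* \bar f_1^{t+1}; D_2^* \bar f_2^{t+1}]$, keeping the $2s$ largest entries overall. A direct NumPy transcription of Algorithm 1 follows; this is a sketch, not the authors' code, with the step size `eta` and iteration count `T` left as hypothetical user inputs and no stopping rule implemented.

```python
# NumPy sketch of Algorithm 1 (IHT adapted to dictionaries D1, D2).
import numpy as np

def iht_separation(A, D1, D2, y, s, eta=1.0, T=100):
    n = A.shape[1]
    d1 = D1.shape[1]
    f1, f2 = np.zeros(n), np.zeros(n)
    for _ in range(T):
        g = A.T @ (A @ (f1 + f2) - y)                 # (a) shared gradient
        f1_bar, f2_bar = f1 - eta * g, f2 - eta * g   # (b) gradient step
        # (c) joint hard thresholding of [D1* f1_bar ; D2* f2_bar]:
        z = np.concatenate([D1.T @ f1_bar, D2.T @ f2_bar])
        zt = np.zeros_like(z)
        idx = np.argpartition(np.abs(z), -2 * s)[-2 * s:]
        zt[idx] = z[idx]
        f1, f2 = D1 @ zt[:d1], D2 @ zt[d1:]           # (d) synthesis
    return f1, f2
```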

Algorithm 1 can be viewed as a projected gradient descent algorithm for solving the following constrained least-squares problem: $$\arg\min_{\|D_1^* h_1\|_0 + \|D_2^* h_2\|_0 \le 2s} \|A(h_1 + h_2) - y\|_2^2.$$

Proofs

We begin with some notation. Let $d = d_1 + d_2$, $s = s_1 + s_2$, and $$E = [I_n | I_n], \quad \Phi = [D_1 | D_2], \quad \Psi = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}, \quad h = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}.$$ Then (1.2) can be rewritten as $$y = AEh + e.$$ Under the assumptions that $f_1$ and $f_2$ are approximately sparse in terms of $D_1$ and $D_2$, respectively, it is easy to show that $h$ is approximately sparse in terms of $\Psi$. Our aim is to reconstruct $h$. In Algorithm 1, we also
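The block objects above can be assembled explicitly as below (an illustrative sketch, using SciPy only for the block-diagonal construction); one can check numerically that $Eh = f_1 + f_2$, so $y = AEh + e$ agrees with (1.2).

```python
# Explicit assembly of E = [I_n | I_n], Phi = [D1 | D2],
# Psi = blockdiag(D1, D2), and h = [f1; f2].
import numpy as np
from scipy.linalg import block_diag

def proof_blocks(D1, D2, f1, f2):
    n = D1.shape[0]
    E = np.hstack([np.eye(n), np.eye(n)])
    Phi = np.hstack([D1, D2])
    Psi = block_diag(D1, D2)
    h = np.concatenate([f1, f2])
    assert np.allclose(E @ h, f1 + f2)   # E h = f1 + f2
    return E, Phi, Psi, h
```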

Acknowledgments

This work is partially supported by the NSF of China under grant numbers 11901518, 11531013, 11525104, 11971427, the NSAF of China under grant number U1630116, and the Fundamental Research Funds for the Central Universities under grant number 2019QN81010. The authors would like to thank the referees for their valuable comments and Miss Huiping Li for proofreading the manuscript.

References (35)

  • T.T. Cai et al., On recovery of sparse signals via l1 minimization, IEEE Trans. Inform. Theory (2009)
  • T.T. Cai et al., Sparse representation of a polytope and recovery of sparse signals and low-rank matrices, IEEE Trans. Inform. Theory (2013)
  • E.J. Candès et al., Decoding by linear programming, IEEE Trans. Inform. Theory (2005)
  • E.J. Candès et al., Near-optimal signal recovery from random projections: Universal encoding strategies?, IEEE Trans. Inform. Theory (2006)
  • E.J. Candès et al., The Dantzig selector: Statistical estimation when p is much larger than n, Ann. Statist. (2007)
  • S.S. Chen et al., Atomic decomposition by basis pursuit, SIAM Rev. (2001)
  • W. Dai et al., Subspace pursuit for compressive sensing signal reconstruction, IEEE Trans. Inform. Theory (2009)
Communicated by D.-X. Zhou.
