1 Introduction

Phase retrieval refers to the problem of reconstructing an unknown vector \(x_0 \in {\mathbb {C}}^n\) from m measurements of the form

$$\begin{aligned} y_i = \vert \langle \xi ^{\left( i\right) }, x_0 \rangle \vert ^2 + w_i \quad \quad \left( \text {where } i \in \left[ m\right] \right) , \end{aligned}$$
(1.1)

where the \(\xi ^{\left( i\right) } \in {\mathbb {C}}^n\) are known measurement vectors and \(w_i \in {\mathbb {R}}\) represents additive noise. Such problems are ubiquitous in many areas of science and engineering, such as X-ray crystallography [23, 32], astronomical imaging [18], ptychography [35], and quantum tomography [28].

The foundational papers [4, 7, 13] proposed to reconstruct \(x_0\) via the PhaseLift method, a convex relaxation of the original problem. These papers have triggered many follow-up works since they were the first to establish rigorous recovery guarantees under the assumption that the measurement vectors \(\xi ^{\left( i\right) }\) are sampled uniformly at random from the sphere. Since then several papers have analyzed scenarios where the measurement vectors possess a significantly reduced amount of randomness, in particular spherical designs [21] and coded diffraction patterns [5, 22]. However, the theoretical results for coded diffraction patterns rely on the assumption that the modulus of the illumination patterns is varying. Indeed, it was shown in [17] that for certain illumination patterns with constant modulus ambiguities can arise, i.e., it is not possible to determine \(x_0\) uniquely from the measurements \(y_i\). In fact, such ambiguities can already arise in much simpler settings, where the measurement vectors \( \left( \xi ^{\left( i\right) } \right) \) are i.i.d. subgaussian. For example, consider the case that \(\xi ^{\left( i\right) } = \left( \varepsilon ^{\left( i\right) }_{1} , \ldots , \varepsilon ^{\left( i\right) }_{n} \right) \), where the \( \varepsilon ^{\left( i\right) }_{j} \) are i.i.d. Rademacher random variables. That is, they only take the values \(+1\) and \(-1\) each with probability \(\frac{1}{2}\). In this case the vector \(x_0:=e_1=\left( 1, 0, \ldots , 0\right) \) can never be distinguished from the vector \( {\tilde{x}}_0:=e_2=\left( 0, 1, \ldots , 0\right) \). Note that in this scenario it holds that \({\mathbb {E}} \left[ \big \vert \xi ^{\left( i\right) }_j \big \vert ^4 \right] = {\mathbb {E}} \left[ \big \vert \xi ^{\left( i\right) }_j \big \vert ^2 \right] \) and, hence, the vector \( \xi ^{\left( i\right) } \) does not fulfill a small-ball probability assumption, which means that there is no constant \(c>0\) such that for all \(\varepsilon >0\) and for all vectors x it holds that

$$\begin{aligned} {\mathbb {P}} \left( \vert \langle x, \xi ^{\left( i\right) } \rangle \vert \le \varepsilon \Vert x \Vert \right) \le c\varepsilon . \end{aligned}$$
(1.2)
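The Rademacher ambiguity described above is easy to verify numerically. The following minimal numpy sketch (illustrative only, not part of the original argument) checks that \(e_1\) and \(e_2\) produce identical noiseless measurements.

```python
import numpy as np

# Minimal numerical check (illustrative sketch): with Rademacher measurement
# vectors, the signals e_1 and e_2 cannot be told apart from the data (1.1),
# since |<xi, e_1>|^2 = |xi_1|^2 = 1 = |xi_2|^2 = |<xi, e_2>|^2 for every draw.
rng = np.random.default_rng(0)
n, m = 5, 1000
Xi = rng.choice([-1.0, 1.0], size=(m, n))   # rows are the measurement vectors xi^(i)
e1, e2 = np.eye(n)[0], np.eye(n)[1]
y1 = np.abs(Xi @ e1) ** 2                   # noiseless measurements of e_1
y2 = np.abs(Xi @ e2) ** 2                   # noiseless measurements of e_2
print(np.array_equal(y1, y2))               # True: the two signals are indistinguishable
```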

When the signals are complex, additional classes of ambiguities can arise. For example, when the measurement vectors \( \xi ^{\left( i\right) } \) are real, any signal x and its complex-conjugate signal \( {\overline{x}} \) will result in identical observations.

For these reasons, previous works on phase retrieval from subgaussian measurements (see, e.g., [11]) work with real signals and require that all entries of the vector \( \xi ^{\left( i\right) } \) fulfill

$$\begin{aligned} {\mathbb {E}} \left[ \big \vert \xi ^{\left( i\right) }_j \big \vert ^4 \right] > {\mathbb {E}} \left[ \big \vert \xi ^{\left( i\right) }_j \big \vert ^2 \right] \end{aligned}$$
(1.3)

for all \( j \in \left[ n\right] \) or make even stronger assumptions.

The only exception is [26], which shows that in the real-valued case (\(x_0 \in {\mathbb {R}}^n\) and \(\xi ^{\left( i\right) } \in {\mathbb {R}}^n \)) PhaseLift recovers a large class of signals from subgaussian measurements even if estimates of the type (1.3) are not satisfied. More precisely, one obtains that all signals \(x_0\) whose peak-to-average power ratio satisfies a mild bound of the form

$$\begin{aligned} \frac{\Vert x_0 \Vert _{\infty }}{\Vert x_0 \Vert } \le \mu < 1 \end{aligned}$$
(1.4)

for some absolute constant \(\mu >0\) can be recovered with high probability as long as \( m \gtrsim n \). However, as the approach in [26] is intrinsically based on arguments in [16], it cannot be generalized to the complex case in a straightforward manner. This paper provides an analysis for both real-valued and complex-valued signals. We believe that this understanding will be of importance for the subsequent study of structured scenarios such as coded diffraction patterns, which are also intrinsically complex in nature.

While the proofs in previous papers [5, 21, 22, 26] relied on the construction of a so-called dual certificate, our paper will employ a more geometric approach based on Mendelson’s small ball method [25, 31]. This is motivated by recent work [24, 27, 28], which showed that a geometric analysis based on the descent cone of the trace norm can often yield additional insights compared to an approach based on dual certificates.

For the problem studied in this paper, however, the small-ball method cannot be applied directly to the entire descent cone or the entire cone of directions in which positive semidefiniteness is preserved. Rather, we divide the latter cone into two parts: one that contains all the problematic cases but is small, and one that is larger but easier to analyze. We then control the smaller part via the small-ball method and the larger part using a restricted isometry property.

We think that this novel viewpoint and also some of the techniques developed in this paper will be useful for the analysis of other interesting measurement scenarios, such as the case of heavy-tailed measurement vectors \(\xi ^{\left( i\right) } \) or the case that \(\xi ^{\left( i\right) }\) has only entries 0 and 1.

2 Background and Main Results

2.1 Notation

\({\mathcal {S}}^n\) denotes the vector space of all Hermitian matrices in \({\mathbb {C}}^{n \times n} \). By \({\mathcal {S}}^n_{+} \subset {\mathcal {S}}^n \) we will denote the set of all positive semidefinite Hermitian matrices. For \(A,B \in {\mathcal {S}}^n\) the Hilbert-Schmidt inner product is defined by \( \left\langle A,B \right\rangle _{HS} := \text {Tr}\,\left( A^* B\right) \). The corresponding norm will be denoted by \( \Vert \cdot \Vert _{HS}\). For a matrix \(Z \in {\mathcal {S}}^n \) we will denote its eigenvalues by \(\lambda _1 \left( Z\right) , \lambda _2 \left( Z\right) , \ldots , \lambda _n \left( Z\right) \), which are assumed to be arranged in decreasing order, i.e., \( \lambda _1 \left( Z\right) \ge \lambda _2 \left( Z\right) \ge \cdots \ge \lambda _n \left( Z\right) \). If no confusion can arise, we will suppress the dependence on Z and write \(\lambda _i \) instead of \(\lambda _i \left( Z\right) \). By \(\Vert Z \Vert _1\) we will denote the Schatten-1 norm of Z, i.e., \( \Vert Z \Vert _1 := \sum _{i=1}^{n} \vert \lambda _i \left( Z\right) \vert \). By \(\text {diag}\left( Z\right) \in {\mathcal {S}}^n\) we denote the matrix obtained by setting all off-diagonal entries of Z equal to zero. We will write \(a \lesssim b\) or \(b \gtrsim a \) if there is a universal constant \(C>0\) such that \( a \le Cb \).

2.2 PhaseLift

The PhaseLift method was first introduced in [7]. In this paper we focus on a variant [4, 13] based on the observation that the measurements \(y_i\) can be rewritten in the form

$$\begin{aligned} y_i = \text {Tr}\,\left( \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) } \right) ^* X_0 \right) +w_i, \end{aligned}$$
(2.1)

where \(X_0 = x_0 x^*_0 \) is a rank-1 matrix encoding the signal to be recovered up to the inherent global phase ambiguity. Based on this observation, PhaseLift relaxes the constraint that \(X_0\) is of rank 1 to obtain the optimization problem

$$\begin{aligned} \begin{aligned} \text {minimize } \quad&\sum _{i=1}^{m} \big \vert \text {Tr}\,\left( \xi ^{\left( i\right) } (\xi ^{\left( i\right) })^* X \right) -y_i \big \vert \\ \text { such that} \quad&X \in {\mathcal {S}}^n_{+}. \end{aligned} \end{aligned}$$
(2.2)

In order to simplify notation we introduce the linear operator \({\mathcal {A}}: {\mathcal {S}}^n \rightarrow {\mathbb {R}}^m \) as

$$\begin{aligned} {\mathcal {A}} \left( Z\right) \left( i\right) := \left\langle \xi ^{\left( i\right) } (\xi ^{\left( i\right) })^*, Z \right\rangle _{HS} . \end{aligned}$$
(2.3)

Hence, setting \(y:= \left( y_1, \ldots , y_m \right) \in {\mathbb {R}}^m \), (2.2) can be rewritten as

$$\begin{aligned} \begin{aligned} \text {minimize } \quad&\Vert {\mathcal {A}} \left( X\right) -y \Vert _{\ell _1}\\ \text {such that} \quad&X \in {\mathcal {S}}^n_{+}. \end{aligned} \end{aligned}$$
(2.4)

We note that while the relaxation (2.4) is an important benchmark approach and can be solved in polynomial time, it is typically not practical for applications, as lifting increases the number of optimization variables. For this reason, a very active line of research studies recovery guarantees for algorithms that operate in the natural parameter domain, such as alternating minimization (see, e.g., [34, 43]), gradient-descent based formulations (see, e.g., [6, 9, 10, 38, 39]), and anchored regression [1,2,3, 20]. However, most of these guarantees have been shown under the assumption that the measurement vectors \(\left\{ \xi ^{\left( i\right) } \right\} _{i=1}^m \) are sampled i.i.d. from the unit sphere, so it will be a natural follow-up of this work to study to what extent our results generalize to the more practical nonconvex algorithms. In particular, most reconstruction guarantees for these nonconvex approaches require an appropriate initialization. For this reason, one needs to study which initialization schemes work for the measurements considered in this paper. A natural approach will be to try spectral initializations and recent generalizations that have been shown to be feasible for an essentially minimal number of measurements [15, 29, 30, 33]. We expect that the analysis provided in this paper will prove useful for this endeavour, as the spectral initialization is somewhat connected to trace-norm minimization.
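To make the lifted formulation concrete, the following small sketch sets up (2.4) on a synthetic instance using cvxpy; the instance sizes and tooling are our own illustrative assumptions and not part of the paper. For simplicity it uses real Rademacher measurement vectors together with a symmetric matrix variable, which corresponds to the real-restricted variant introduced later in Sect. 3.2; using a Hermitian variable (and taking real parts of the measurements) gives (2.4) verbatim.

```python
import numpy as np
import cvxpy as cp

# Illustrative sketch of the PhaseLift program (2.4): Rademacher measurement
# vectors, a "flat" real signal, and no noise (hypothetical small instance).
rng = np.random.default_rng(0)
n, m = 6, 80
x0 = rng.standard_normal(n)
x0 /= np.linalg.norm(x0)                         # ||x0|| = 1
Xi = rng.choice([-1.0, 1.0], size=(m, n))        # rows are the measurement vectors
y = (Xi @ x0) ** 2                               # y_i = |<xi^(i), x0>|^2

X = cp.Variable((n, n), symmetric=True)          # lifted variable X ~ x0 x0^*
meas = cp.sum(cp.multiply(Xi @ X, Xi), axis=1)   # A(X)(i) = xi^(i)^T X xi^(i)
prob = cp.Problem(cp.Minimize(cp.norm1(meas - y)), [X >> 0])
prob.solve()

# Best rank-1 factor of the solution, compared with x0 up to global sign.
w, V = np.linalg.eigh(X.value)
x_hat = np.sqrt(max(w[-1], 0.0)) * V[:, -1]
print(min(np.linalg.norm(x_hat - x0), np.linalg.norm(x_hat + x0)))
```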

2.3 Subgaussian Measurements

We consider random measurement vectors \( \left\{ \xi ^{\left( i\right) } \right\} ^m_{i=1} \) given as independent copies of a random vector \(\xi \), whose entries \(\xi _j\) are assumed to be i.i.d. subgaussian random variables with parameter K, expectation \({\mathbb {E}}\left[ \xi _j\right] =0 \), and variance \( {\mathbb {E}}\left[ \vert \xi _j \vert ^2 \right] =1\). Recall that a random variable X is subgaussian with parameter K, if and only if

$$\begin{aligned} \inf \left\{ t>0: \ {\mathbb {E}} \left[ \exp \left( X^2 / t^2 \right) \right] \le 2 \right\} \le K < + \infty . \end{aligned}$$
(2.5)

It is well known (see, e.g., [42]) that for a subgaussian random variable X with parameter K this definition implies that for all \(p \ge 1\)

$$\begin{aligned} \Vert X \Vert _{L_p} \lesssim \sqrt{p} K. \end{aligned}$$
(2.6)

Since \( \Vert \xi _1 \Vert _{L_2}^2 = {\mathbb {E}} \left[ \vert \xi _1 \vert ^2 \right] =1 \) inequality (2.6) immediately implies that

$$\begin{aligned} K \gtrsim 1. \end{aligned}$$
(2.7)

Moreover, it is well known (see, e.g., [42]) that for all \( x\in {\mathbb {C}}^n\) the random variable \( \langle x,\xi \rangle \) is subgaussian with parameter \(K \Vert x \Vert \).
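As a concrete illustration of definition (2.5), the following sketch (our own numerical example, not part of the paper) locates the smallest admissible t for a Rademacher entry; in this case the infimum can also be computed in closed form as \(1/\sqrt{\log 2}\).

```python
import numpy as np

# Numerically locate the subgaussian parameter (2.5) for a Rademacher entry
# (illustrative sketch). For xi = +/-1 one has E[exp(xi^2/t^2)] = exp(1/t^2),
# so the infimum is 1/sqrt(log 2); the bisection below recovers this value.
samples = np.random.default_rng(0).choice([-1.0, 1.0], size=10**5)

def mgf_condition(t):
    return np.mean(np.exp(samples ** 2 / t ** 2)) <= 2.0

lo, hi = 0.5, 5.0                          # condition fails at lo, holds at hi
for _ in range(60):                        # bisection for the smallest admissible t
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if mgf_condition(mid) else (mid, hi)
print(hi, 1 / np.sqrt(np.log(2)))          # both are approximately 1.2011
```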

2.4 Previous Work

A number of previous works have studied phase retrieval with subgaussian measurements in the real-valued setting, i.e., \(x_0 \in {\mathbb {R}}^n\) and \(\xi \in {\mathbb {R}}^n\). For measurements fulfilling \( {\mathbb {E}} \left[ \big \vert \xi _j \big \vert ^4 \right] > {\mathbb {E}} \left[ \big \vert \xi _j \big \vert ^2 \right] \), [11] showed that PhaseLift admits order-optimal uniform recovery guarantees. Without the assumption \( {\mathbb {E}} \left[ \big \vert \xi _j \big \vert ^4 \right] > {\mathbb {E}} \left[ \big \vert \xi _j \big \vert ^2 \right] \), the following result was proven in [26], again for the real-valued case.

Theorem 1

[26, Theorem V.1] Let \(\xi = \left( \xi _1, \ldots , \xi _n \right) \in {\mathbb {R}}^n \) be a random vector with i.i.d. subgaussian entries. Then there exist constants \(C_1\), \(C_2\), \(C_3\), and \(0< \mu < 1 \), which depend only on the distribution of \(\xi _1\), such that whenever

$$\begin{aligned} m \ge C_1 n, \end{aligned}$$
(2.8)

the following statement holds with probability at least \(1-\exp \left( - C_2 n \right) \): For all signals \(x_0 \in {\mathbb {R}}^n\) with \(\Vert x_0 \Vert _{\infty } \le \mu \Vert x_0 \Vert \) and all noise vectors \( w \in {\mathbb {R}}^m \) any minimizer of (2.4) fulfills

$$\begin{aligned} \Vert {\hat{X}} - x_0 x^*_0 \Vert _{HS} \le C_3 \frac{\Vert w \Vert _{\ell _1}}{m}. \end{aligned}$$
(2.9)

3 Main Results

3.1 Complex Signals and Complex Measurement Vectors

In Theorem 1 both the signal \(x_0\) and the measurement vectors \(\xi ^{\left( i\right) }\) are assumed to be real. While for the measurement vectors this is often too restrictive, the signal \(x_0\) is indeed typically real-valued in applications. This important special case will be discussed in Sect. 3.2 below. Nevertheless, it is still interesting from a mathematical point of view to understand under which assumptions recovery is possible for complex-valued signals. Our first result deals with this case.

As we have explained in Sect. 1, there are subgaussian distributions for which we cannot achieve uniform recovery of all signals \( x_0 \in {\mathbb {C}}^n\). For this reason, we define for all \(0 < \mu \le 1 \) the set of all signals of mildly bounded peak-to-average power ratio

$$\begin{aligned} {\mathcal {X}}_{\mu }:= \left\{ x_0 \in {\mathbb {C}}^n \setminus \left\{ 0\right\} : \Vert x_0 \Vert _{\infty } \le \mu \Vert x_0 \Vert \right\} . \end{aligned}$$
(3.1)

Indeed, this restriction is very mild as \(\mu \) will not depend on the dimension, whereas for a Gaussian random signal the ratio \(\frac{\Vert x \Vert _{\infty }}{\Vert x \Vert } \) would scale like \( \sqrt{\frac{\log n}{n}} \), since the largest entry grows like \(\sqrt{\log n}\) while the norm grows like \(\sqrt{n}\). Now we are prepared to state the following theorem, which is our first main result.

Theorem 2

Let the observation vector y be given as in (1.1), where the random measurement vectors \( \left\{ \xi ^{\left( i\right) } \right\} _{i=1}^m \) are defined as in Sect. 2.3. Assume that \( \vert {\mathbb {E}} \left[ \xi _1^2 \right] \vert ^2 \le 1-\beta \) for some \(\beta \in \left( 0,1\right) \) and that

$$\begin{aligned} m \ge C_1 \frac{K^{20}}{\beta ^{5/2}} n. \end{aligned}$$
(3.2)

Then for some probability parameter \(p_{\beta }=1- {\mathcal {O}} \left( \exp \left( \frac{-m \beta ^4}{C_2 K^{16}} \right) \right) \) the following two statements hold.

  1. 1.

    With probability at least \(p_{\beta }\) one has that for all vectors \(x_0 \in {\mathcal {X}}_{1/81} \) and any noise vector \(w\in {\mathbb {R}}^m\) any solution \({\hat{X}}\) of (2.4) satisfies

    $$\begin{aligned} \Vert {\hat{X}} - x_0 x^*_0 \Vert _{1} \le C_3 \frac{K^8}{m \beta ^{5/2}} \Vert w \Vert _{\ell _1}. \end{aligned}$$
    (3.3)
  2. 2.

    If, in addition, \({\mathbb {E}} \left[ \vert \xi _1 \vert ^4 \right] \ge 1+\beta \), then with probability at least \(p_{\beta }\) inequality (3.3) holds for all \(x_0 \in {\mathbb {C}}^n \setminus \left\{ 0 \right\} \).

Here \(C_1\), \(C_2\), and \(C_3\) are universal constants.

The first case of Theorem 2, which makes no assumption on the fourth moment of \(\xi _1 \), also applies to certain scenarios in which uniform recovery of all signals is impossible, so that a restriction of the form (3.1) cannot be avoided. One important example is that the entries of \(\xi ^{\left( i\right) }\) are drawn uniformly at random from \( \left\{ z\in {\mathbb {C}}: \ \vert z \vert =1 \right\} \). Note that these measurements will always yield the same observations y for the two signals

$$\begin{aligned} x_1&=\left( 1, 0, \ldots , 0\right) ,\end{aligned}$$
(3.4)
$$\begin{aligned} {\tilde{x}}_1&=\left( 0, 1, \ldots , 0\right) . \end{aligned}$$
(3.5)

Such very sparse signals are exactly excluded by the first statement of the theorem, so there is no contradiction to its conclusion that unique recovery can be achieved via (2.4) for all signals \(x_0\) such that \(\Vert x_0 \Vert _{\infty } \le \frac{1}{81} \Vert x_0 \Vert \).

Note that in the second scenario, where assumptions on the fourth moment of \(\xi _1\) are available, we obtain a uniform recovery result over all \(x_0 \in {\mathbb {C}}^n \). In the real-valued case a similar result has been shown in [11].
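These moment conditions are easy to check for concrete entry distributions. The following Monte Carlo sketch (illustrative only, not taken from the paper) estimates \(\vert {\mathbb {E}} \left[ \xi _1^2 \right] \vert ^2\) and \({\mathbb {E}} \left[ \vert \xi _1 \vert ^4 \right] \) for the Rademacher entries of Sect. 1, the unit-modulus entries discussed above, and, for contrast, a normalized complex Gaussian entry.

```python
import numpy as np

# Monte Carlo estimates (illustrative) of the two moment quantities appearing
# in Theorems 2 and 3: |E[xi^2]|^2 (assumed <= 1-beta) and E[|xi|^4]
# (assumed >= 1+beta in the second statements).
rng = np.random.default_rng(1)
N = 10**6

entries = {
    "Rademacher (+/-1)":      rng.choice([-1.0, 1.0], size=N),
    "uniform on unit circle": np.exp(2j * np.pi * rng.random(N)),
    "complex Gaussian":       (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2),
}

for name, xi in entries.items():
    second = np.mean(xi ** 2)              # E[xi^2]
    fourth = np.mean(np.abs(xi) ** 4)      # E[|xi|^4]
    print(f"{name:>24}: |E[xi^2]|^2 = {abs(second) ** 2:.3f},  E[|xi|^4] = {fourth:.3f}")
```

The first two rows illustrate the discussion above: Rademacher entries violate \(\vert {\mathbb {E}} \left[ \xi _1^2 \right] \vert ^2 \le 1-\beta \) for every \(\beta >0\), while unit-modulus entries satisfy it with \(\beta =1\) but offer no fourth-moment margin, so only the first statement of Theorem 2 applies to them.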

Remark 1

An assumption of the form \( \vert {\mathbb {E}} \left[ \xi _1^2 \right] \vert ^2 \le 1-\beta \) cannot be avoided, as the following argument shows. Indeed, if \( \vert {\mathbb {E}} \left[ \xi _1^2 \right] \vert ^2 =1\), the assumption \( {\mathbb {E}} \left[ \vert \xi _1 \vert ^2 \right] =1 \) implies that \( \xi = \lambda {\tilde{\xi }} \) almost surely, where \( \lambda \in \left\{ z\in {\mathbb {C}}: \vert z \vert =1 \right\} \) is fixed and \({\tilde{\xi }} \) is a real-valued random vector. We observe that

$$\begin{aligned} \big \vert \langle \xi , x_0 \rangle \big \vert ^2&= \big \vert \langle \lambda {\tilde{\xi }} , x_0 \rangle \big \vert ^2 \end{aligned}$$
(3.6)
$$\begin{aligned}&= \big \vert \langle {\tilde{\xi }} , x_0 \rangle \big \vert ^2\end{aligned}$$
(3.7)
$$\begin{aligned}&= \big \vert \langle {\tilde{\xi }} , \overline{x_0} \rangle \big \vert ^2\end{aligned}$$
(3.8)
$$\begin{aligned}&=\big \vert \langle \xi , \overline{x_0} \rangle \big \vert ^2. \end{aligned}$$
(3.9)

Consequently, \(x_0\) and its complex-conjugate \(\overline{x_0}\) will always lead to the same measurements.

3.2 Real Signals and Complex Measurement Vectors

We have seen in Remark 1 that the assumption \( \vert {\mathbb {E}} \left[ \xi _1^2 \right] \vert ^2 \le 1-\beta \) is necessary to distinguish between a signal \(x_0\) and \(\overline{x_0}\). However, if, as in many practical applications, it is known a priori that the signal \(x_0\) is real-valued, then this ambiguity cannot arise and we can recover uniquely without additional assumptions via the following natural variant of the PhaseLift method, in which we restrict the search space to real-valued matrices:

$$\begin{aligned} \begin{aligned} \text {minimize } \quad&\Vert {\mathcal {A}} \left( X\right) -y \Vert _{\ell _1}\\ \text {such that} \quad&X \in {\mathcal {S}}^n_{+} \cap {\mathbb {R}}^{n \times n}. \end{aligned} \end{aligned}$$
(3.10)

The following theorem shows that in this scenario the assumption \(\vert {\mathbb {E}} \left[ \xi _1^2 \right] \vert ^2 \le 1-\beta \) is indeed not necessary.

Theorem 3

Let the observation vector y be given as in (1.1), where the random measurement vectors \( \left\{ \xi ^{\left( i\right) } \right\} _{i=1}^m \) are as defined in Sect. 2.3. Then the following two statements hold.

  1. 1.

    Assume that

    $$\begin{aligned} m \ge C_1 {K^{20}} n. \end{aligned}$$
    (3.11)

    Then, with probability at least \(1- {\mathcal {O}} \left( \exp \left( \frac{-m }{C_2 K^{16}} \right) \right) \) one has that for all vectors \(x_0 \in {\mathcal {X}}_{1/81}\cap {\mathbb {R}}^n\) and any noise vector \(w\in {\mathbb {R}}^m\) any solution \({\hat{X}}\) of (3.10) satisfies

    $$\begin{aligned} \Vert {\hat{X}} - x_0 x^*_0 \Vert _{1} \le C_3 \frac{K^8}{m} \Vert w \Vert _{\ell _1}. \end{aligned}$$
    (3.12)
  2. 2.

    If, in addition, it holds that \({\mathbb {E}} \left[ \vert \xi _1 \vert ^4 \right] \ge 1+\beta \) for some \(\beta \in (0,1] \), then, under the refined assumption

    $$\begin{aligned} m \ge C_1 \frac{K^{20}}{\beta ^{5/2}} n, \end{aligned}$$
    (3.13)

    one has a more general bound. Namely it holds that with probability at least \(1- {\mathcal {O}} \left( \exp \left( \frac{-m \beta ^4}{C_2 K^{16}} \right) \right) \) and for all vectors \(x_0 \in {\mathbb {R}}^n \backslash \left\{ 0\right\} \), again for arbitrary noise vectors \(w\in {\mathbb {R}}^m\), any solution \({\hat{X}}\) of (3.10) satisfies

    $$\begin{aligned} \Vert {\hat{X}} - x_0 x^*_0 \Vert _{1} \le C_3 \frac{K^8}{m \beta ^{5/2}} \Vert w \Vert _{\ell _1}. \end{aligned}$$
    (3.14)

Here \(C_1\), \(C_2\), and \(C_3\) are universal constants.

Remark 2

In comparison to Theorem 1 the probability bound in Theorems 2 and 3 is slightly better, as it improves from \(1-\exp \left( - \Omega \left( n\right) \right) \) to \( 1-\exp \left( - \Omega \left( m\right) \right) \). Moreover, note that in contrast to Theorem 1 the dependence on the subgaussian distribution of \(\xi \) is not hidden in the constants. Also note that in our result the dependence on \(\beta \) is stated explicitly. However, we do not know whether these bounds are optimal with respect to K and \(\beta \).

4 Proof of Main Results

4.1 Proof of Theorem 2

Our goal is to show that with high probability the matrix \(x_0 x_0^*\) is close to the minimizer \({\hat{X}}\) of the expression \(\Vert {\mathcal {A}}(W)-y\Vert _{\ell _1}\) over all \(W\in {\mathcal {S}}^n_{+}\). A common proof strategy that we will also follow is to establish that all \(X \in {\mathcal {S}}^n_{+}\) with

$$\begin{aligned} \Vert {\mathcal {A}}(X)-y\Vert _{\ell _1} \le \Vert {\mathcal {A}}(x_0 x_0^*)-y\Vert _{\ell _1} = \Vert w\Vert _{\ell _1} \end{aligned}$$
(4.1)

are sufficiently close to the true solution in \(\Vert \cdot \Vert _{1} \)-norm. More precisely, a sufficient condition for inequality (3.3) is that every X fulfilling condition (4.1) satisfies

$$\begin{aligned} \Vert X - x_0 x^*_0 \Vert _{1} \le C_3 \frac{K^8}{m \beta ^{5/2}} \Vert w \Vert _{\ell _1}. \end{aligned}$$
(4.2)

Setting \(Z= X- x_0 x_0^*\), Eq. (4.1) reads

$$\begin{aligned} \Vert {\mathcal {A}}(Z)-w\Vert _{\ell _1} \le \Vert w\Vert _{\ell _1}. \end{aligned}$$
(4.3)

By the triangle inequality this implies that

$$\begin{aligned} \Vert {\mathcal {A}}(Z)\Vert _{\ell _1} \le 2 \Vert w \Vert _{\ell _1}. \end{aligned}$$
(4.4)

Hence, the upper bound (4.2) that we aim to establish directly follows from an appropriate lower bound for \(\Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1}/ \Vert Z \Vert _1 \). Here \(Z \in {\mathcal {S}}^n\) ranges over those matrices for which \(x_0 x^*_0 + Z \) is positive semidefinite. This set is convex, so it is locally well-approximated by a convex cone. To establish a uniform recovery result over all \(x_0 \in {\mathcal {X}}_{\mu }\), we need to study the union of the corresponding cones as given by

$$\begin{aligned} {\mathcal {M}}_{\mu }:= \text {cone} \left\{ Z \in {\mathcal {S}}^n : \exists x_0\in {\mathcal {X}}_{\mu }\text { such that } x_0x^*_0 + Z\in {\mathcal {S}}^n_{+}\right\} . \end{aligned}$$
(4.5)

We will refer to this set as the cone of admissible directions.

With this notation, our proof strategy can be summarized as establishing a lower bound for

$$\begin{aligned} \lambda _{\min } \left( {\mathcal {A}}, {\mathcal {M}}_{\mu } \right) := \underset{Z \in {\mathcal {M}}_{\mu } \setminus \left\{ 0\right\} }{\inf } \frac{\Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1} }{\Vert Z \Vert _1}, \end{aligned}$$
(4.6)

which in the literature is commonly referred to as the minimum conic singular value (see, e.g., [27, 40]). Except for the precise nature of the cone under consideration, this strategy is exactly analogous to a number of works in the recent literature on linear inverse problems [8, 28]. In particular, the following lemma, which summarizes our motivating considerations above, can be seen as a variant of [8, Proposition 2.2].

Lemma 1

Let \({\mathcal {A}}\) be the operator defined in (2.3). Assume that \(y= {\mathcal {A}} \left( x_0 x^*_0 \right) +w \) for some \(x_0 \in {\mathcal {X}}_{\mu }\). Then any minimizer \({\hat{X}}\) of (2.4) satisfies

$$\begin{aligned} \Vert {\hat{X}}-x_0 x^*_0 \Vert _{1} \le \frac{2 \Vert w \Vert _{\ell _1}}{\lambda _{\min } \left( {\mathcal {A}}, {\mathcal {M}}_{\mu } \right) }. \end{aligned}$$
(4.7)
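The lemma is immediate from the considerations above. For completeness (this short chain is not spelled out in the text), writing \({\hat{Z}}:= {\hat{X}} - x_0 x^*_0 \in {\mathcal {M}}_{\mu }\) and combining (4.4) with the definition (4.6) yields

$$\begin{aligned} \Vert {\hat{X}} - x_0 x^*_0 \Vert _{1} = \Vert {\hat{Z}} \Vert _{1} \le \frac{\Vert {\mathcal {A}} \left( {\hat{Z}}\right) \Vert _{\ell _1}}{\lambda _{\min } \left( {\mathcal {A}}, {\mathcal {M}}_{\mu } \right) } \le \frac{2 \Vert w \Vert _{\ell _1}}{\lambda _{\min } \left( {\mathcal {A}}, {\mathcal {M}}_{\mu } \right) }. \end{aligned}$$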

In the following, our goal will be to derive an appropriate lower bound for \(\lambda _{\min } \left( {\mathcal {A}}, {\mathcal {M}}_{\mu } \right) \). One difficulty in the analysis is that not all matrices belonging to \({\mathcal {M}}_{\mu } \) are positive semidefinite; if they were, one could use that for positive semidefinite matrices an approximate \(\ell _1\)-isometry holds (see, e.g., [7, Sect. 3]). While not all matrices in \({\mathcal {M}}_{\mu }\) are positive semidefinite, the following lemma states that each matrix belonging to \({\mathcal {M}}_{\mu } \) possesses at most one negative eigenvalue.

Lemma 2

Suppose that \(Z\in {\mathcal {M}}_{\mu }\). Then Z has at most one strictly negative eigenvalue.

Proof

Let \(Z\in {\mathcal {M}}_{\mu }\). By definition of \( {\mathcal {M}}_{\mu }\) we can find \(x_0 \in {\mathcal {X}}_{\mu }\) and \(t>0 \) such that

$$\begin{aligned} x_0x^*_0 + tZ \in {\mathcal {S}}^n_{+}. \end{aligned}$$
(4.8)

Suppose now by contradiction that Z has two (strictly) negative eigenvalues with corresponding eigenvectors \( z_1, z_2 \in {\mathbb {C}}^n\). Then we can find a vector \(u \in \text {span} \left\{ z_1, z_2 \right\} \backslash \left\{ 0 \right\} \) such that \( \langle u,x_0 \rangle =0 \). This implies that for any \( t >0 \) we have that

$$\begin{aligned} u^* \left( x_0 x^*_0 + t Z \right) u = t u^*Zu < 0, \end{aligned}$$
(4.9)

which is a contradiction to (4.8). \(\square \)

Recall that for a matrix \(Z \in {\mathcal {S}}^n\) we denoted its eigenvalues by \(\left\{ \lambda _i \left( Z\right) \right\} ^n_{i=1} \) in decreasing order. By the previous lemma it holds that \( \lambda _i \left( Z\right) \ge 0\) for all \( i \in \left[ n-1\right] \) and all \(Z \in {\mathcal {M}}_{\mu } \). For the proof we will partition \({\mathcal {M}}_{\mu }\) into two sets. Namely, for \( \alpha > 0 \) we define

$$\begin{aligned} {\mathcal {M}}_{1,\mu , \alpha }&:= \left\{ Z\in {\mathcal {M}}_{\mu }: -\lambda _n \left( Z\right) \le \alpha \sum _{i=1}^{n-1} \lambda _i \left( Z \right) \right\} , \end{aligned}$$
(4.10)
$$\begin{aligned} {\mathcal {M}}_{2,\mu , \alpha }&:= \left\{ Z\in {\mathcal {M}}_{\mu }: -\lambda _n \left( Z\right) > \alpha \sum _{i=1}^{n-1} \lambda _i \left( Z \right) \right\} . \end{aligned}$$
(4.11)

The two sets can be interpreted in the following way. If we suppose that \(\alpha =1 \), it follows that \(\text {Tr}\,\left( Z\right) = \sum _{i=1}^{n-1} \lambda _i \left( Z\right) + \lambda _n \left( Z\right) < 0 \) for all matrices \(Z\in {\mathcal {M}}_{2,\mu , \alpha } \). In particular, this implies that there is \(x_0 \in {\mathcal {X}}_{\mu }\) such that Z is in the descent cone of the function \(\text {Tr}\,\left( \cdot \right) \) at the point \(x_0 x^*_0 \). Hence, for \( \alpha < 1 \) we can interpret \({\mathcal {M}}_{2,\mu , \alpha } \) as a slightly enlarged union of descent cones. In order to bound \( \underset{Z \in {\mathcal {M}}_{2,\mu , \alpha } }{\inf } \Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1} / \Vert Z \Vert _1 \) from below we will rely on the following lemma, which is proven in Sect. 6.

Lemma 3

Assume that one of the following two conditions is satisfied for \( \beta \in (0,1] \):

  1. 1.

    It holds that \( \vert {\mathbb {E}} \left[ \xi _1^2 \right] \vert ^2 \le 1-\beta \). In this case we set \(\mu =1/81\).

  2. 2.

    In addition to \( \big \vert {\mathbb {E}} \left[ \xi _1^2 \right] \big \vert ^2 \le 1-\beta \), the inequality \({\mathbb {E}} \left[ \vert \xi _1 \vert ^4 \right] \ge 1+\beta \) is fulfilled. In this case we set \( \mu =1 \).

Moreover, assume that

$$\begin{aligned} m \ge C_1 \frac{K^{20}}{\beta ^{5}} n. \end{aligned}$$
(4.12)

Then with probability at least \(1- 2\exp \left( \frac{-m\beta ^4}{C_2 K^{16}} \right) \) it holds that

$$\begin{aligned} \underset{Z\in {\mathcal {M}}_{2,\mu , \alpha } \setminus \left\{ 0 \right\} }{\inf } \frac{\Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1}}{ \Vert Z\Vert _{1}} \ge C_3 \frac{\beta ^{5/2}}{K^8} m , \end{aligned}$$
(4.13)

where \( \alpha = 4/5 \). Here \(C_1\), \(C_2\), and \(C_3\) are universal constants.

The proof of Lemma 3 makes use of the fact that the set \( {\mathcal {M}}_{2,\mu , \alpha } \) has low complexity in the sense that the matrices in \( {\mathcal {M}}_{2,\mu , \alpha } \) are approximately low-rank.

In contrast, the set \( {\mathcal {M}}_{1,\mu , \alpha } \) has rather high complexity. For example, note that \({\mathcal {S}}^n_{+} \subset {\mathcal {M}}_{1,\mu , \alpha } \). Nevertheless, the quantity \( \underset{Z\in {\mathcal {M}}_{1,\mu , \alpha } \setminus \left\{ 0 \right\} }{\inf } \Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1} / \Vert Z\Vert _{1} \) can be bounded from below, because the measurement matrices \(\xi ^{\left( i\right) } (\xi ^{\left( i\right) })^* \) are positive semidefinite and the matrices in \({\mathcal {M}}_{1,\mu , \alpha } \) also have a dominant positive semidefinite component. This is achieved by the following lemma, whose proof can be found in Sect. 5.

Lemma 4

Let \(0 < \mu \le 1\), \( \alpha >0\), and \( \delta >0\). Assume that

$$\begin{aligned} m \ge \frac{C_1K^4n}{\delta ^2}. \end{aligned}$$
(4.14)

Then with probability at least \(1- {\mathcal {O}} \left( \exp \left( -\frac{m}{C_2K^4} \right) \right) \) for all \(Z\in {\mathcal {M}}_{1,\mu ,\alpha } \) it holds that

$$\begin{aligned} \frac{1}{m} \Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1} \ge \frac{1-\delta -\alpha -\alpha \delta }{1+ \alpha } \Vert Z \Vert _1. \end{aligned}$$
(4.15)

Here \(C_1\) and \(C_2\) are absolute constants.

We remark that Lemma 4 would no longer hold if the measurement matrices \(\xi ^{\left( i\right) } (\xi ^{\left( i\right) })^* \) were replaced by symmetric matrices with i.i.d. Gaussian entries (see [37, Proposition 1]).

Having gathered all the necessary ingredients we can prove the main result of this manuscript.

Proof of Theorem 2

Set \(\alpha =4/5\). The two statements are proved analogously, except that for the first statement we set \( \mu =1/81 \), whereas for the second statement we set \( \mu =1 \). By Lemma 4 and Assumption (3.2) it follows that with probability at least \(1-{\mathcal {O}} \left( \exp \left( -\frac{m}{CK^4} \right) \right) \)

$$\begin{aligned} \underset{Z\in {\mathcal {M}}_{1,\mu , \alpha } \setminus \left\{ 0 \right\} }{\inf } \frac{\Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1}}{ \Vert Z \Vert _1} \gtrsim m. \end{aligned}$$
(4.16)

Furthermore, by Lemma 3 we have with probability at least \(1- 2\exp \left( \frac{-m\beta ^4}{C_2 K^{16}} \right) \) that

$$\begin{aligned} \underset{Z \in {\mathcal {M}}_{2,\mu , \alpha } \setminus \left\{ 0 \right\} }{\inf } \frac{\Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1}}{ \Vert Z \Vert _{1}} \gtrsim \frac{\beta ^{5/2}}{K^8}m \end{aligned}$$
(4.17)

holds.

Set \(Z:= {\hat{X}} - x_0 x^*_0\). Note that by definition we have that Z is an admissible direction, i.e., \(Z\in {\mathcal {M}}_{\mu }\). It follows by (4.16), (4.17), and \({\mathcal {M}}_{\mu }= {\mathcal {M}}_{1,\mu , \alpha } \cup {\mathcal {M}}_{2,\mu , \alpha } \) that

$$\begin{aligned} \lambda _{\min } \left( {\mathcal {A}}, {\mathcal {M}}_{\mu } \right) \gtrsim \min \left\{ 1; \frac{\beta ^{5/2}}{K^8} \right\} m \ge \frac{\beta ^{5/2}}{K^8} m, \end{aligned}$$
(4.18)

where in the last inequality we used (2.7) and \(0<\beta \le 1 \). It follows by Lemma 1 that

$$\begin{aligned} \Vert {\hat{X}} - x_0 x^*_0 \Vert _1 \le \frac{2 \Vert w \Vert _{\ell _1}}{ \lambda _{\min } \left( {\mathcal {A}}, {\mathcal {M}}_{\mu } \right) } \lesssim \frac{K^8}{m \beta ^{5/2}} \Vert w \Vert _{\ell _1}, \end{aligned}$$
(4.19)

which finishes the proof. \(\square \)

4.2 Proof of Theorem 3

The proof of Theorem 3 is in large parts analogous to the proof of Theorem 2. For this reason, we will only highlight the main differences. Replacing \({\mathcal {X}}_{\mu }\) by \( {\mathcal {X}}_{\mu }\cap {\mathbb {R}}^n \) and \( {\mathcal {M}}_{\mu }\) by \({\mathcal {M}}_{\mu }\cap {\mathbb {R}}^{n \times n} \) we can argue analogously to Sect. 4.1 with the only difference that Lemma 3 has to be replaced by the following variant.

Lemma 5

Assume that one of the following two conditions is satisfied for \( \mu , \beta \in (0,1] \):

  1. 1.

    It holds that \(\mu = \frac{1}{81} \) and \( \beta =1 \).

  2. 2.

    It holds that \({\mathbb {E}} \left[ \vert \xi _1 \vert ^4 \right] \ge 1+\beta \) and \(\mu =1 \).

Moreover, assume that

$$\begin{aligned} m \ge C_1 \frac{K^{20}}{\beta ^{5}} n. \end{aligned}$$
(4.20)

Then with probability at least \(1- 2\exp \left( \frac{-m\beta ^4}{C_2 K^{16}} \right) \) it holds that

$$\begin{aligned} \underset{Z\in ({\mathcal {M}}_{2,\mu , \alpha } \cap {\mathbb {R}}^{n \times n} ) \setminus \left\{ 0 \right\} }{\inf } \frac{\Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1}}{ \Vert Z\Vert _{1}} \ge C_3 \frac{\beta ^{5/2}}{K^8} m , \end{aligned}$$
(4.21)

where \( \alpha = 4/5 \). Here \(C_1\), \(C_2\), and \(C_3\) are universal constants.

In order to prove Lemma 5 we can proceed similarly to the proof of Lemma 3 in Sect. 6, where we have highlighted the necessary modifications.

5 Proof of Lemma 4

Proof

Note that for any \(z\in {\mathbb {C}}^n \) we have that \(\Vert {\mathcal {A}} \left( zz^*\right) \Vert _{\ell _1} = \sum _{i=1}^{m} \vert \langle \xi ^{\left( i\right) } , z \rangle \vert ^2 \). Let \(A\in {\mathbb {C}}^{m\times n}\) be the matrix whose i-th row is given by \( \left( \xi ^{\left( i\right) }\right) ^* \). It follows that \( \Vert {\mathcal {A}} \left( zz^*\right) \Vert _{\ell _1} = \Vert Az \Vert ^2 \). Moreover, [42, Theorem 4.6.1] and our assumption on m imply that with probability at least \( 1- {\mathcal {O}} \left( \exp \left( -\frac{m}{CK^4} \right) \right) \) it holds for all \(z\in {\mathbb {C}}^n \) that

$$\begin{aligned} \left( 1-\delta \right) \Vert z \Vert ^2 \le \frac{1}{m} \Vert Az \Vert ^2 \le \left( 1+\delta \right) \Vert z \Vert ^2. \end{aligned}$$
(5.1)

Due to the observation above this is equivalent to

$$\begin{aligned} \left( 1-\delta \right) \Vert z \Vert ^2 \le \frac{1}{m} \Vert {\mathcal {A}} \left( zz^*\right) \Vert _{\ell _1} \le \left( 1+\delta \right) \Vert z \Vert ^2 \end{aligned}$$
(5.2)

for all \(z \in {\mathbb {C}}^n \). We will assume in the following that (5.2) holds for all \(z \in {\mathbb {C}}^n\).

Let \(Z\in {\mathcal {M}}_{1,\mu , \alpha }\) with corresponding eigenvalue decomposition \( Z = \sum _{i=1}^{n} \lambda _i v_i v^*_i \). We observe that

$$\begin{aligned} \frac{1}{m} \Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1}&= \frac{1}{m} \sum _{j=1}^{m} \left| \left( \xi ^{\left( j\right) }\right) ^* Z \xi ^{\left( j\right) } \right| \end{aligned}$$
(5.3)
$$\begin{aligned}&\ge \frac{1}{m} \sum _{j=1}^{m} \left( \xi ^{\left( j\right) }\right) ^* Z \xi ^{\left( j\right) } \end{aligned}$$
(5.4)
$$\begin{aligned}&= \frac{1}{m} \sum _{j=1}^{m} \left( \xi ^{\left( j\right) }\right) ^*\left( \sum _{i=1}^{n} \lambda _i v_i v^*_i \right) \xi ^{\left( j\right) } \end{aligned}$$
(5.5)
$$\begin{aligned}&= \frac{1}{m} \sum _{i=1}^{n} \lambda _i \sum _{j=1}^{m} \left( \xi ^{\left( j\right) }\right) ^* v_i v^*_i \xi ^{\left( j\right) } \end{aligned}$$
(5.6)
$$\begin{aligned}&= \frac{1}{m} \sum _{i=1}^{n} \lambda _i \Vert {\mathcal {A}} \left( v_i v^*_i\right) \Vert _{\ell _1}. \end{aligned}$$
(5.7)

By Lemma 2 we know that Z has at most one negative eigenvalue. If all eigenvalues \(\lambda _i \left( Z\right) \) are nonnegative, this inequality chain and inequality (5.2) imply that

$$\begin{aligned} \frac{1}{m} \Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1} \ge \left( 1-\delta \right) \sum _{i=1}^{n} \lambda _i = \left( 1-\delta \right) \Vert Z \Vert _1, \end{aligned}$$
(5.8)

which shows (4.15). Now suppose that \(\lambda _n \left( Z\right) <0\). By (5.2) and \( -\lambda _n \left( Z\right) \le \alpha \sum _{i=1}^{n-1} \lambda _i \left( Z\right) \), which is due to \(Z\in {\mathcal {M}}_{1,\mu ,\alpha } \), we obtain that

$$\begin{aligned} \begin{aligned} \frac{1}{m} \Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1}&\ge \left( 1-\delta \right) \sum _{i=1}^{n-1} \lambda _i + \left( 1+\delta \right) \lambda _n\\&\ge \left( 1-\delta - \alpha \left( 1+\delta \right) \right) \sum _{i=1}^{n-1} \lambda _i. \end{aligned} \end{aligned}$$
(5.9)

Again using the relation \( -\lambda _n \left( Z\right) \le \alpha \sum _{i=1}^{n-1} \lambda _i \left( Z\right) \) we can also observe that

$$\begin{aligned} \Vert Z \Vert _1 = \sum _{i=1}^{n-1} \lambda _i -\lambda _n \le \left( 1+\alpha \right) \sum _{i=1}^{n-1} \lambda _i. \end{aligned}$$
(5.10)

Combining (5.9) and (5.10) shows (4.15), which finishes the proof. \(\square \)

6 Proof of Lemmas 3 and 5

In order to prove Lemmas 3 and 5 we will use the following version of Mendelson’s small ball method [25, 31], a tool for deriving lower bounds for nonnegative empirical processes.

Lemma 6

[14, Lemma 1] Let \({\mathcal {Z}} \subset {\mathcal {S}}^n \) and let \(\xi ^{(1)}, \xi ^{(2)}, \ldots , \xi ^{(m)} \) be i.i.d. random vectors. Let \(u>0\) and \(t>0\) and define

$$\begin{aligned} Q_{{\mathcal {Z}}} \left( u \right) := \underset{Z \in {\mathcal {Z}}}{\inf } {\mathbb {P}} \left( \left| \left\langle \xi ^{\left( 1\right) } \left( \xi ^{\left( 1\right) }\right) ^*, Z \right\rangle _{HS} \right| \ge u \right) . \end{aligned}$$
(6.1)

Then, with probability at least \(1-2\exp \left( -2t^2 \right) \), it holds that

$$\begin{aligned}&\underset{Z \in {\mathcal {Z}}}{\inf } \left( \frac{1}{m} \sum _{i=1}^{m} \left| \left\langle \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^*, Z \right\rangle _{HS} \right| \right) \end{aligned}$$
(6.2)
$$\begin{aligned}&\quad \ge u \left( Q_{{\mathcal {Z}}} \left( 2u \right) - \frac{4}{u} {\mathbb {E}} \left[ \underset{Z \in {\mathcal {Z}}}{\sup } \left| \frac{1}{m} \sum _{i=1}^{m} \varepsilon _i \left\langle \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* , Z \right\rangle _{HS} \right| \right] -\frac{t}{\sqrt{m}} \right) , \end{aligned}$$
(6.3)

where \( \left( \varepsilon _{i} \right) ^m_{i=1} \) are independent, symmetric, \(\left\{ -1,1 \right\} \)-valued random variables that are independent of \( \left( \xi ^{\left( i\right) } \right) ^m_{i=1} \).

Our goal is to apply Lemma 6 to \({\mathcal {Z}}= {\mathcal {M}}_{2,\mu , \alpha } \cap \left\{ Z \in {\mathcal {S}}^n: \ \Vert Z \Vert _{HS} =1 \right\} \). The following key lemma shows that matrices in \( {\mathcal {M}}_{2,\mu , \alpha } \) have two favorable properties: They are approximately low-rank, and, when \(\mu \) is small, their mass with respect to the Hilbert-Schmidt norm is not concentrated on the diagonal. The first property follows directly from the fact that the positive eigenvalues are small compared to the modulus of the negative eigenvalue; the second property requires the flatness of \(x_0\), i.e., that \(\mu \) is sufficiently small.

Lemma 7

Let \( \alpha >0 \) and \( 0 < \mu \le 1 \). Assume that \(Z \in {\mathcal {M}}_{2,\mu , \alpha }\). Then it holds that

  1. 1.
    $$\begin{aligned} \Vert Z \Vert _1 \le \left( 1+ \frac{1}{\alpha } \right) \Vert Z \Vert _{\text {HS}}, \end{aligned}$$
    (6.4)
  2. 2.
    $$\begin{aligned} \Vert \text {diag}\left( Z \right) \Vert _{HS} \le \left( \sqrt{1 - \frac{1}{ \left( 1 + \alpha ^{-1} \right) ^2 }} + 3\mu \right) \Vert Z \Vert _{HS}. \end{aligned}$$
    (6.5)

Proof

Let \(Z \in {\mathcal {M}}_{2,\mu , \alpha } \). By definition of \({\mathcal {M}}_{2,\mu , \alpha } \) we have that \( \alpha \sum _{i=1}^{n-1} \lambda _i \left( Z\right) < - \lambda _n \left( Z\right) \), which implies that

$$\begin{aligned} \Vert Z \Vert _1 = \sum _{i=1}^{n-1} \lambda _i \left( Z\right) -\lambda _n \left( Z\right) \le - \left( 1 + \frac{1}{\alpha } \right) \lambda _n \left( Z\right) \le \left( 1 + \frac{1}{\alpha } \right) \Vert Z \Vert _{\text {HS}}. \end{aligned}$$
(6.6)

This proves inequality (6.4).

In order to prove the second inequality note that by definition of \({\mathcal {M}}_{2,\mu ,\alpha } \subset {\mathcal {M}}_{\mu } \) we can choose \(x_0 \in {\mathcal {X}}_{\mu } \cap S^{n-1} \) such that there exists \(t>0\) with \(x_0x^*_0 + tZ\) positive semidefinite. For this choice of \(x_0\) we can decompose Z uniquely into

$$\begin{aligned} Z= \underset{=: Z_1}{\underbrace{ -\lambda x_0x^*_0 + ux^*_0+ x_0u^*} }+ Z_2, \end{aligned}$$
(6.7)

where \( \lambda \in {\mathbb {R}} \), \( \langle u,x_0 \rangle =0\), and \(Z_2x_0=0\). We observe that

$$\begin{aligned} \Vert \text {diag}\left( Z \right) \Vert _{HS}&\le \Vert \text {diag}\left( Z_1 \right) \Vert _{HS} + \Vert \text {diag}\left( Z_2 \right) \Vert _{HS}. \end{aligned}$$
(6.8)

We will bound the two summands separately. We begin with \( \Vert \text {diag}\left( Z_1 \right) \Vert _{HS} \) and observe that

$$\begin{aligned} \begin{aligned} \Vert \text {diag}\left( Z_1 \right) \Vert _{\text {HS}}&\le \vert \lambda \vert \Vert \text {diag}\left( x_0 x^*_0\right) \Vert _{HS} + 2 \Vert \text {diag}\left( ux^*_0\right) \Vert _{\text {HS}}\\&= \vert \lambda \vert \sqrt{\sum _{i=1}^{n} \big \vert \left( x_0\right) _i \big \vert ^4 } + 2 \sqrt{\sum _{i=1}^{n} \big \vert \left( x_{0}\right) _i \big \vert ^2 \big \vert u_i \big \vert ^2 }\\&\le \mu \vert \lambda \vert + 2\mu \Vert u \Vert \\&\le 3\mu \Vert Z_1 \Vert _{\text {HS}}\\&\le 3\mu \Vert Z \Vert _{\text {HS}}. \end{aligned} \end{aligned}$$
(6.9)

In the first inequality we used the triangle inequality and in the third line we used that \(\Vert x_0 \Vert _{\infty } \le \mu \Vert x_0 \Vert = \mu \) due to \(x_0 \in {\mathcal {X}}_{\mu }\cap S^{n-1}\). In the fourth line we used that \(\vert \lambda \vert \le \Vert Z_1 \Vert _{HS} \) and \( \Vert u \Vert \le \Vert Z_1 \Vert _{HS} \), which follows from the fact that the summands of \(Z_1= -\lambda x_0 x^*_0 + ux^*_0 + x_0 u^* \) are orthogonal to each other. In the last line we again used that \(\Vert Z_1 \Vert _{HS} \le \Vert Z \Vert _{HS} \) as Z is decomposed orthogonally into \(Z=Z_1+Z_2\).

In order to bound \( \Vert \text {diag}\left( Z_2 \right) \Vert _{HS}\) we note first that \(Z_2\) is positive semidefinite. Indeed, suppose by contradiction that \(Z_2\) is not positive semidefinite. Then there would exist a vector \(v\in {\mathbb {C}}^n\) such that \( \langle v, x_0 \rangle =0\) and \( v^* Z_2 v <0\). In particular, this would imply that \(v^* \left( x_0 x^*_0 + t Z \right) v <0 \) for all \( t >0\), which is a contradiction to our choice of \(x_0\).

Now let \(w\in {\mathbb {C}}^n\) be the normalized (i.e., \( \Vert w \Vert =1 \)) eigenvector corresponding to the eigenvalue \( \lambda _n \left( Z\right) \). Then we obtain that

$$\begin{aligned} \lambda _n \left( Z \right) = w^* Z w = w^* Z_1 w + w^*Z_2w \ge w^* Z_1 w \ge - \Vert Z_1 \Vert \ge - \Vert Z_1 \Vert _{HS} , \end{aligned}$$
(6.10)

where the first inequality follows from the fact that \(Z_2\) is positive semidefinite. Using this observation we obtain that

$$\begin{aligned} \begin{aligned} \Vert \text {diag}\left( Z_2 \right) \Vert _{\text {HS}}&\le \Vert Z_2 \Vert _{\text {HS}}\\&= \sqrt{ \Vert Z \Vert ^2_{\text {HS}} - \Vert Z_1 \Vert _{\text {HS}}^2 }\\&\overset{(6.10)}{\le } \sqrt{ \Vert Z \Vert ^2_{\text {HS}} - \lambda ^2_n \left( Z\right) } \\&\le \sqrt{ \Vert Z \Vert ^2_{\text {HS}} - \frac{1}{ \left( 1 + \alpha ^{-1} \right) ^2} \Vert Z \Vert _{1}^2 }\\&\le \Vert Z \Vert _{\text {HS}} \sqrt{1 - \frac{1}{ \left( 1 + \alpha ^{-1} \right) ^2 }}, \end{aligned} \end{aligned}$$
(6.11)

where in the fourth line we used that \( - \lambda _n \left( Z\right) \ge \frac{1}{1+\alpha ^{-1}} \Vert Z \Vert _{1} \), which is a consequence of the first inequality of (6.6). Combining this estimate with (6.8) and (6.9) shows part (2), which finishes the proof.

\(\square \)

In analogy to [25] we bound \(Q_{{\mathcal {Z}}} \left( 2u\right) \) using the following lemma, whose proof is based on the Paley–Zygmund inequality. A key difference is that we use the Hanson–Wright inequality to control the fourth moment \({\mathbb {E}} \vert \xi ^* A \xi \vert ^4 \) appropriately.

Lemma 8

Let \(A \in {\mathcal {S}}^n\) and let \( \xi = \left( \xi _1, \ldots , \xi _n \right) \) be a random vector with independent and identically distributed entries \( \xi _i \) taking values in \( {\mathbb {C}} \) such that \( {\mathbb {E}} \xi _i = 0 \), \( {\mathbb {E}} \vert \xi _i \vert ^2 =1 \), and \(\Vert \xi _i \Vert _{\psi _2} \le K \). Then we have that

$$\begin{aligned} {\mathbb {P}} \left( \vert \xi ^* A \xi \vert ^2 \ge \frac{{\mathbb {E}}\vert \xi ^* A \xi \vert ^2}{2} \right) \ge \frac{ \left( {\mathbb {E}}\vert \xi ^* A \xi \vert ^2 \right) ^2 }{ C \left( K^8 \Vert A \Vert ^4_{HS} + \left( \text {Tr}\,\left( A\right) \right) ^4 \right) }. \end{aligned}$$
(6.12)

Here \(C>0 \) is an absolute constant.

Proof

Note that by the Paley–Zygmund inequality (see, e.g., [12]) we have that for all \( 0 < t \le {\mathbb {E}}\vert \xi ^* A \xi \vert ^2\)

$$\begin{aligned} {\mathbb {P}} \left( \vert \xi ^* A \xi \vert ^2 \ge t \right) \ge \frac{\left( {\mathbb {E}}\vert \xi ^* A \xi \vert ^2 - t \right) ^2}{{\mathbb {E}} \vert \xi ^* A \xi \vert ^4 }. \end{aligned}$$
(6.13)

In particular, setting \(t={\mathbb {E}}\vert \xi ^* A \xi \vert ^2/2 \) yields that

$$\begin{aligned} {\mathbb {P}} \left( \vert \xi ^* A \xi \vert ^2 \ge \frac{{\mathbb {E}}\vert \xi ^* A \xi \vert ^2}{2} \right) \ge \frac{\left( {\mathbb {E}}\vert \xi ^* A \xi \vert ^2 \right) ^2}{4 {\mathbb {E}} \vert \xi ^* A \xi \vert ^4 }. \end{aligned}$$
(6.14)

To estimate \({\mathbb {E}} \vert \xi ^* A \xi \vert ^4\) from above we note that the triangle inequality yields that

$$\begin{aligned} \begin{aligned} {\mathbb {E}} \vert \xi ^* A \xi \vert ^4&\lesssim {\mathbb {E}} \left[ \big \vert \xi ^* A \xi - {\mathbb {E}}\left[ \xi ^* A \xi \right] \big \vert ^4 \right] + \vert {\mathbb {E}} \xi ^* A \xi \vert ^4\\&= {\mathbb {E}} \left[ \big \vert \xi ^* A \xi - {\mathbb {E}}\left[ \xi ^* A \xi \right] \big \vert ^4 \right] + \left( \text {Tr}\,\left( A\right) \right) ^4. \end{aligned} \end{aligned}$$
(6.15)

In order to estimate the first summand we will use that \( \big \vert \xi ^* A \xi - {\mathbb {E}}\left[ \xi ^* A \xi \right] \big \vert \) has a mixed subgaussian/subexponential tail. We can bound the tail probability using the Hanson-Wright inequality (in the version of [36]), which states that there is a numerical constant \(c>0\) such that for all \(t>0\) it holds that

$$\begin{aligned} {\mathbb {P}} \left( \vert \xi ^* A \xi - {\mathbb {E}} \left[ \xi ^* A \xi \right] \vert > t \right) \le 2 \exp \left( -c \ \text {min} \left\{ \frac{t^2}{K^4 \Vert A \Vert ^2_{HS}} , \frac{t}{K^2 \Vert A \Vert } \right\} \right) . \end{aligned}$$
(6.16)

This yields that

$$\begin{aligned}&{\mathbb {E}} \left[ \big \vert \xi ^* A \xi - {\mathbb {E}}\left[ \xi ^* A \xi \right] \big \vert ^4 \right] \nonumber \\&\quad = 4 \int _{0}^{\infty } t^3 \ {\mathbb {P}} \left( \big \vert \xi ^* A \xi - {\mathbb {E}}\left[ \xi ^* A \xi \right] \big \vert > t \right) dt \end{aligned}$$
(6.17)
$$\begin{aligned}&\quad \le 8 \left( \int _{0}^{\infty } t^3 \ \exp \left( -c \frac{t^2}{K^4 \Vert A \Vert ^2_{HS}} \right) dt + \int _{0}^{\infty } t^3 \ \exp \left( -c \frac{t}{K^2 \Vert A \Vert } \right) dt \right) \end{aligned}$$
(6.18)
$$\begin{aligned}&\quad = 8 \left( K^8 \Vert A \Vert ^4_{HS} \int _{0}^{\infty } u^3 \ \exp \left( -cu^2 \right) du + K^8 \Vert A \Vert ^4 \int _{0}^{\infty } u^3 \ \exp \left( -cu \right) du \right) \end{aligned}$$
(6.19)
$$\begin{aligned}&\quad \lesssim K^8 \Vert A \Vert ^4_{HS}, \end{aligned}$$
(6.20)

where the third line follows from a change of variables. Combining this inequality chain with (6.15) we obtain that

$$\begin{aligned} {\mathbb {E}} \vert \xi ^* A \xi \vert ^4 \lesssim K^8 \Vert A \Vert ^4_{HS} + \left( \text {Tr}\,\left( A\right) \right) ^4 . \end{aligned}$$
(6.21)

Inserting this into (6.14) finishes the proof. \(\square \)

In order to apply Lemma 8 we need a lower bound for \({\mathbb {E}} \left[ \vert \xi ^* A \xi \vert ^2 \right] \). The next lemma computes this quantity.

Lemma 9

Let \( \xi = \left( \xi _1, \ldots , \xi _n \right) \) be a random vector with independent and identically distributed entries \( \xi _i \) taking values in \( {\mathbb {C}} \) such that \( {\mathbb {E}} \xi _i = 0 \) and \( {\mathbb {E}} \vert \xi _i \vert ^2 =1 \). Then for all matrices \( A \in {\mathcal {S}}^n \) it holds that

$$\begin{aligned} \begin{aligned}&{\mathbb {E}} \left[ \vert \xi ^* A \xi \vert ^2 \right] \\&\quad = \left( \text {Tr}\,A \right) ^2 + \left( {\mathbb {E}} \vert \xi _i \vert ^4 -1 \right) \sum _{i=1}^{n} A^2_{i,i} + \left( 1 + \big \vert {\mathbb {E}} \left[ \xi ^2_1 \right] \big \vert ^2 \right) \sum _{i \ne j} \text {Re} \left( A_{i,j} \right) ^2\\&\qquad + \left( 1- \big \vert {\mathbb {E}} \left[ \xi ^2_1 \right] \big \vert ^2 \right) \sum _{i \ne j} \text {Im} \left( A_{i,j} \right) ^2. \end{aligned} \end{aligned}$$
(6.22)

Proof

First, we observe that

$$\begin{aligned} {\mathbb {E}} \left[ \vert \xi ^* A \xi \vert ^2 \right]&= {\mathbb {E}} \left[ \left( \sum _{i,j} A_{i,j} \overline{\xi _i} \xi _j \right) \left( \sum _{i',j'} \overline{ A_{i',j'}} \xi _{i'} \overline{ \xi _{j'}} \right) \right] \end{aligned}$$
(6.23)
$$\begin{aligned}&= \sum _{i,i',j,j'} {\mathbb {E}} \left[ \left( A_{i,j} \overline{\xi _i} \xi _j \right) \left( \overline{ A_{i',j'}} \xi _{i'} \overline{ \xi _{j'}} \right) \right] \end{aligned}$$
(6.24)
$$\begin{aligned}&= \sum _{i,j} {\mathbb {E}} \left[ \left( A_{i,i} \vert \xi _i \vert ^2 \right) \left( A_{j,j} \vert \xi _j \vert ^2 \right) \right] \nonumber \\&\quad + \sum _{i\ne j ,i' \ne j'} {\mathbb {E}} \left[ \left( A_{i,j} \overline{\xi _i} \xi _j \right) \left( \overline{ A_{i',j'}} \xi _{i'} \overline{ \xi _{j'}} \right) \right] \end{aligned}$$
(6.25)
$$\begin{aligned}&= \left( I\right) + \left( II\right) , \end{aligned}$$
(6.26)

where in the third line we used that \( {\mathbb {E}} \left[ \xi _i \right] = 0 \) and that the entries of \(\xi \) are independent, which implies that all summands in which some index appears an odd number of times (in particular, exactly once or exactly three times) vanish. The first summand can be computed by

$$\begin{aligned} \left( I\right)&= \sum _i A^2_{i,i} {\mathbb {E}} \left[ \vert \xi _i \vert ^4 \right] + \sum _{i \ne j} A_{i,i} A_{j,j} {\mathbb {E}} \left[ \vert \xi _i \vert ^2 \right] {\mathbb {E}} \left[ \vert \xi _j \vert ^2 \right] \end{aligned}$$
(6.27)
$$\begin{aligned}&= \sum ^n_{i=1} A^2_{i,i} {\mathbb {E}} \left[ \vert \xi _i \vert ^4 \right] + \sum _{i \ne j} A_{i,i} A_{j,j} \end{aligned}$$
(6.28)
$$\begin{aligned}&= \left( \text {Tr}\,A \right) ^2 + \left( {\mathbb {E}} \left[ \vert \xi _i \vert ^4 \right] -1 \right) \sum _{i=1}^{n} A^2_{i,i} , \end{aligned}$$
(6.29)

where we have used that \(A_{i,i} = \overline{ A_{i,i} } \) for all \( i \in \left[ n\right] \) and \( {\mathbb {E}} \left[ \vert \xi _i \vert ^2 \right] =1\). The second summand can be computed by

$$\begin{aligned} (II)&= \sum _{i \ne j, i' \ne j'} A_{i,j} \overline{ A_{i',j'}} {\mathbb {E}} \left[ \overline{ \xi _i} \xi _j \xi _{i'} \overline{\xi _{j'}} \right] \end{aligned}$$
(6.30)
$$\begin{aligned}&= \sum _{i \ne j, i' \ne j'} A_{i,j} A_{j',i'} {\mathbb {E}} \left[ \overline{ \xi _i} \xi _j \xi _{i'} \overline{\xi _{j'}} \right] \end{aligned}$$
(6.31)
$$\begin{aligned}&= \sum _{i \ne j} A^2_{i,j} {\mathbb {E}} \left[ \overline{ \xi _i}^2 \right] {\mathbb {E}} \left[ \xi ^2_j \right] + \sum _{i \ne j} A_{i,j} A_{j,i} {\mathbb {E}} \left[ \vert \xi _i \vert ^2 \right] {\mathbb {E}} \left[ \vert \xi _j \vert ^2 \right] \end{aligned}$$
(6.32)
$$\begin{aligned}&= \sum _{i \ne j} A^2_{i,j} \left| {\mathbb {E}} \left[ \xi ^2_i \right] \right| ^2 + \sum _{i \ne j} \vert A_{i,j} \vert ^2\end{aligned}$$
(6.33)
$$\begin{aligned}&\overset{(a)}{=} \left| {\mathbb {E}} \left[ \xi ^2_1 \right] \right| ^2 \sum _{i \ne j} \left( \text {Re} \left( A_{i,j}\right) ^2 - \text {Im} \left( A_{i,j}\right) ^2 \right) + \sum _{i \ne j} \vert A_{i,j} \vert ^2\end{aligned}$$
(6.35)
$$\begin{aligned}&= \left( 1 + \left| {\mathbb {E}} \left[ \xi ^2_1 \right] \right| ^2 \right) \sum _{i \ne j} \text {Re} \left( A_{i,j} \right) ^2 + \left( 1- \left| {\mathbb {E}} \left[ \xi ^2_1 \right] \right| ^2 \right) \sum _{i \ne j} \text {Im} \left( A_{i,j} \right) ^2. \end{aligned}$$
(6.36)

For equation (a) we used the observation that

$$\begin{aligned} A^2_{i,j}+A^2_{j,i} = A^2_{i,j} + \overline{ A_{i,j}}^2 = 2 \text {Re} \left( A_{i,j}\right) ^2 - 2 \text {Im} \left( A_{i,j}\right) ^2. \end{aligned}$$
(6.37)

By summing up (I) and (II) we obtain equality (6.22).

\(\square \)
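As a sanity check of identity (6.22), the following Monte Carlo sketch (illustrative only, not part of the proof) compares both sides for a random Hermitian matrix and entries drawn uniformly from the complex unit circle, for which \({\mathbb {E}} \left[ \xi _1^2 \right] =0\) and \({\mathbb {E}} \left[ \vert \xi _1 \vert ^4 \right] =1\).

```python
import numpy as np

# Monte Carlo sanity check of identity (6.22): entries uniform on the complex
# unit circle, so E[xi] = 0, E[|xi|^2] = 1, E[xi^2] = 0, E[|xi|^4] = 1.
rng = np.random.default_rng(2)
n, N = 4, 200_000
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2                               # Hermitian test matrix

xi = np.exp(2j * np.pi * rng.random((N, n)))           # N independent copies of xi
quad = np.einsum('ki,ij,kj->k', xi.conj(), A, xi)      # xi^* A xi for each sample
lhs = np.mean(np.abs(quad) ** 2)                       # empirical E|xi^* A xi|^2

E4, Esq = 1.0, 0.0                                     # E[|xi_1|^4] and E[xi_1^2]
off = ~np.eye(n, dtype=bool)
rhs = (np.trace(A).real ** 2
       + (E4 - 1) * np.sum(np.diag(A).real ** 2)
       + (1 + abs(Esq) ** 2) * np.sum(A[off].real ** 2)
       + (1 - abs(Esq) ** 2) * np.sum(A[off].imag ** 2))
print(lhs, rhs)                                        # agree up to Monte Carlo error
```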

The lemmas above allow us to find a lower bound for \(Q_{{\mathcal {Z}}} \left( 2u\right) \) in Lemma 6. We still need an upper bound for the Rademacher complexity \({\mathbb {E}} \left[ \underset{Z \in {\mathcal {Z}} }{\sup } \Big \vert \sum _{i=1}^{m}\varepsilon _i \left\langle \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^*, Z \right\rangle _{HS} \Big \vert \right] \). The next lemma provides such a bound. In [28] a version of this lemma has already been presented. Nevertheless, we include a proof for completeness.

Lemma 10

Assume that \(m \ge C_1n\). Let \(\alpha >0 \), \(0 < \mu \le 1 \) and set \({\mathcal {Z}}:= {\mathcal {M}}_{2,\mu , \alpha } \cap \left\{ Z \in {\mathcal {S}}^n: \ \Vert Z \Vert _{HS}=1 \right\} \). Then we have that

$$\begin{aligned} {\mathbb {E}} \left[ \underset{Z \in {\mathcal {Z}} }{\sup } \left| \left\langle Z , \sum _{i=1}^{m} \varepsilon _i \xi _i \xi ^*_i \right\rangle _{HS} \right| \right] \le C_2 \left( 1+\frac{1}{\alpha }\right) K^2 \sqrt{mn}. \end{aligned}$$
(6.38)

\(C_1\) and \(C_2\) are absolute constants.

Proof

First, we note that by Hölder’s inequality and Lemma 7 we obtain that

$$\begin{aligned} \begin{aligned} {\mathbb {E}} \left[ \underset{Z \in {\mathcal {Z}}}{\sup } \left| \left\langle Z , \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* \right\rangle _{HS} \right| \right]&\le \left( \underset{Z \in {\mathcal {Z}}}{\sup } \Vert Z \Vert _1 \right) {\mathbb {E}} \left[ \Big \Vert \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* \Big \Vert \right] \\&\le \left( 1 + \frac{1}{\alpha } \right) {\mathbb {E}} \left[ \Big \Vert \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* \Big \Vert \right] . \end{aligned} \end{aligned}$$
(6.39)

To bound \( {\mathbb {E}} \left[ \Big \Vert \sum ^m_{i=1} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* \Big \Vert \right] \), let \( {\mathcal {N}} \) be a \(\frac{1}{4}\)-covering of the unit sphere \(S^{n-1} \subset {\mathbb {R}}^n\) with respect to the Euclidean norm such that

$$\begin{aligned} \vert {\mathcal {N}} \vert \le 12^n. \end{aligned}$$
(6.40)

By [41, Lemma 5.4] we have that

$$\begin{aligned} \left\| \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* \right\| \le 2 \underset{x \in {\mathcal {N}}}{\sup } \left| \left\langle x, \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* x \right\rangle \right| . \end{aligned}$$
(6.41)

Fix \( x \in {\mathcal {N}} \) and observe that

$$\begin{aligned} \left\langle x, \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* x \right\rangle = \sum _{i=1}^{m} \varepsilon _i \vert \langle \xi ^{\left( i\right) } ,x \rangle \vert ^2 = \sum _{i=1}^{m} z_i, \end{aligned}$$
(6.42)

where we have set \(z_i := \varepsilon _i \vert \langle \xi ^{\left( i\right) } ,x \rangle \vert ^2 \). We observe that \( {\mathbb {E}} \left[ z_i\right] = 0 \) and, moreover,

$$\begin{aligned} \Vert z_i \Vert _{\psi _1} = \Vert \vert \langle \xi ^{\left( i\right) } ,x \rangle \vert ^2 \Vert _{\psi _1} \lesssim \Vert \langle \xi ^{\left( i\right) } ,x \rangle \Vert ^2_{\psi _2} \lesssim K^2 , \end{aligned}$$

where the first equality follows directly from the definition of the \(\Vert \cdot \Vert _{\psi _1} \)-norm. The first inequality can be seen using [42, Lemma 2.7.6] and the second one using [42, Lemma 3.4.2]. By the Bernstein inequality (see, e.g., [42, Theorem 2.8.1]) we obtain that

$$\begin{aligned} {\mathbb {P}} \left( \left| \left\langle x, \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* x \right\rangle \right| \ge t \right)&= {\mathbb {P}} \left( \left| \sum _{i=1}^{m} z_i \right| \ge t \right) \\&\le 2 \exp \left( -c \min \left\{ \frac{t^2}{mK^4}; \frac{t}{K^2} \right\} \right) , \end{aligned}$$
(6.43)

where \( c>0 \) is some numerical constant. It follows from (6.40), (6.41), (6.43), and a union bound that

$$\begin{aligned} {\mathbb {P}} \left( \left\| \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* \right\| \ge t \right)&\overset{(6.41)}{\le } {\mathbb {P}} \left( 2 \underset{x \in {\mathcal {N}}}{\sup } \left| \left\langle x, \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* x \right\rangle \right| \ge t \right) \end{aligned}$$
(6.44)
$$\begin{aligned}&\le \sum _{x\in {\mathcal {N}}} {\mathbb {P}} \left( 2 \left| \left\langle x, \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* x \right\rangle \right| \ge t \right) \end{aligned}$$
(6.45)
$$\begin{aligned}&\overset{(6.40),\, (6.43)}{\le } 2 \cdot 12^{n} \exp \left( -c' \min \left\{ \frac{t^2}{mK^4}; \frac{t}{K^2} \right\} \right) \end{aligned}$$
(6.46)
$$\begin{aligned}&= 2 \exp \left( {\tilde{c}} n -c' \min \left\{ \frac{t^2}{mK^4} ; \frac{t}{K^2} \right\} \right) , \end{aligned}$$
(6.47)

where \({\tilde{c}} = \log 12 \). Then, whenever \( m \ge \frac{{\tilde{c}}}{c'} n \), we obtain that

$$\begin{aligned}&{\mathbb {E}} \left[ \left\| \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* \right\| \right] \end{aligned}$$
(6.48)
$$\begin{aligned}&\quad = \int _{0}^{\infty } {\mathbb {P}} \left( \left\| \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* \right\| \ge t \right) dt \end{aligned}$$
(6.49)
$$\begin{aligned}&\quad \le \int _{0}^{ K^2 \sqrt{\frac{{\tilde{c}}}{c'} nm } } 1 dt + 2 \int _{ K^2 \sqrt{\frac{{\tilde{c}}}{c'} nm } }^{ mK^2 } \exp \left( {\tilde{c}} n - \frac{c' t^2}{mK^4} \right) dt \nonumber \\&\qquad + 2 \int _{ mK^2 }^{ \infty } \exp \left( {\tilde{c}} n - c' \frac{t}{K^2} \right) dt\end{aligned}$$
(6.50)
$$\begin{aligned}&\quad \lesssim K^2 \sqrt{nm} + \exp \left( {\tilde{c}} n\right) \left( \int _{ K^2 \sqrt{\frac{{\tilde{c}}}{c'} nm } }^{ \infty } \exp \left( - \frac{c't^2}{mK^4} \right) dt + \int _{ mK^2 }^{ \infty } \exp \left( - \frac{c't}{K^2} \right) dt \right) . \end{aligned}$$
(6.51)
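
Let us briefly justify the splitting of the domain of integration: for \( t \le m K^2 \) one has \( \min \left\{ \frac{t^2}{mK^4} ; \frac{t}{K^2} \right\} = \frac{t^2}{mK^4} \), whereas for \( t \ge m K^2 \) the minimum equals \( \frac{t}{K^2} \). Moreover, the assumption \( m \ge \frac{{\tilde{c}}}{c'} n \) ensures that \( K^2 \sqrt{\frac{{\tilde{c}}}{c'} nm} \le m K^2 \), so the three ranges of integration in (6.50) indeed cover \( \left[ 0, \infty \right) \); on the first interval we have simply estimated the probability by 1.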

In order to finish we need to estimate the two integrals. By a change of variables and [19, Lemma C.7] we obtain that

$$\begin{aligned} \int _{ K^2 \sqrt{\frac{{\tilde{c}}}{c'} nm } }^{ \infty } \exp \left( - \frac{c' t^2}{mK^4} \right) dt&= \frac{\sqrt{m}K^2}{\sqrt{2c'}} \int _{\sqrt{2 {\tilde{c}} n}}^{\infty } \exp \left( \frac{-t^2}{2} \right) dt \lesssim \sqrt{m} K^2 \exp \left( -{\tilde{c}} n \right) , \end{aligned}$$
(6.52)
$$\begin{aligned} \int _{mK^2 }^{ \infty } \exp \left( - \frac{c't}{K^2} \right) dt&= \frac{K^2}{c'} \exp \left( -c'm\right) . \end{aligned}$$
(6.53)
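
To verify (6.52), one can substitute \( s = \frac{\sqrt{2c'}}{\sqrt{m} K^2}\, t \): the exponent \( \frac{c' t^2}{mK^4} \) becomes \( \frac{s^2}{2} \), the lower limit \( K^2 \sqrt{\frac{{\tilde{c}}}{c'} nm} \) is mapped to \( \sqrt{2 {\tilde{c}} n} \), and \( dt = \frac{\sqrt{m} K^2}{\sqrt{2c'}} \, ds \); the resulting Gaussian tail integral is then controlled by a bound of the form \( \int _{a}^{\infty } \exp \left( -\frac{s^2}{2}\right) ds \lesssim \exp \left( -\frac{a^2}{2}\right) \), as provided by [19, Lemma C.7], applied with \( a = \sqrt{2{\tilde{c}} n} \).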

Inserting this in the inequality chain above yields that

$$\begin{aligned} {\mathbb {E}} \left[ \left\| \sum _{i=1}^{m} \varepsilon _i \xi ^{(i)} \left( \xi ^{\left( i\right) }\right) ^* \right\| \right] \lesssim K^2 \sqrt{nm}. \end{aligned}$$
(6.54)

Combined with inequality (6.39) this finishes the proof. \(\square \)

Now we have gathered all the ingredients to complete the proof.

Proof of Lemmas 3 and 5

We will start by showing that \({\mathbb {E}} \left[ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^2 \right] \gtrsim \beta \Vert Z \Vert ^2_{HS} \) for all \( Z \in {\mathcal {M}}_{2,\mu , \alpha } \) in the case of Lemma 3 and for all \( Z \in {\mathcal {M}}_{2,\mu , \alpha } \cap {\mathbb {R}}^{n \times n} \) in the case of Lemma 5.

We first consider the second case, i.e., we assume that the condition \({\mathbb {E}} \left[ \vert \xi _i \vert ^4 \right] \ge 1+ \beta \) is satisfied for some \( \beta >0 \). Under the conditions of Lemma 3, Lemma 9 yields that for all \(Z \in {\mathcal {M}}_{2,\mu ,\alpha } \)

$$\begin{aligned} \begin{aligned}&{\mathbb {E}} \left[ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^2 \right] \\&\quad = \left( \text {Tr}\,Z\right) ^2 + \left( {\mathbb {E}} \vert \xi _1 \vert ^4 -1 \right) \sum _{i=1}^{n} Z^2_{i,i} + \left( 1 + \left| {\mathbb {E}} \left[ \xi ^2_1 \right] \right| ^2 \right) \sum _{i \ne j} \text {Re} \left( Z_{i,j} \right) ^2\\&\qquad + \left( 1- \left| {\mathbb {E}} \left[ \xi ^2_1 \right] \right| ^2 \right) \sum _{i \ne j} \text {Im} \left( Z_{i,j} \right) ^2\\&\quad \ge \beta \sum _{i=1}^{n} Z^2_{i,i} + \beta \Vert Z-\text {diag}\left( Z \right) \Vert ^2_{HS}\\&\quad = \beta \Vert Z \Vert ^2_{HS}. \end{aligned} \end{aligned}$$
(6.55)

Under the assumptions of Lemma 5 we observe that \(\sum _{i \ne j} \text {Im} \left( Z_{i,j} \right) ^2=0 \) and \(\sum _{i \ne j} \text {Re} \left( Z_{i,j} \right) ^2 = \Vert Z - \text {diag}\left( Z \right) \Vert ^2_{HS} \). Hence, an argument similar to the one above also leads to

$$\begin{aligned} {\mathbb {E}} \left[ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^2 \right] \ge \beta \Vert Z \Vert ^2_{HS}. \end{aligned}$$
(6.56)

Under the first assumption, Lemma 9 yields that for all \(Z \in {\mathcal {M}}_{2,\mu , \alpha } \) (here we also use that \( {\mathbb {E}} \left[ \vert \xi _1 \vert ^4 \right] \ge \left( {\mathbb {E}} \left[ \vert \xi _1 \vert ^2 \right] \right) ^2 = 1 \) by Jensen's inequality, so that the second summand in the first line below is nonnegative)

$$\begin{aligned} \begin{aligned}&{\mathbb {E}} \left[ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^2 \right] \\&\quad = \left( \text {Tr}\,Z\right) ^2 + \left( {\mathbb {E}} \vert \xi _1 \vert ^4 -1 \right) \sum _{i=1}^{n} Z^2_{i,i} + \left( 1 + \left| {\mathbb {E}} \left[ \xi ^2_1 \right] \right| ^2 \right) \sum _{i \ne j} \text {Re} \left( Z_{i,j} \right) ^2\\&\qquad + \left( 1- \left| {\mathbb {E}} \left[ \xi ^2_1 \right] \right| ^2 \right) \sum _{i \ne j} \text {Im} \left( Z_{i,j} \right) ^2\\&\quad \ge \beta \sum _{i \ne j} \text {Re} \left( Z_{i,j} \right) ^2 + \beta \sum _{i \ne j} \text {Im} \left( Z_{i,j} \right) ^2 \\&\quad \ge \beta \Vert Z - \text {diag}\left( Z \right) \Vert ^2_{HS}. \end{aligned} \end{aligned}$$
(6.57)

Similarly, under the assumptions of Lemma 5 we can again use that \(\sum _{i \ne j} \text {Im} \left( Z_{i,j} \right) ^2=0 \) and \(\sum _{i \ne j} \text {Re} \left( Z_{i,j} \right) ^2 = \Vert Z - \text {diag}\left( Z \right) \Vert ^2_{HS} \) to obtain by an analogous argument that

$$\begin{aligned} {\mathbb {E}} \left[ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^2 \right] \ge \beta \Vert Z - \text {diag}\left( Z \right) \Vert ^2_{HS}. \end{aligned}$$
(6.58)

The remainder of the proof will be the same for Lemmas 3 and 5. By Lemma 7 we have that

$$\begin{aligned} \Vert \text {diag}\left( Z \right) \Vert _{HS}&\le \left( \sqrt{1-\frac{1}{\left( 1+ \alpha ^{-1}\right) ^2 }} + 3 \mu \right) \Vert Z \Vert _{HS}\end{aligned}$$
(6.59)
$$\begin{aligned}&\le \left( 0.9 + \frac{1}{27} \right) \Vert Z \Vert _{HS}\end{aligned}$$
(6.60)
$$\begin{aligned}&\le 0.99 \Vert Z \Vert _{HS}. \end{aligned}$$
(6.61)
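
Indeed, with \( \alpha = \frac{4}{5} \) we have \( 1 + \alpha ^{-1} = \frac{9}{4} \) and hence \( \sqrt{1-\frac{1}{\left( 1+ \alpha ^{-1}\right) ^2 }} = \sqrt{1 - \frac{16}{81}} = \frac{\sqrt{65}}{9} \approx 0.896 \le 0.9 \), while \( 3\mu \le \frac{1}{27} \) holds provided \( \mu \le \frac{1}{81} \), which we assume here for the parameter \( \mu \) from Lemmas 3 and 5; since \( 0.9 + \frac{1}{27} \approx 0.937 \), the bound (6.61) follows.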

By the triangle inequality it follows that

$$\begin{aligned} \Vert Z -\text {diag}\left( Z \right) \Vert _{HS}&\ge \Vert Z \Vert _{HS} - \Vert \text {diag}\left( Z \right) \Vert _{HS}\end{aligned}$$
(6.62)
$$\begin{aligned}&\ge \frac{1}{{\tilde{C}}} \Vert Z \Vert _{HS} \end{aligned}$$
(6.63)

for \({\tilde{C}}=100\). Inserting this into (6.57) and (6.58) one obtains that

$$\begin{aligned} {\mathbb {E}} \left[ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^2 \right] \ge \frac{\beta }{{\tilde{C}}^2} \Vert Z \Vert ^2_{HS}. \end{aligned}$$
(6.64)

Hence, we have shown in all cases that \({\mathbb {E}} \left[ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^2 \right] \ge \frac{\beta }{{\tilde{C}}^2} \Vert Z \Vert ^2_{HS} \).

By Lemma 8 it follows that for all \(Z\in {\mathcal {M}}_{2,\mu , \alpha } \)

$$\begin{aligned} {\mathbb {P}} \left( \big \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \big \vert \ge \frac{ \sqrt{ \beta } \Vert Z \Vert _{HS}}{\sqrt{2}{\tilde{C}} } \right)&\ge {\mathbb {P}} \left( \big \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \big \vert ^2 \ge \frac{ {\mathbb {E}} \left[ \vert \left\langle \xi \xi ^* , Z \right\rangle _{HS} \vert ^2 \right] }{2 } \right) \end{aligned}$$
(6.65)
$$\begin{aligned}&\ge \frac{\beta ^2 \Vert Z \Vert _{HS}^4}{ {\tilde{C}}^4 C \left( K^8 \Vert Z \Vert ^4_{HS} + \left( \text {Tr}\,\left( Z \right) \right) ^4 \right) }. \end{aligned}$$
(6.66)
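
Let us briefly comment on this step. The first inequality holds because, by the lower bound \( {\mathbb {E}} \left[ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^2 \right] \ge \frac{\beta }{{\tilde{C}}^2} \Vert Z \Vert ^2_{HS} \) established above, the event \( \big \{ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^2 \ge \frac{1}{2} {\mathbb {E}} \left[ \vert \left\langle \xi \xi ^* , Z \right\rangle _{HS} \vert ^2 \right] \big \} \) is contained in \( \big \{ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert \ge \frac{\sqrt{\beta } \Vert Z \Vert _{HS}}{\sqrt{2} {\tilde{C}}} \big \} \). The second inequality is of Paley–Zygmund type: for a nonnegative random variable \( X \) one has \( {\mathbb {P}} \left( X \ge \frac{1}{2} {\mathbb {E}} \left[ X\right] \right) \ge \frac{\left( {\mathbb {E}} \left[ X \right] \right) ^2}{4 {\mathbb {E}} \left[ X^2 \right] } \); applied to \( X = \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^2 \), it reduces matters to a fourth-moment bound of the form \( {\mathbb {E}} \left[ \vert \left\langle \xi \xi ^*, Z \right\rangle _{HS} \vert ^4 \right] \lesssim K^8 \Vert Z \Vert ^4_{HS} + \left( \text {Tr}\,\left( Z \right) \right) ^4 \), which can be read off from the right-hand side of (6.66).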

Note that for all \(Z \in {\mathcal {M}}_{2,\mu , \alpha } \)

$$\begin{aligned} \vert \text {Tr}\,\left( Z\right) \vert ^4 \le \Vert Z \Vert ^4_1 \lesssim \Vert Z \Vert ^4_{HS}, \end{aligned}$$
(6.67)

where in the last inequality we used Lemma 7 together with \( \alpha = \frac{4}{5} \), which gives \( \Vert Z \Vert _1 \le \left( 1 + \frac{1}{\alpha } \right) \Vert Z \Vert _{HS} = \frac{9}{4} \Vert Z \Vert _{HS} \). This shows that for all \(Z \in {\mathcal {M}}_{2,\mu , \alpha } \) it holds that

$$\begin{aligned} {\mathbb {P}} \left( \left| \left\langle \xi \xi ^*, Z \right\rangle _{HS} \right| \ge \frac{ \sqrt{\beta } \Vert Z \Vert _{HS}}{\sqrt{2}{\tilde{C}} } \right) \gtrsim \frac{\beta ^2}{K^8}, \end{aligned}$$
(6.68)

where we used that \( K \gtrsim 1 \) due to (2.7). Now recall that \( {\mathcal {Z}}:= {\mathcal {M}}_{2,\mu , \alpha } \cap \left\{ Z\in {\mathcal {S}}^n: \ \Vert Z \Vert _{HS} =1 \right\} \). Thus we have shown that

$$\begin{aligned} Q_{{\mathcal {Z}}} \left( 2u\right) \ge \frac{\beta ^2}{C''K^8}, \end{aligned}$$
(6.69)

where \(Q_{{\mathcal {Z}}} \left( \cdot \right) \) is defined in (6.1). Here we have set \( u= \frac{ \sqrt{\beta }}{2\sqrt{2} {\tilde{C}}} \), and \(C''>0\) is a constant chosen large enough. From Lemma 10 it follows that

$$\begin{aligned} {\mathbb {E}} \left[ \underset{Z \in {\mathcal {Z}} }{\sup } \left| \frac{1}{m} \left\langle Z , \sum _{i=1}^{m} \varepsilon _i \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* \right\rangle _{HS} \right| \right] \lesssim K^2 \sqrt{\frac{n}{m}}. \end{aligned}$$
(6.70)

Combining this inequality with our choice of u and choosing the constant in assumption (4.12) large enough, it follows that

$$\begin{aligned} \frac{1}{u}{\mathbb {E}} \left[ \underset{Z \in {\mathcal {Z}}}{\sup } \left| \frac{1}{m} \sum _{i=1}^{m} \varepsilon _i \left\langle \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* , Z \right\rangle _{HS} \right| \right] \le \frac{\beta ^2}{8C''K^8}. \end{aligned}$$
(6.71)

Applying Lemma 6 yields that with probability at least \(1-2\exp \left( -2t^2\right) \)

$$\begin{aligned}&\underset{Z \in {\mathcal {Z}}}{\inf } \left( \frac{1}{m} \sum _{i=1}^{m} \left| \left\langle \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* , Z \right\rangle _{HS} \right| \right) \nonumber \\&\quad \gtrsim \sqrt{\beta } \left( Q_{{\mathcal {Z}}} \left( 2u\right) - \frac{4}{u} {\mathbb {E}} \left[ \underset{Z \in {\mathcal {Z}}}{\sup } \left| \frac{1}{m} \sum _{i=1}^{m} \varepsilon _i \left\langle \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* , Z \right\rangle _{HS} \right| \right] -\frac{t}{\sqrt{m}} \right) \end{aligned}$$
(6.72)
$$\begin{aligned}&\quad \ge \sqrt{\beta } \left( \frac{\beta ^2}{2C''K^8} - \frac{t}{\sqrt{m}} \right) . \end{aligned}$$
(6.73)

Setting \(t= \frac{\sqrt{m} \beta ^2 }{4 C'' K^8} \) it follows that with probability at least \(1- 2 \exp \left( \frac{-m\beta ^4}{8 (C'')^2 K^{16}} \right) \) it holds that

$$\begin{aligned} \underset{Z \in {\mathcal {Z}}}{\inf } \left( \frac{1}{m} \sum _{i=1}^{m} \big \vert \left\langle \xi ^{\left( i\right) } \left( \xi ^{\left( i\right) }\right) ^* , Z \right\rangle _{HS} \big \vert \right) \gtrsim \frac{\beta ^{5/2}}{K^8}. \end{aligned}$$
(6.74)
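
Explicitly, with this choice of \( t \) the right-hand side of (6.73) equals \( \sqrt{\beta } \left( \frac{\beta ^2}{2C''K^8} - \frac{\beta ^2}{4C''K^8} \right) = \frac{\beta ^{5/2}}{4C''K^8} \gtrsim \frac{\beta ^{5/2}}{K^8} \), and the probability bound follows from \( 2t^2 = \frac{m \beta ^4}{8 \left( C'' \right) ^2 K^{16}} \).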

Hence, by the definition of \({\mathcal {A}}\) and \({\mathcal {Z}}\) it follows that

$$\begin{aligned} \underset{Z \in {\mathcal {M}}_{2,\mu , \alpha } }{\inf } \frac{\Vert {\mathcal {A}} \left( Z\right) \Vert _{\ell _1}}{\Vert Z \Vert _{HS}} \gtrsim \frac{\beta ^{5/2}}{K^8} m . \end{aligned}$$
(6.75)

Due to \(\alpha =\frac{4}{5} \) and Lemma 7 we have that \(\Vert Z \Vert _{1} \le \frac{9}{4} \Vert Z \Vert _{HS} \) for all \(Z \in {\mathcal {M}}_{2,\mu , \alpha } \). Combined with (6.75) this shows (4.13). \(\square \)