
Robust covariance estimation for distributed principal component analysis


Abstract

Fan et al. (Ann Stat 47(6):3009–3031, 2019) constructed a distributed principal component analysis (PCA) algorithm that significantly reduces the communication cost among multiple servers. However, their algorithm's guarantees hold only for sub-Gaussian data. Motivated by this deficiency, this paper strengthens their distributed PCA algorithm by plugging in the robust covariance matrix estimators of Minsker (Ann Stat 46(6A):2871–2903, 2018) and Ke et al. (Stat Sci 34(3):454–471, 2019) to tame heavy-tailed data. The theoretical results show that when the sampling distribution is a symmetric innovation with a bounded fourth moment, or is asymmetric with a finite sixth moment, the statistical error rate of the final estimator produced by the robust algorithm matches the rate attained under sub-Gaussian tails. Extensive numerical experiments support the theoretical analysis and indicate that the algorithm is robust to heavy-tailed data and outliers.
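To fix ideas, the following is a minimal sketch, not the authors' implementation, of the one-round divide-and-conquer scheme of Fan et al. (2019) with a robust covariance input: each machine computes the top-\(K\) eigenvectors of a robust covariance estimate, and the center averages the resulting projection matrices and re-diagonalizes. The entrywise hard truncation and the fixed threshold `tau` are illustrative stand-ins for the estimators of Minsker (2018) and Ke et al. (2019).

```python
# A minimal sketch, not the authors' code, of one-round distributed PCA
# (Fan et al. 2019) with a robust covariance input. Hard entrywise
# truncation and the fixed threshold `tau` are illustrative assumptions.
import numpy as np

def truncated_cov(X, tau):
    """Entrywise truncated covariance: each product X_ik * X_is is
    clipped to [-tau, tau] before averaging, taming heavy tails."""
    prods = np.einsum('ik,is->iks', X, X)      # per-sample outer products
    return np.clip(prods, -tau, tau).mean(axis=0)

def distributed_robust_pca(X, m, K, tau):
    """Split n samples over m machines; each sends its top-K eigenvectors.
    The center averages the projection matrices and re-diagonalizes."""
    d = X.shape[1]
    P_bar = np.zeros((d, d))
    for X_l in np.array_split(X, m):           # data held by machine l
        _, V = np.linalg.eigh(truncated_cov(X_l, tau))
        V_K = V[:, -K:]                        # local top-K eigenspace
        P_bar += V_K @ V_K.T / m               # average of projections
    _, V = np.linalg.eigh(P_bar)
    return V[:, -K:]                           # estimated eigenspace of Sigma
```

For instance, `distributed_robust_pca(X, m=10, K=5, tau=20.0)` returns a \(d\times 5\) orthonormal basis whose column span estimates the top-5 principal eigenspace of \(\Sigma \).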


Notes

  1. The original data set consists of 24,017 instances and 2400 features; we use the first 1000 features of each instance as the sample.

References

  • Anderson TW (1963) Asymptotic theory for principal component analysis. Ann Math Stat 34(1):122–148

  • Avella-Medina M, Battey HS, Fan J, Li Q (2018) Robust estimation of high-dimensional covariance and precision matrices. Biometrika 105(2):271–284

  • Bhaskara A, Wijewardena PM (2019) On distributed averaging for stochastic \(k\)-PCA. In: Advances in neural information processing systems, pp 11024–11033

  • Bickel PJ, Levina E (2008) Covariance regularization by thresholding. Ann Stat 36:2577–2604

  • Catoni O (2012) Challenging the empirical mean and empirical variance: a deviation study. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 48(4):1148–1185

  • Catoni O (2016) PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design. arXiv:1603.05229

  • Chen TL, Chang DD, Huang S-Y, Chen H, Lin C, Wang W (2016) Integrating multiple random sketches for singular value decomposition. arXiv:1608.08285

  • Chen X, Lee JD, Li H, Yang Y (2021) Distributed estimation for principal component analysis: an enlarged eigenspace analysis. J Am Stat Assoc 47:1–31

  • Davis AW (1977) Asymptotic theory for principal component analysis: non-normal case. Aust J Stat 19(3):206–212

  • Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • El Karoui N, d’Aspremont A (2010) Second order accurate distributed eigenvector computation for extremely large matrices. Electron J Stat 4:1345–1385

  • Fan J, Fan Y, Lv J (2008) High dimensional covariance matrix estimation using a factor model. J Econom 147:186–197

  • Fan J, Liu H, Wang W (2018) Large covariance estimation through elliptical factor models. Ann Stat 46(4):1383

  • Fan J, Wang D, Wang K, Zhu Z (2019a) Distributed estimation of principal eigenspaces. Ann Stat 47(6):3009–3031

  • Fan J, Wang W, Zhong Y (2019b) Robust covariance estimation for approximate factor models. J Econom 208(1):5–22

  • Fan J, Guo Y, Wang K (2021a) Communication-efficient accurate statistical estimation. J Am Stat Assoc (to appear)

  • Fan J, Wang W, Zhu Z (2021b) A shrinkage principle for heavy-tailed data: high-dimensional robust low-rank matrix recovery. Ann Stat 49(3):1239–1266

  • Han F, Liu H (2018) ECA: high-dimensional elliptical component analysis in non-Gaussian distributions. J Am Stat Assoc 113(521):252–268

  • Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35:73–101

  • Janzamin M, Sedghi H, Anandkumar A (2014) Score function features for discriminative learning: matrix and tensor framework. arXiv:1412.2863

  • Jordan MI, Lee JD, Yang Y (2018) Communication-efficient distributed statistical inference. J Am Stat Assoc 114(526):668–681

  • Ke Y, Minsker S, Ren Z, Sun Q, Zhou W-X (2019) User-friendly covariance estimation for heavy-tailed distributions. Stat Sci 34(3):454–471

  • Lee JD, Liu Q, Sun Y, Taylor JE (2017) Communication-efficient sparse regression. J Mach Learn Res 18:1–30

  • Mendelson S, Zhivotovskiy N (2018) Robust covariance estimation under \(L_{4}-L_{2}\) norm equivalence. Ann Stat 48(3):1648–1664

  • Minsker S (2018) Sub-gaussian estimators of the mean of a random matrix with heavy-tailed entries. Ann Stat 46(6A):2871–2903

  • Minsker S, Wei X (2017) Estimation of the covariance structure of heavy-tailed distributions. In: Advances in neural information processing systems, pp 2855–2864

  • Minsker S, Wei X (2020) Robust modifications of U-statistics and applications to covariance estimation problems. Bernoulli 26(1):694–727

  • Pearson K (1901) On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci 2(11):559–572

  • Schizas ID, Aduroja A (2015) A distributed framework for dimensionality reduction and denoising. IEEE Trans Signal Process 63(23):6379–6394

  • Tian L, Gu Q (2017) Communication-efficient distributed sparse linear discriminant analysis. In: Artificial intelligence and statistics, pp 1178–1187

  • Wang W, Fan J (2017) Asymptotics of empirical eigenstructure for high dimensional spiked covariance. Ann Stat 45(3):1342–1374

  • Yang Z, Balasubramanian K, Liu H (2017) High-dimensional non-Gaussian single index models via thresholded score function estimation. In: International conference on machine learning, pp 3851–3860

  • Yu Y, Wang T, Samworth RJ (2014) A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 102(2):315–323

Author information

Corresponding author

Correspondence to Kangqiang Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by grants from the NSF of China (Grant No. 11731012), Ten Thousands Talents Plan of Zhejiang Province (Grant No. 2018R52042) and the Fundamental Research Funds for the Central Universities.

Appendix

1.1 Proof of Lemma 1

Proof

In place of \(\max (e^x-1,0)\) used in the proof of Theorem 3.2 in Minsker (2018), we define \(\phi (x):= e^{x}-x-1\). For \(t \ge 0\),

$$\begin{aligned} {\text {P}}\left( \lambda _{\max }\left( n{\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\Sigma \right) \ge nt\right)&= {\text {P}}\left( \phi \left( \lambda _{\max } \left( n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\theta \Sigma \right) \right) \ge \phi (n\theta t) \right) \\&\le \frac{1}{\phi (n\theta t)}\mathbb {E} {\text {tr}}\left[ \phi \left( n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\theta \Sigma \right) \right] . \end{aligned}$$

By following the proof of Lemma 3.1 in Minsker (2018), we obtain

$$\begin{aligned} \mathbb {E} {\text {tr}}\left[ \exp \left( n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\theta \Sigma \right) \right] \le {\text {tr}} \left[ \exp \left( \sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha } \right) \right] . \end{aligned}$$

Since

$$\begin{aligned} -\log \left( I-\theta X_{i} {X_{i}}^{T}+c_{\alpha }\theta ^{\alpha } {|X_{i} {X_{i}}^{T}|}^{\alpha }\right)\preceq & {} \psi _{\alpha }\left( \theta X_{i} {X_{i}}^{T}\right) \\\preceq & {} \log \left( I+\theta X_{i} {X_{i}}^{T}+c_{\alpha }\theta ^{\alpha } {|X_{i} {X_{i}}^{T}|}^{\alpha }\right) \end{aligned}$$

and \(\log (1+x)\le x\), we obtain

$$\begin{aligned}&-\mathbb {E}{\text {tr}}\left[ \left( n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\theta \Sigma \right) \right] \\&\quad \le \mathbb {E}{\text {tr}}\left[ \sum _{i=1}^{n}\log \left( I-\theta X_{i} {X_{i}}^{T}+c_{\alpha }\theta ^{\alpha } {|X_{i} {X_{i}}^{T}|}^{\alpha }\right) +n\theta \Sigma \right] \\&\quad \le \mathbb {E}{\text {tr}}\left[ \sum _{i=1}^{n}\left( -\theta X_{i} {X_{i}}^{T}+c_{\alpha }\theta ^{\alpha } {|X_{i} {X_{i}}^{T}|}^{\alpha }\right) +n\theta \Sigma \right] ={\text {tr}} \left[ \sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right] . \end{aligned}$$

Therefore,

$$\begin{aligned}&\mathbb {E} {\text {tr}}\left[ \phi \left( n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\theta \Sigma \right) \right] \\&\quad \le {\text {tr}} \left[ \exp \left( \sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha } \right) +\sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }-I\right] . \end{aligned}$$

Because \(\frac{e^{x}-x-1}{x}=\sum _{i=1}^{\infty } \frac{x^{i}}{(i +1)!}\), we have

$$\begin{aligned}&{\text {tr}} \left[ \exp \left( \sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha } \right) +\sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }-I\right] \\&\quad ={\text {tr}}\left[ c_{\alpha }\theta ^{\alpha } \sqrt{n\mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }}\left( 2I+\sum _{k=2}^{\infty }{\left( c_{\alpha }\theta ^{\alpha } n\mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right) ^{k-1}}/{k !}\right) \sqrt{n\mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }}\right] \\&\quad \le {\text {tr}}\left[ c_{\alpha }\theta ^{\alpha }n\mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\left( 2+\sum _{k=2}^{\infty }{\left( c_{\alpha }\theta ^{\alpha }n\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _2\right) ^{k-1}}/{k !}\right) \right] \\&\quad = {\bar{d}}_{\alpha }\left( \exp \left( c_{\alpha }\theta ^{\alpha }n\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}\right) +c_{\alpha }\theta ^{\alpha }n\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}-1\right) . \end{aligned}$$

By \(\frac{e^{x}}{e^{x}-x-1} \le 1+\frac{2}{x}+\frac{2}{x^{2}}\) for \(x>0\), we obtain

$$\begin{aligned}&\left( \exp \left( c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}\right) +c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}-1\right) /\phi (n\theta t)\\&\quad \le \left( \exp \left( c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}-\theta nt\right) +c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}e^{-\theta nt}\right) \\&\qquad \frac{e^{n\theta t}}{e^{n\theta t}-n\theta t-1}\\&\quad \le \left( \exp \left( c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}-n\theta t\right) +c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}e^{-\theta nt}\right) \\&\qquad \left( 1+\frac{2}{n\theta t}+\frac{2}{(n\theta t)^{2}}\right) . \end{aligned}$$

Therefore,

$$\begin{aligned}&{\text {P}}\left( \lambda _{\max }\left( {\widehat{\Sigma }}_{n}(\alpha ,\theta )-\Sigma \right) \ge t\right) \\&\quad \le {\bar{d}}_{\alpha }e^{-\theta nt}\left( e^{c_{\alpha }n \theta ^{\alpha }v^{\alpha }}+c_{\alpha }n \theta ^{\alpha }v^{\alpha }\right) \left( 1+\frac{2}{\theta nt}+\frac{2}{(\theta nt)^2}\right) . \end{aligned}$$

In the same way, we have

$$\begin{aligned} {\text {P}}\left( \lambda _{\min }\left( {\widehat{\Sigma }}_{n}(\alpha ,\theta )-\Sigma \right) \le -t\right) \le {\bar{d}}_{\alpha }e^{-\theta nt}\left( e^{c_{\alpha }n \theta ^{\alpha }v^{\alpha }}+c_{\alpha }n \theta ^{\alpha }v^{\alpha }\right) \left( 1+\frac{2}{\theta nt}+\frac{2}{(\theta nt)^2}\right) . \end{aligned}$$

\(\square \)
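As a numerical aside, the estimator \({\widehat{\Sigma }}_{n}(\alpha ,\theta )\) analysed above applies an influence function spectrally to each \(\theta X_{i}X_{i}^{T}\); since this matrix is rank one, the matrix function reduces to a scalar rescaling of the outer product. Below is a minimal sketch under the assumption of hard truncation \(\psi (x)={\text {sign}}(x)\min (|x|,1)\) as a stand-in for the paper's \(\psi _{\alpha }\).

```python
# A minimal numerical sketch (an illustration, not the paper's estimator):
# hard truncation stands in for psi_alpha of Minsker (2018).
import numpy as np

def psi(x):
    # hard truncation; a stand-in for the paper's psi_alpha
    return np.sign(x) * np.minimum(np.abs(x), 1.0)

def minsker_cov(X, theta):
    """(1/(n*theta)) * sum_i psi(theta * X_i X_i^T), applied spectrally.
    Each theta * X_i X_i^T has the single nonzero eigenvalue theta*||X_i||^2
    with eigenvector X_i/||X_i||, so the matrix function is a rescaling."""
    n, d = X.shape
    S = np.zeros((d, d))
    for x in X:
        lam = theta * (x @ x)                      # the only nonzero eigenvalue
        if lam > 0:
            S += psi(lam) * np.outer(x, x) / (x @ x)
    return S / (n * theta)
```

Note that whenever \(\theta \Vert X_i\Vert _2^2\le 1\) the summand is untruncated, so on light-tailed data the sketch reduces to the sample covariance.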

1.2 Proof of Theorem 1

Proof

For all \(k, s \in [d]\) and \(t>0\), writing \(\sigma _{k,s}:=\Sigma _{(k,s)}\), we have

$$\begin{aligned}&{\text {P}}\left( {\widehat{\sigma }}_{k,s}-\sigma _{k,s}\ge t\right) \\&\quad ={\text {P}}\left( \frac{n}{\tau _{k,s}}{\widehat{\sigma }}_{k,s}\ge \frac{n}{\tau _{k,s}}\sigma _{k,s}+\frac{nt}{\tau _{k,s}}\right) \le e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\mathbb {E}\left[ \exp \left( n{\widehat{\sigma }}_{k,s}/\tau _{k,s}\right) \right] \\&\quad =e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\mathbb {E}\left[ \exp \left( \sum _{i=1}^{n} \psi _{\tau _{k,s}}\left( {X_{i}}_{(k)}{X_{i}}_{(s)}\right) / \tau _{k,s}\right) \right] \\&\quad =e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\mathbb {E}\left[ \exp \left( \sum _{i=1}^{n} \psi _{1}\left( {X_{i}}_{(k)}{X_{i}}_{(s)}/ \tau _{k,s}\right) \right) \right] \\&\quad \le e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\mathbb {E}\left[ \prod _{i=1}^{n}\left( 1+{X_{i}}_{(k)}{X_{i}}_{(s)}/ \tau _{k,s}+\left| {X_{i}}_{(k)}{X_{i}}_{(s)}/ \tau _{k,s}\right| ^{\alpha }\right) \right] \\&\quad =e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\prod _{i=1}^{n}\left( 1+\sigma _{k,s}/ \tau _{k,s} +\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}/ \tau _{k,s}\right| ^{\alpha }\right) \\&\quad \le e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\prod _{i=1}^{n}\exp \left( \sigma _{k,s}/ \tau _{k,s}+\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}/ \tau _{k,s}\right| ^{\alpha }\right) \\&\quad =\exp \left( n\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }/ \tau _{k,s}^{\alpha }-nt/\tau _{k,s}\right) . \end{aligned}$$

Setting \(\tau _{k,s}=\left( \frac{2\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }}{t}\right) ^{\frac{1}{\alpha -1}}\), we obtain

$$\begin{aligned} {\text {P}}\left( {\widehat{\sigma }}_{k,s}-\sigma _{k,s}\ge t\right) \le \exp \left( -n\left( \mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }\right) ^{\frac{-1}{\alpha -1}} \left( \frac{t}{2}\right) ^{\frac{\alpha }{\alpha -1}}\right) . \end{aligned}$$

When \(t=2\left( \mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }\right) ^{\frac{1}{\alpha }}\left( \frac{2\log d -\log \delta }{n}\right) ^{\frac{\alpha -1}{\alpha }}\), we have

$$\begin{aligned} {\text {P}}\left( {\widehat{\sigma }}_{k,s}-\sigma _{k,s}\ge 2\root \alpha \of {\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }}\left( \frac{2\log d -\log \delta }{n}\right) ^{\frac{\alpha -1}{\alpha }}\right) \le \frac{\delta }{d^{2}}. \end{aligned}$$

Therefore,

$$\begin{aligned} {\text {P}}\left( |{\widehat{\sigma }}_{k,s}-\sigma _{k,s}|\ge 2\root \alpha \of {\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }}\left( \frac{2\log d -\log \delta }{n}\right) ^{\frac{\alpha -1}{\alpha }}\right) \le \frac{2\delta }{d^{2}}. \end{aligned}$$

Taking the union bound over the \(d(d+1)/2\) distinct entries of the symmetric matrix, and noting \(\frac{2\delta }{d^{2}}\cdot \frac{d(d+1)}{2}=(1+d^{-1})\delta \), we obtain

$$\begin{aligned} {\text {P}}\left( \left\| {\widehat{\Sigma }}-\Sigma \right\| _{\max }\ge 2M\left( \frac{2\log d -\log \delta }{n}\right) ^{\frac{\alpha -1}{\alpha }}\right) \le (1+d^{-1})\delta . \end{aligned}$$

\(\square \)
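Tracing the constants, the proof's choice of \(\tau _{k,s}\) combined with the stated \(t\) simplifies to \(\tau _{k,s}=\left( \mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }\right) ^{1/\alpha }\left( \frac{n}{2\log d-\log \delta }\right) ^{1/\alpha }\). The following sketch implements this entrywise rule with two labeled assumptions: the population moment is replaced by its empirical counterpart, and hard clipping stands in for \(\psi _{\tau }\).

```python
# A sketch of the entrywise threshold tau_{k,s} derived in the proof above.
# Assumptions: the population moment E|X_(k) X_(s)|^alpha is replaced by its
# empirical counterpart, and hard clipping stands in for psi_tau.
import numpy as np

def adaptive_truncated_cov(X, alpha=2.0, delta=0.05):
    n, d = X.shape
    prods = np.einsum('ik,is->iks', X, X)                  # X_i(k) * X_i(s)
    v_hat = np.mean(np.abs(prods) ** alpha, axis=0)        # empirical moment
    rate = (2.0 * np.log(d) - np.log(delta)) / n           # (2 log d - log delta)/n
    tau = v_hat ** (1.0 / alpha) * rate ** (-1.0 / alpha)  # tau_{k,s}
    return np.clip(prods, -tau, tau).mean(axis=0)          # entrywise truncated average
```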

1.3 Proof of Lemma 3

Proof

Define \(D_{j}:=I-2 e_{j}e_{j}^{T}\) for every \(j \in [d]\). Suppose that \({\widehat{\lambda }} \in \mathbb {R}\) and \({\widehat{v}} \in \mathbb {S}^{d-1}\) are an eigenvalue and the corresponding eigenvector of

$$\begin{aligned} {\widehat{\Sigma }}_{n}(\alpha , \tau )=\frac{1}{n} \sum _{i=1}^{n} \psi _{\tau }\left( \left\| X_{i}^{(\ell )}\right\| _{2}^{2}\right) \frac{X_{i}^{(\ell )} X_{i}^{(\ell )T}}{\left\| X_{i}^{(\ell )}\right\| _{2}^{2}} \end{aligned}$$

such that \({\widehat{\Sigma }}_{n}(\alpha , \tau ) {\hat{v}}={\widehat{\lambda }} {\hat{v}}\). Let \(\Sigma ^{(\ell )}=V^{(\ell )} \Lambda ^{(\ell )} V^{(\ell ) T}\) be the eigendecomposition of \(\Sigma ^{(\ell )}\). For ease of notation, we drop the superscript \(\ell \) and define \(Z_{i}=\Lambda ^{-\frac{1}{2}} V^{T} X_{i}\) and \({\widehat{S}}=\frac{1}{n } \sum _{i=1}^{n} \psi _{\tau }\left( \left\| X_{i}\right\| _{2}^{2}\right) \frac{Z_{i} Z_{i}^{T}}{\left\| X_{i}\right\| _{2}^{2}}\), so that \({\widehat{\Sigma }}_{n}(\alpha , \tau )=V \Lambda ^{\frac{1}{2}} {\widehat{S}} {\Lambda }^{\frac{1}{2}} {V}^{T}\). Denote the matrix \({\check{\Sigma }}:={V} {\Lambda }^{\frac{1}{2}} {D}_{j} \widehat{{S}} {D}_{j} {\Lambda }^{\frac{1}{2}} {V}^{T}\). Because \(\left\{ X_{i}\right\} _{i=1}^{n}\) have symmetric innovations, we have \({Z}_{i}{\mathop {=}\limits ^{d}}D_{j}{Z}_{i}=:{{Z}_{i}}^{*}\), and

$$\begin{aligned} {\check{\Sigma }}&= {V} {\Lambda }^{\frac{1}{2}}\left( \frac{1}{n } \sum _{i=1}^{n} \psi _{\tau }\left( \left\| X_{i}\right\| _{2}^{2}\right) \frac{{Z_{i}}^{*} {{Z_{i}}^{*}}^{T}}{\left\| X_{i}\right\| _{2}^{2}}\right) {\Lambda }^{\frac{1}{2}} {V}^{T} \\&= \frac{1}{n} \sum _{i=1}^{n} \psi _{\tau }\left( \left\| {V} {\Lambda }^{\frac{1}{2}}Z_{i}\right\| _{2}^{2}\right) \frac{{V} {\Lambda }^{\frac{1}{2}}{Z_{i}}^{*} {{Z_{i}}^{*}}^{T} {\Lambda }^{\frac{1}{2}} {V}^{T}}{\left\| {V} {\Lambda }^{\frac{1}{2}}Z_{i}\right\| _{2}^{2}} . \end{aligned}$$

Note that \(\left\| {V} {\Lambda }^{\frac{1}{2}}Z_{i}\right\| _{2}^{2}=\left\| {V} {\Lambda }^{\frac{1}{2}}{Z_{i}}^{*}\right\| _{2}^{2}.\) Hence, we have

$$\begin{aligned} {\check{\Sigma }}= & {} \frac{1}{n} \sum _{i=1}^{n} \psi _{\tau }\left( \left\| {V} {\Lambda }^{\frac{1}{2}}{Z_{i}}^{*}\right\| _{2}^{2}\right) \frac{{V} {\Lambda }^{\frac{1}{2}}{Z_{i}}^{*} {{Z_{i}}^{*}}^{T} {\Lambda }^{\frac{1}{2}} {V}^{T}}{\left\| {V} {\Lambda }^{\frac{1}{2}}{Z_{i}}^{*}\right\| _{2}^{2}},\\ {\widehat{\Sigma }}_{n}(\alpha ,\tau )= & {} \frac{1}{n} \sum _{i=1}^{n} \psi _{\tau }\left( \left\| X_{i}\right\| _{2}^{2}\right) \frac{X_{i} X_{i}^{T}}{\left\| X_{i}\right\| _{2}^{2}}\\= & {} \frac{1}{n} \sum _{i=1}^{n} \psi _{\tau }\left( \left\| {V} {\Lambda }^{\frac{1}{2}}Z_{i}\right\| _{2}^{2}\right) \frac{{V} {\Lambda }^{\frac{1}{2}}Z_{i} {Z_{i}}^{T} {\Lambda }^{\frac{1}{2}} {V}^{T}}{\left\| {V} {\Lambda }^{\frac{1}{2}}Z_{i}\right\| _{2}^{2}}. \end{aligned}$$

Therefore, we get that \({\widehat{\Sigma }}_{n}(\alpha ,\tau )\) and \({\check{\Sigma }}\) are identically distributed. The rest of the proof is the same as that of Theorem 2 in Fan et al. (2019a). \(\square \)
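For concreteness, here is a minimal sketch of the spectrum-wise truncation estimator \({\widehat{\Sigma }}_{n}(\alpha ,\tau )\) appearing in this proof, assuming \(\psi _{\tau }(x)=\min (x,\tau )\) for \(x\ge 0\) (the paper's \(\psi _{\tau }\) may differ in form):

```python
# A minimal sketch of the spectrum-wise truncation estimator of Lemma 3,
# assuming psi_tau(x) = min(x, tau) for x >= 0 (an assumption, not the
# paper's exact psi_tau).
import numpy as np

def spectrumwise_truncated_cov(X, tau):
    """(1/n) sum_i psi_tau(||X_i||^2) * X_i X_i^T / ||X_i||^2: samples with
    large norm keep their direction but have their length capped at tau."""
    n = X.shape[0]
    norms2 = np.einsum('ij,ij->i', X, X)               # ||X_i||_2^2
    w = np.minimum(norms2, tau) / np.maximum(norms2, 1e-12)  # safe weights
    return (X.T * w) @ X / n                           # sum_i w_i X_i X_i^T / n
```

Because the weights depend on \(X_i\) only through \(\Vert X_i\Vert _2^2\), the reflection \(Z_i\mapsto D_jZ_i\) in the proof leaves them unchanged, which is exactly what drives the distributional identity above.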

1.4 Proof of Theorem 2

Proof

By Lemma 2 and \(x<e^{x}\), we obtain that for \(\tau =O\left( \sigma \cdot \sqrt{n}\right) \),

$$\begin{aligned} {\text {P}}\left( \left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2} \ge t\right) \le C_{1} {\bar{d}}\left( 1+\frac{2\sigma }{t\sqrt{n}} +\frac{2\sigma ^{2}}{t^{2}n}\right) \exp \left( -\frac{t\sqrt{n}}{\sigma }\right) . \end{aligned}$$

By the equivalent characterization of sub-exponential random variables via the \(\psi _{1}\)-norm,

$$\begin{aligned}&\mathbb {E}\left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}^{k}\\&\quad = \int _{0}^{\infty } {\text {P}}\left( \left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}^{k}>s\right) \mathrm {d} s \\&\quad = \int _{0}^{\infty } {\text {P}}\left( \left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}>s^{1 / k}\right) \mathrm {d} s \\&\quad \le \int _{0}^{\infty } C_{1} {\bar{d}}\left( 1+\frac{2\sigma }{s^{1/k}\sqrt{n}} +\frac{2\sigma ^{2}}{s^{2/k}n}\right) \exp \left( -\frac{s^{1/k}\sqrt{n}}{\sigma }\right) \mathrm {d}s \\&\quad = C_{1}{\bar{d}} \left( \frac{\sigma }{\sqrt{n}}\right) ^{k} k \int _{0}^{\infty } e^{-u}\left( u^{k-1}+ 2 u^{k-2} + 2u^{k-3}\right) \mathrm {d} u \;\; \;\; \left( u:=\frac{s^{1/k}\sqrt{n}}{\sigma }\right) \\&\quad = C_{1}{\bar{d}}\left( \frac{\sigma }{\sqrt{n}}\right) ^{k} k \left( \Gamma (k)+2\Gamma (k-1)+2\Gamma (k-2)\right) . \end{aligned}$$

Because \(\Gamma (k) \le k^{k}\) and \(k^{1 / k} \le e^{1 / e} \le 2\) for any \(k \ge 1\), we have

$$\begin{aligned} \left( \!C_{1}{\bar{d}}\left( \!\frac{\sigma }{\sqrt{n}}\!\right) ^{k} k \left( \!\Gamma (k)+2\Gamma (k-1)+2\Gamma (k-2)\!\right) \!\right) ^{1 / k} \le C\left( {\bar{d}}\right) ^{1/k}{\frac{\sigma }{\sqrt{n}}} k \le C{\bar{d}}{\frac{\sigma }{\sqrt{n}}} k. \end{aligned}$$

Hence, \(\left\| \left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}\right\| _{\psi _{1}}=\sup _{k \ge 1}\left( \mathbb {E}\left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}^{k}\right) ^{1/k}/k\le C{\bar{d}}{\frac{\sigma }{\sqrt{n}}}.\)

By the Davis–Kahan theorem (Yu et al. 2014),

$$\begin{aligned} \left\| \rho \left( \widetilde{{V}}_{K}, {V}_{K}\right) \right\| _{\psi _{1}}\lesssim & {} \left\| \left\| \widetilde{{\Sigma }}-{V}_{K} {V}_{K}^{T}\right\| _{F}\right\| _{\psi _{1}}\nonumber \\\le & {} \left\| \left\| \widetilde{{\Sigma }}-{\Sigma }^{*}\right\| _{F}\right\| _{\psi _{1}}+\left\| {\Sigma }^{*}-{V}_{K} {V}_{K}^{T}\right\| _{F}. \end{aligned}$$
(3)

By the robust covariance version of Lemma 1 and Theorem 2 in Fan et al. (2019a), if \(\Vert \mathbb {E}\widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-{V}_{K} {V}_{K}^{T}\Vert _{2} \le 1 / 4\) for all \(\ell \in [m]\), the first term in (3) can be bounded as

$$\begin{aligned}&\left\| \left\| \widetilde{{\Sigma }}-{\Sigma }^{*}\right\| _{F}\right\| _{\psi _{1}} \lesssim \frac{1}{m} \sqrt{\sum _{\ell =1}^{m}\left\| \left\| \widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-\mathbb {E}\widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}\right\| _{F}\right\| _{\psi _{1}}^{2}}\nonumber \\&\quad \lesssim \frac{1}{m} \sqrt{\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )}\frac{\sqrt{K}}{\Delta _{(\ell )}} {\frac{\sigma _{(\ell )}}{\sqrt{n}}}\right) ^{2}}\lesssim \sqrt{\frac{1}{m}\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )} \frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2}} \sqrt{\frac{K}{N}}. \end{aligned}$$
(4)

Since

$$\begin{aligned}&\left\| \mathbb {E}\widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-V_{K} V_{K}^{T}\right\| _{2}\le \mathbb {E}\left\| \widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-V_{K} V_{K}^{T}\right\| _{2}\nonumber \\&\quad \le \mathbb {E}\left\| \widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-V_{K} V_{K}^{T}\right\| _{F} \nonumber \\&\quad \le \frac{\sqrt{K}}{\Delta _{(\ell )}}\mathbb {E}\left\| {\widehat{\Sigma }}^{(\ell )}(2,\tau _{(\ell )})-\Sigma ^{(\ell )}\right\| _{2} \le \frac{\sqrt{K}}{\Delta _{(\ell )}}\left\| \left\| {\widehat{\Sigma }}^{(\ell )}(2,\tau _{(\ell )})-\Sigma ^{(\ell )}\right\| _{2}\right\| _{\psi _{1}}\nonumber \\&\quad \lesssim {\bar{d}}_{(\ell )}\frac{\sqrt{K}}{\Delta _{(\ell )}}\frac{\sigma _{(\ell )}}{\sqrt{n}}, \end{aligned}$$
(5)

we see that if \(C_{1}\) is sufficiently large and \(n \ge C_{1}K \max _{\ell \in [m]}\left( {\bar{d}}_{(\ell )}\frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^2\), then (5) implies \(\Vert \mathbb {E}\widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-{V}_{K} {V}_{K}^{T}\Vert _{2} \le 1 / 4< 1/2\) for all \(\ell \in [m]\). Therefore, by (4) and Lemma 3, we have for some constant \(C_{2}\),

$$\begin{aligned} \left\| \rho \left( \widetilde{{V}}_{K}, {V}_{K}\right) \right\| _{\psi _{1}} \le C_{2}\sqrt{\frac{1}{m}\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )} \frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2}} \sqrt{\frac{K}{N}}. \end{aligned}$$

\(\square \)
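The bound just proved predicts that \(\Vert {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \Vert _{2}\) concentrates at the rate \(\sigma /\sqrt{n}\) even for heavy-tailed samples. A quick Monte Carlo sketch, purely illustrative, reuses `spectrumwise_truncated_cov` from the sketch after Lemma 3; the median-based scale and \(\tau \propto \sqrt{n}\) are assumptions, not the paper's tuning rule:

```python
# A Monte Carlo sketch, purely illustrative, of the sigma/sqrt(n) scaling
# behind Theorem 2 for heavy-tailed data. The threshold tau = O(sqrt(n)) and
# the median-based scale estimate are assumptions, not the paper's tuning.
import numpy as np

rng = np.random.default_rng(0)
d, df = 50, 4.5                                    # t(4.5): finite 4th moment
Sigma = np.eye(d) * df / (df - 2.0)                # true covariance of t(df)
for n in (200, 800, 3200):
    X = rng.standard_t(df, size=(n, d))
    tau = np.median(np.einsum('ij,ij->i', X, X)) * np.sqrt(n)
    err = np.linalg.norm(spectrumwise_truncated_cov(X, tau) - Sigma, 2)
    print(n, err)                                  # error shrinks roughly like 1/sqrt(n)
```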

1.5 Proof of Lemma 4

Proof

Let \(v \in {\mathcal {S}}^{d-1}\) be a unit vector attaining the operator norm \(\left\| \mathbb {E}|X_{i} X_{i}^{T}|^{2}\right\| _{2}\). Then we have

$$\begin{aligned} \left\| \mathbb {E}|X_{i} X_{i}^{T}|^{2}\right\| _{2}= & {} \mathbb {E}\left( \left\| X_{i}\right\| _{2}^{2}(v^{T}X_{i})^2\right) =\sum _{j=1}^{d}\mathbb {E}\left( x_{ij}^{2}(v^{T}X_{i})^2\right) \\\le & {} \sum _{j=1}^{d}\left( \mathbb {E}|x_{ij}|^{3}\right) ^{\frac{2}{3}}\left( \mathbb {E}(v^{T}X_{i})^6\right) ^{\frac{1}{3}} \le \sum _{j=1}^{d}\left( \mathbb {E}x_{ij}^{6}\right) ^{\frac{1}{3}}\left( \mathbb {E}(v^{T}X_{i})^6\right) ^{\frac{1}{3}}\\\le & {} d R_{(1)}^{\prime \frac{2}{3}}<\infty . \end{aligned}$$

Therefore, define \({\Omega }={\widehat{\Sigma }}_{n}^{(1)}(2, \tau _{(1)})-{\Sigma }^{(1)}\), \({\Gamma }={V}_{K} {V}_{K}^{T}\), \(\widehat{{\Gamma }}=\widehat{{V}}_{K}^{(1)} \widehat{{V}}_{K}^{(1) T}\), \({\Theta }=f\left( {\Omega V}_{K}\right) {V}_{K}^{T}+{V}_{K} f\left( {\Omega V}_{K}\right) ^{T}\) where f is a linear function defined in Lemma 2 of Fan et al. (2019a), \({\Phi } =\widehat{{\Gamma }}-{\Gamma }-{\Theta }\) and \(\omega =\Vert {\Omega }\Vert _{2} / \Delta \). Since

$$\begin{aligned} \widehat{{\Gamma }}-{\Gamma }={\Phi } 1_{\{\omega \le 1 / 10\}}+(\widehat{{\Gamma }}-{\Gamma }) 1_{\{\omega>1 / 10\}}-{\Theta } 1_{\{\omega >1 / 10\}}+\Theta , \end{aligned}$$

we have

$$\begin{aligned} \Vert \mathbb {E} \widehat{{\Gamma }}-{\Gamma }\Vert _{F}\le & {} \mathbb {E}\left( \Vert {\Phi }\Vert _{F} 1_{\{\omega \le 1 / 10\}}\right) +\mathbb {E}\left( \Vert \widehat{{\Gamma }}-{\Gamma }\Vert _{F} 1_{\{\omega>1 / 10\}}\right) \nonumber \\&+\mathbb {E}\left( \Vert {\Theta }\Vert _{F} 1_{\{\omega > 1 / 10\}}\right) +\Vert \mathbb {E}\left( \Theta \right) \Vert _{F}. \end{aligned}$$
(6)

By Theorem 3 in Fan et al. (2019a), we obtain

$$\begin{aligned} \mathbb {E}\left( \Vert {\Phi }\Vert _{F} 1_{\{\omega \le 1 / 10\}}\right) +\mathbb {E}\left( \Vert \widehat{{\Gamma }}-{\Gamma }\Vert _{F} 1_{\{\omega>1 / 10\}}\right) +\mathbb {E}\left( \Vert {\Theta }\Vert _{F} 1_{\{\omega > 1 / 10\}}\right) \lesssim \sqrt{K} \mathbb {E}\omega ^2. \end{aligned}$$
(7)

Since \(\mathbb {E}(\Omega )=\mathbb {E}\Big (\psi _{\tau _{(1)}}\left( \left\| X_{i}\right\| _{2}^{2}\right) \frac{X_{i} X_{i}^{T}}{\left\| X_{i}\right\| _{2}^{2}}-X_{i} X_{i}^{T}\Big )=\mathbb {E}\big ((\psi _{\tau _{(1)}}(\Vert X_{i}\Vert _{2}^{2}) /\Vert X_{i}\Vert _{2}^{2}-1) X_{i} X_{i}^{T}\big )\), for any \(v \in {\mathcal {S}}^{d-1}\), we have

$$\begin{aligned}&\mathbb {E}\left( \left( \psi _{\tau _{(1)}}\left( \left\| X_{i}\right\| _{2}^{2}\right) /\left\| X_{i}\right\| _{2}^{2}-1\right) v^{T}X_{i} X_{i}^{T}v\right) \\&\quad =\mathbb {E}\left( \frac{\tau _{(1)}-\left\| X_{i}\right\| _{2}^{2}}{\left\| X_{i}\right\| _{2}^{2}}(v^{T}X_{i})^2 1_{\{\left\| X_{i}\right\| _{2}^{2}>\tau _{(1)}\}}\right) \\&\quad \le \mathbb {E}\left( (v^{T}X_{i})^2 1_{\{\left\| X_{i}\right\| _{2}^{2}>\tau _{(1)}\}}\right) \le \left( \mathbb {E}(v^{T}X_{i})^6\right) ^{\frac{1}{3}}\left( \mathrm {P}\left( \left\| X_{i} \right\| _{2}^{2}>\tau _{(1)}\right) \right) ^{\frac{2}{3}}\\&\quad \le R_{(1)}^{\prime \frac{1}{3}}\left( \mathbb {E}\left\| X_{i}\right\| _{2}^{6}/\tau _{(1)}^3\right) ^{\frac{2}{3}} \lesssim R_{(1)}^{\prime }d^{2}/(\sigma _{(1)}^2 n) \end{aligned}$$

where the second and third inequalities follow from Hölder's and Markov's inequalities, and the last inequality follows from the \(C_{r}\) inequality. Hence, \(\Vert \mathbb {E}\left( \Omega \right) \Vert _{2}\lesssim R_{(1)}^{\prime }d^{2}/\left( \sigma _{(1)}^2 n\right) \) and

$$\begin{aligned} \Vert \mathbb {E}\left( \Theta \right) \Vert _{F}\lesssim \left\| f\left( {\mathbb {E}\left( \Omega \right) V}_{K}\right) \right\| _{F}\le \sqrt{K}\Vert \mathbb {E}\left( \Omega \right) \Vert _{2} / \Delta _{(1)} \lesssim \frac{\sqrt{K}}{\Delta _{(1)}}\frac{R_{(1)}^{\prime }d^{2}}{\sigma _{(1)}^2 n}. \end{aligned}$$
(8)

Finally, combining (6)–(8), we obtain

$$\begin{aligned} \Vert \mathbb {E} \widehat{{\Gamma }}-{\Gamma }\Vert _{F} \lesssim&\sqrt{K} \mathbb {E} \omega ^{2}+\Vert \mathbb {E}\left( \Theta \right) \Vert _{F}\lesssim \sqrt{K} \Delta ^{-2}\left\| \left\| {\Omega }\right\| _{2}\right\| _{\psi _{1}}^{2}+\frac{\sqrt{K}}{\Delta _{(1)}}\frac{R_{(1)}^{\prime }d^{2}}{\sigma _{(1)}^2 n} \\ \lesssim&\left( {\bar{d}}_{(1)}\frac{\sigma _{(1)}}{\Delta _{(1)}}\right) ^2\frac{\sqrt{K}}{n}+ \frac{R_{(1)}^{\prime }d^{2}}{\sigma _{(1)}^2\Delta _{(1)}}\frac{\sqrt{K}}{n}. \end{aligned}$$

\(\square \)

1.6 Proof of Theorem 3

Proof

By Lemma 4 and (4), we obtain that when \(n \ge C_{2}K \max _{\ell \in [m]}\left( {\bar{d}}_{(\ell )}\frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^2\),

$$\begin{aligned} \begin{aligned}&\left\| \rho \left( \widetilde{{V}}_{K}, {V}_{K}\right) \right\| _{\psi _{1}}\le \left\| \rho \left( \widetilde{{V}}_{K}, {V}_{K}^{\star }\right) \right\| _{\psi _{1}}+\rho \left( {V}_{K}^{\star }, {V}_{K}\right) \\&\lesssim \sqrt{\frac{1}{m}\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )} \frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2}} \sqrt{\frac{K}{N}}+ \frac{1}{m}\sum _{\ell =1}^{m}\left( \left( {\bar{d}}_{(\ell )}\frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2} +\frac{R_{(\ell )}^{\prime }d^{2}}{\sigma _{(\ell )}^2\Delta _{(\ell )}}\right) \frac{\sqrt{K}}{n}. \end{aligned} \end{aligned}$$

Therefore, when the requirement on m and n is satisfied, we have for a constant \(C_{3}\),

$$\begin{aligned} \left\| \rho \left( \widetilde{{V}}_{K}, {V}_{K}\right) \right\| _{\psi _{1}}\le C_{3}\sqrt{\frac{1}{m}\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )} \frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2}} \sqrt{\frac{K}{N}}. \end{aligned}$$

\(\square \)

About this article

Cite this article

Li, K., Bao, H. & Zhang, L. Robust covariance estimation for distributed principal component analysis. Metrika 85, 707–732 (2022). https://doi.org/10.1007/s00184-021-00848-9
