Robust covariance estimation for distributed principal component analysis

Li, Kangqiang; Bao, Han; Zhang, Lixin

doi:10.1007/s00184-021-00848-9


Published: 22 November 2021

Metrika, Volume 85, pages 707–732 (2022)

Abstract

Fan et al. (Ann Stat 47(6):3009–3031, 2019) constructed a distributed principal component analysis (PCA) algorithm that significantly reduces the communication cost between multiple servers. However, their algorithm's guarantees hold only for sub-Gaussian data. Motivated by this limitation, this paper extends their distributed PCA algorithm to heavy-tailed data by replacing the sample covariance with the robust covariance matrix estimators of Minsker (Ann Stat 46(6A):2871–2903, 2018) and Ke et al. (Stat Sci 34(3):454–471, 2019). The theoretical results show that when the sampling distribution follows a symmetric innovation model with a bounded fourth moment, or is asymmetric with a finite sixth moment, the statistical error rate of the final estimator produced by the robust algorithm is similar to that under sub-Gaussian tails. Extensive numerical experiments support the theoretical analysis and indicate that our algorithm is robust to heavy-tailed data and outliers.
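The one-shot scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each of the $m$ servers computes a robust local covariance estimate, here the spectrum-wise truncated estimator $\frac{1}{n}\sum_i \psi_\tau(\|X_i\|_2^2) X_iX_i^T/\|X_i\|_2^2$ from the proof of Lemma 3 with the assumed choice $\psi_\tau(x)=\min(x,\tau)$, extracts its top-$K$ eigenvectors, and sends only that $d\times K$ matrix; the center averages the projection matrices and takes the top-$K$ eigenvectors of the average, in the style of Fan et al. (2019a). The function names and the value of `tau` are hypothetical.

```python
import numpy as np


def truncated_covariance(X, tau):
    """Spectrum-wise truncated estimator (assumed psi_tau(x) = min(x, tau)):
    (1/n) * sum_i min(||X_i||^2, tau) * X_i X_i^T / ||X_i||^2."""
    sq_norms = np.sum(X ** 2, axis=1)
    safe = np.where(sq_norms > 0, sq_norms, 1.0)      # zero rows get weight 0
    weights = np.minimum(sq_norms, tau) / safe        # <= 1: only large rows shrink
    return (X * weights[:, None]).T @ X / X.shape[0]


def top_k_eigvecs(S, k):
    """Orthonormal eigenvectors of symmetric S for its k largest eigenvalues."""
    _, eigvecs = np.linalg.eigh(S)                    # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]


def distributed_pca(local_samples, k, tau):
    """One-shot distributed PCA: each server ships a d x k matrix, the center
    averages the projections V V^T and re-extracts the top-k eigenspace."""
    d = local_samples[0].shape[1]
    sigma_tilde = np.zeros((d, d))
    for X in local_samples:                           # one iteration per server
        V_local = top_k_eigvecs(truncated_covariance(X, tau), k)
        sigma_tilde += V_local @ V_local.T
    sigma_tilde /= len(local_samples)
    return top_k_eigvecs(sigma_tilde, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, m, n = 20, 2, 10, 500
    scales = np.ones(d)
    scales[:2] = [5.0, 4.0]                           # two spiked directions e_1, e_2
    # Student t with 5 degrees of freedom: heavy tails, finite fourth moment
    samples = [rng.standard_t(5, size=(n, d)) * scales for _ in range(m)]
    V_hat = distributed_pca(samples, k, tau=500.0)
    P_true = np.zeros((d, d))
    P_true[0, 0] = P_true[1, 1] = 1.0
    print("projection error:", np.linalg.norm(V_hat @ V_hat.T - P_true))
```

The design point is that only $d\times K$ matrices cross the network, which is what keeps communication low; robustness enters solely through the local estimator, so `truncated_covariance` could be swapped for any other robust estimator. The proof of Theorem 2 takes $\tau$ of order $\sigma\sqrt{n}$.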

Similar content being viewed by others

Variance Variation Criterion and Consistency in Estimating the Number of Significant Signals of High-dimensional PCA. Guan-peng Wang & Heng-jian Cui. Article, 01 July 2022

Generalized spherical principal component analysis. Sarah Leyder, Jakob Raymaekers & Tim Verdonck. Article, 23 March 2024

ECOPICA: empirical copula-based independent component analysis. Hung-Kai Pi, Mei-Hui Guo, … Shih-Feng Huang. Article, 13 December 2023

Notes

The original data consists of 24017 instances and 2400 features. We employ the first 1000 features of each instance as the sample.

References

Anderson TW (1963) Asymptotic theory for principal component analysis. Ann Math Stat 34(1):122–148
Avella-Medina M, Battey HS, Fan J, Li Q (2018) Robust estimation of high-dimensional covariance and precision matrices. Biometrika 105(2):271–284
Bhaskara A, Wijewardena PM (2019) On distributed averaging for stochastic $k$-PCA. In: Advances in neural information processing systems, pp 11024–11033
Bickel PJ, Levina E (2008) Covariance regularization by thresholding. Ann Stat 36:2577–2604
Catoni O (2012) Challenging the empirical mean and empirical variance: a deviation study. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 48(4):1148–1185
Catoni O (2016) PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design. arXiv:1603.05229
Chen TL, Chang DD, Huang S-Y, Chen H, Lin C, Wang W (2016) Integrating multiple random sketches for singular value decomposition. arXiv:1608.08285
Chen X, Lee JD, Li H, Yang Y (2021) Distributed estimation for principal component analysis: an enlarged Eigenspace analysis. J Am Stat Assoc 47:1–31
Davis AW (1977) Asymptotic theory for principal component analysis: non-normal case. Aust J Stat 19(3):206–212
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
El Karoui N, d’Aspremont A (2010) Second order accurate distributed eigenvector computation for extremely large matrices. Electron J Stat 4:1345–1385
Fan J, Fan Y, Lv J (2008) High dimensional covariance matrix estimation using a factor model. J Econom 147:186–197
Fan J, Liu H, Wang W (2018) Large covariance estimation through elliptical factor models. Ann Stat 46(4):1383
Fan J, Wang D, Wang K, Zhu Z (2019a) Distributed estimation of principal eigenspaces. Ann Stat 47(6):3009–3031
Fan J, Wang W, Zhong Y (2019b) Robust covariance estimation for approximate factor models. J Econom 208(1):5–22
Fan J, Guo Y, Wang K (2021a) Communication-efficient accurate statistical estimation. J Am Stat Assoc (to appear)
Fan J, Wang W, Zhu Z (2021b) A shrinkage principle for heavy-tailed data: high-dimensional robust low-rank matrix recovery. Ann Stat 49(3):1239–1266
Han F, Liu H (2018) ECA: high-dimensional elliptical component analysis in non-Gaussian distributions. J Am Stat Assoc 113(521):252–268
Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35:73–101
Janzamin M, Sedghi H, Anandkumar A (2014) Score function features for discriminative learning: matrix and tensor framework. arXiv:1412.2863
Jordan MI, Lee JD, Yang Y (2018) Communication-efficient distributed statistical inference. J Am Stat Assoc 114(526):668–681
Ke Y, Minsker S, Ren Z, Sun Q, Zhou W-X (2019) User-friendly covariance estimation for heavy-tailed distributions. Stat Sci 34(3):454–471
Lee JD, Liu Q, Sun Y, Taylor JE (2017) Communication-efficient sparse regression. J Mach Learn Res 18:1–30
Mendelson S, Zhivotovskiy N (2018) Robust covariance estimation under $L_{4}-L_{2}$ norm equivalence. Ann Stat 48(3):1648–1664
Minsker S (2018) Sub-gaussian estimators of the mean of a random matrix with heavy-tailed entries. Ann Stat 46(6A):2871–2903
Minsker S, Wei X (2017) Estimation of the covariance structure of heavy-tailed distributions. In: Advances in neural information processing systems, pp 2855–2864
Minsker S, Wei X (2020) Robust modifications of U-statistics and applications to covariance estimation problems. Bernoulli 26(1):694–727
Pearson K (1901) On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci 2(11):559–572
Schizas ID, Aduroja A (2015) A distributed framework for dimensionality reduction and denoising. IEEE Trans Signal Process 63(23):6379–6394
Tian L, Gu Q (2017) Communication-efficient distributed sparse linear discriminant analysis. In: Artificial intelligence and statistics, pp 1178–1187
Wang W, Fan J (2017) Asymptotics of empirical eigenstructure for high dimensional spiked covariance. Ann Stat 45(3):1342–1374
Yang Z, Balasubramanian K, Liu H (2017) High-dimensional non-Gaussian single index models via thresholded score function estimation. In: International conference on machine learning, pp 3851–3860
Yu Y, Wang T, Samworth RJ (2014) A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 102(2):315–323

Author information

Authors and Affiliations

School of Mathematical Sciences, Zhejiang University, Hangzhou, 310027, Zhejiang, China
Kangqiang Li, Han Bao & Lixin Zhang

Corresponding author

Correspondence to Kangqiang Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by grants from the NSF of China (Grant No. 11731012), the Ten Thousand Talents Plan of Zhejiang Province (Grant No. 2018R52042) and the Fundamental Research Funds for the Central Universities.

Appendix

1.1 Proof of Lemma 1

Proof

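Instead of the function $\max (e^x-1,0)$ used in the proof of Theorem 3.2 in Minsker (2018), we use $\phi (x):= e^{x}-x-1$, which is nonnegative on $\mathbb {R}$ and nondecreasing on $[0,\infty )$. For $t > 0$, Markov's inequality and $\phi (\lambda _{\max }(A))\le {\text {tr}}\,\phi (A)$ give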

$$\begin{aligned} {\text {P}}\left( \lambda _{\max }\left( n{\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\Sigma \right) \ge nt\right)\le & {} {\text {P}}\left( \phi \left( \lambda _{\max } \left( n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\theta \Sigma \right) \right) \ge \phi (n\theta t) \right) \\\le & {} \frac{1}{\phi (n\theta t)}\mathbb {E} {\text {tr}}\left[ \phi \left( n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\theta \Sigma \right) \right] . \end{aligned}$$

Following the proof of Lemma 3.1 in Minsker (2018), we obtain

$$\begin{aligned} \mathbb {E} {\text {tr}}\left[ \exp \left( n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\theta \Sigma \right) \right] \le {\text {tr}} \left[ \exp \left( \sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha } \right) \right] . \end{aligned}$$

Since

$$\begin{aligned} -\log \left( I-\theta X_{i} {X_{i}}^{T}+c_{\alpha }\theta ^{\alpha } {|X_{i} {X_{i}}^{T}|}^{\alpha }\right)\preceq & {} \psi _{\alpha }\left( \theta X_{i} {X_{i}}^{T}\right) \\\preceq & {} \log \left( I+\theta X_{i} {X_{i}}^{T}+c_{\alpha }\theta ^{\alpha } {|X_{i} {X_{i}}^{T}|}^{\alpha }\right) \end{aligned}$$

and $\log (1+x)\le x$ (so that $\log (I+A)\preceq A$ for symmetric $A\succ -I$), we obtain

$$\begin{aligned}&-\mathbb {E}{\text {tr}}\left[ \left( n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\theta \Sigma \right) \right] \\&\quad \le \mathbb {E}{\text {tr}}\left[ \sum _{i=1}^{n}\log \left( I-\theta X_{i} {X_{i}}^{T}+c_{\alpha }\theta ^{\alpha } {|X_{i} {X_{i}}^{T}|}^{\alpha }\right) +n\theta \Sigma \right] \\&\quad \le \mathbb {E}{\text {tr}}\left[ \sum _{i=1}^{n}\left( -\theta X_{i} {X_{i}}^{T}+c_{\alpha }\theta ^{\alpha } {|X_{i} {X_{i}}^{T}|}^{\alpha }\right) +n\theta \Sigma \right] ={\text {tr}} \left[ \sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right] . \end{aligned}$$

Therefore,

$$\begin{aligned}&\mathbb {E} {\text {tr}}\left[ \phi \left( n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )-n\theta \Sigma \right) \right] \\&\quad \le {\text {tr}} \left[ \exp \left( \sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha } \right) +\sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }-I\right] . \end{aligned}$$

Because $e^{x}+x-1=x\left( 2+\sum _{k=2}^{\infty } \frac{x^{k-1}}{k!}\right) $, we have

$$\begin{aligned}&{\text {tr}} \left[ \exp \left( \sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha } \right) +\sum _{i=1}^{n}c_{\alpha }\theta ^{\alpha } \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }-I\right] \\&\quad ={\text {tr}}\left[ c_{\alpha }\theta ^{\alpha } \sqrt{n\mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }}\left( 2I+\sum _{k=2}^{\infty }{\left( c_{\alpha }\theta ^{\alpha } n\mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right) ^{k-1}}/{k !}\right) \sqrt{n\mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }}\right] \\&\quad \le {\text {tr}}\left[ c_{\alpha }\theta ^{\alpha }n\mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\left( 2+\sum _{k=2}^{\infty }{\left( c_{\alpha }\theta ^{\alpha }n\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _2\right) ^{k-1}}/{k !}\right) \right] \\&\quad = {\bar{d}}_{\alpha }\left( \exp \left( c_{\alpha }\theta ^{\alpha }n\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}\right) +c_{\alpha }\theta ^{\alpha }n\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}-1\right) . \end{aligned}$$

Using $\frac{e^{x}}{e^{x}-x-1} \le 1+\frac{2}{x}+\frac{2}{x^{2}}$ for $x>0$ (equivalent to $x^{2}\le 2(e^{x}-x-1)$, which holds since $e^{x}-x-1\ge x^{2}/2$), we obtain

$$\begin{aligned}&\left( \exp \left( c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}\right) +c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}-1\right) /\phi (n\theta t)\\&\quad \le \left( \exp \left( c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}-\theta nt\right) +c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}e^{-\theta nt}\right) \\&\qquad \frac{e^{n\theta t}}{e^{n\theta t}-n\theta t-1}\\&\quad \le \left( \exp \left( c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}-n\theta t\right) +c_{\alpha }n\theta ^{\alpha }\left\| \mathbb {E} |X_{i}X_{i}^{T}|^{\alpha }\right\| _{2}e^{-\theta nt}\right) \\&\qquad \left( 1+\frac{2}{n\theta t}+\frac{2}{(n\theta t)^{2}}\right) . \end{aligned}$$
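The two scalar facts used in the last two displays, the expansion $e^{x}+x-1=x\left(2+\sum_{k\ge 2}x^{k-1}/k!\right)$ and the bound $e^{x}/(e^{x}-x-1)\le 1+2/x+2/x^{2}$ for $x>0$, are easy to sanity-check numerically. The sketch below (plain Python, using `math.expm1` to avoid cancellation near zero) evaluates both on a grid; it is a numerical check on finitely many points, not a proof, and the helper names are hypothetical.

```python
import math


def phi(x):
    """phi(x) = e^x - x - 1, computed via expm1 for accuracy near 0."""
    return math.expm1(x) - x


def series_identity_gap(x, terms=60):
    """|(e^x + x - 1) - x * (2 + sum_{k>=2} x^(k-1)/k!)| with a truncated series."""
    series = sum(x ** (k - 1) / math.factorial(k) for k in range(2, terms))
    return abs((math.expm1(x) + x) - x * (2.0 + series))


def ratio_bound_holds(x):
    """Check e^x / (e^x - x - 1) <= 1 + 2/x + 2/x^2 for a single x > 0."""
    lhs = math.exp(x) / phi(x)
    rhs = 1.0 + 2.0 / x + 2.0 / x ** 2
    return lhs <= rhs * (1.0 + 1e-12)  # tiny slack for rounding


if __name__ == "__main__":
    grid = [0.01 * i for i in range(1, 5001)]  # x in (0, 50]
    print(all(ratio_bound_holds(x) for x in grid))
    print(max(series_identity_gap(x) for x in [0.1, 0.5, 1.0, 2.0, 5.0]))
```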

Therefore,

$$\begin{aligned}&{\text {P}}\left( \lambda _{\max }\left( {\widehat{\Sigma }}_{n}(\alpha ,\theta )-\Sigma \right) \ge t\right) \\&\quad \le {\bar{d}}_{\alpha }e^{-\theta nt}\left( e^{c_{\alpha }n \theta ^{\alpha }v^{\alpha }}+c_{\alpha }n \theta ^{\alpha }v^{\alpha }\right) \left( 1+\frac{2}{\theta nt}+\frac{2}{(\theta nt)^2}\right) . \end{aligned}$$

In the same way, we have

$$\begin{aligned}&{\text {P}}\left( \lambda _{\min }\left( {\widehat{\Sigma }}_{n}(\alpha ,\theta )-\Sigma \right) \le -t\right) \\&\quad \le {\bar{d}}_{\alpha }e^{-\theta nt}\left( e^{c_{\alpha }n \theta ^{\alpha }v^{\alpha }}+c_{\alpha }n \theta ^{\alpha }v^{\alpha }\right) \left( 1+\frac{2}{\theta nt}+\frac{2}{(\theta nt)^2}\right) . \end{aligned}$$

$\square $
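The estimator in Lemma 1 enters the proof through $n\theta {\widehat{\Sigma }}_{n}(\alpha ,\theta )=\sum_{i}\psi_{\alpha}(\theta X_iX_i^T)$. Because each $\theta X_iX_i^T$ is rank one with nonzero eigenvalue $\theta\|X_i\|_2^2$, any $\psi_\alpha$ with $\psi_\alpha(0)=0$ collapses the matrix function to $\psi_\alpha(\theta\|X_i\|_2^2)X_iX_i^T/\|X_i\|_2^2$. The sketch below takes $\alpha=2$ and assumes Catoni's choice $\psi(x)=\operatorname{sign}(x)\log(1+|x|+x^2/2)$, which satisfies the two-sided bound displayed in the proof with $c_2=1/2$; the main text may use a different admissible $\psi_\alpha$, and the function names are hypothetical.

```python
import numpy as np


def catoni_psi(x):
    """Assumed influence function for alpha = 2:
    psi(x) = sign(x) * log(1 + |x| + x^2 / 2)."""
    return np.sign(x) * np.log1p(np.abs(x) + 0.5 * x ** 2)


def minsker_covariance(X, theta):
    """Sketch of (1/(n*theta)) * sum_i psi(theta * X_i X_i^T).

    theta * X_i X_i^T has eigenvalue theta*||X_i||^2 on X_i/||X_i|| and 0
    elsewhere; since psi(0) = 0 the matrix function reduces to
    psi(theta*||X_i||^2) * X_i X_i^T / ||X_i||^2."""
    n = X.shape[0]
    sq_norms = np.sum(X ** 2, axis=1)
    safe = np.where(sq_norms > 0, sq_norms, 1.0)  # zero rows: psi(0) = 0 anyway
    weights = catoni_psi(theta * sq_norms) / safe
    return (X * weights[:, None]).T @ X / (n * theta)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_t(3, size=(500, 4))  # heavy-tailed sample
    print(np.round(minsker_covariance(X, theta=0.05), 3))
```

Since $\psi(x)\le x$ for $x\ge 0$, the estimate is dominated in the Loewner order by the sample second-moment matrix $\frac{1}{n}\sum_i X_iX_i^T$, and it approaches that matrix as $\theta\to 0$; $\theta$ therefore trades robustness against bias.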

1.2 Proof of Theorem 1

Proof

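For all $k, s \in [d]$ and $t>0$, write $\sigma _{k,s}:=\Sigma _{(k,s)}$. By Markov's inequality applied to $\exp (n{\widehat{\sigma }}_{k,s}/\tau _{k,s})$ and the independence of the $X_{i}$, we have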

$$\begin{aligned}&{\text {P}}\left( {\widehat{\sigma }}_{k,s}-\sigma _{k,s}\ge t\right) \\&\quad ={\text {P}}\left( \frac{n}{\tau _{k,s}}{\widehat{\sigma }}_{k,s}\ge \frac{n}{\tau _{k,s}}\sigma _{k,s}+\frac{nt}{\tau _{k,s}}\right) \le e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\mathbb {E}\left[ \exp \left( n{\widehat{\sigma }}_{k,s}/\tau _{k,s}\right) \right] \\&\quad =e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\mathbb {E}\left[ \exp \left( \sum _{i=1}^{n} \psi _{\tau _{k,s}}\left( {X_{i}}_{(k)}{X_{i}}_{(s)}\right) / \tau _{k,s}\right) \right] \\&\quad =e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\mathbb {E}\left[ \exp \left( \sum _{i=1}^{n} \psi _{1}\left( {X_{i}}_{(k)}{X_{i}}_{(s)}/ \tau _{k,s}\right) \right) \right] \\&\quad \le e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\mathbb {E}\left[ \prod _{i=1}^{n}\left( 1+{X_{i}}_{(k)}{X_{i}}_{(s)}/ \tau _{k,s}+\left| {X_{i}}_{(k)}{X_{i}}_{(s)}/ \tau _{k,s}\right| ^{\alpha }\right) \right] \\&\quad =e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\prod _{i=1}^{n}\left( 1+\sigma _{k,s}/ \tau _{k,s} +\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}/ \tau _{k,s}\right| ^{\alpha }\right) \\&\quad \le e^{-\frac{n}{\tau _{k,s}}(t+\sigma _{k,s})}\prod _{i=1}^{n}\exp \left( \sigma _{k,s}/ \tau _{k,s}+\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}/ \tau _{k,s}\right| ^{\alpha }\right) \\&\quad =\exp \left( n\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }/ \tau _{k,s}^{\alpha }-nt/\tau _{k,s}\right) . \end{aligned}$$

Setting $\tau _{k,s}=\left( \frac{2\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }}{t}\right) ^{\frac{1}{\alpha -1}}$, we obtain

$$\begin{aligned} {\text {P}}\left( {\widehat{\sigma }}_{k,s}-\sigma _{k,s}\ge t\right) \le \exp \left( -n\left( \mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }\right) ^{\frac{-1}{\alpha -1}} \left( \frac{t}{2}\right) ^{\frac{\alpha }{\alpha -1}}\right) . \end{aligned}$$

When $t=2\left( \mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }\right) ^{\frac{1}{\alpha }}\left( \frac{2\log d -\log \delta }{n}\right) ^{\frac{\alpha -1}{\alpha }}$, we have

$$\begin{aligned} {\text {P}}\left( {\widehat{\sigma }}_{k,s}-\sigma _{k,s}\ge 2\root \alpha \of {\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }}\left( \frac{2\log d -\log \delta }{n}\right) ^{\frac{\alpha -1}{\alpha }}\right) \le \frac{\delta }{d^{2}}. \end{aligned}$$

Therefore,

$$\begin{aligned} {\text {P}}\left( |{\widehat{\sigma }}_{k,s}-\sigma _{k,s}|\ge 2\root \alpha \of {\mathbb {E}\left| {X_{i}}_{(k)}{X_{i}}_{(s)}\right| ^{\alpha }}\left( \frac{2\log d -\log \delta }{n}\right) ^{\frac{\alpha -1}{\alpha }}\right) \le \frac{2\delta }{d^{2}}. \end{aligned}$$

Since ${\widehat{\Sigma }}-\Sigma $ is symmetric, a union bound over its $d(d+1)/2$ distinct entries gives

$$\begin{aligned} {\text {P}}\left( \left\| {\widehat{\Sigma }}-\Sigma \right\| _{\max }\ge 2M\left( \frac{2\log d -\log \delta }{n}\right) ^{\frac{\alpha -1}{\alpha }}\right) \le (1+d^{-1})\delta . \end{aligned}$$

$\square $
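Eliminating $t$ between the two choices above gives a closed form for the truncation level: substituting $t$ into $\tau_{k,s}$ yields $\tau_{k,s}=\left(n\,\mathbb{E}\left|{X_{i}}_{(k)}{X_{i}}_{(s)}\right|^{\alpha}/(2\log d-\log\delta)\right)^{1/\alpha}$. The sketch below implements the resulting entry-wise estimator ${\widehat{\sigma }}_{k,s}=\frac{1}{n}\sum_i\psi_{\tau_{k,s}}({X_{i}}_{(k)}{X_{i}}_{(s)})$ under two assumptions: $\psi_\tau(x)=\operatorname{sign}(x)\min(|x|,\tau)$ (a Huber-type truncation, which satisfies $\psi_\tau(x)/\tau=\psi_1(x/\tau)$ and the bound $\psi_1(x)\le\log(1+x+|x|^\alpha)$ used above for $\alpha\in(1,2]$), and the unknown moment replaced by its sample average, a plug-in step the proof does not cover. Function names are hypothetical.

```python
import math

import numpy as np


def truncation_level(moment, n, d, delta, alpha):
    """tau = (n * moment / (2 log d - log delta))^(1/alpha): the choice of tau
    in the proof with t already substituted."""
    return (n * moment / (2.0 * math.log(d) - math.log(delta))) ** (1.0 / alpha)


def elementwise_robust_covariance(X, delta=0.05, alpha=2.0):
    """Entry-wise truncated second-moment estimator (sketch).

    Assumptions: psi_tau(x) = sign(x) * min(|x|, tau), and E|X_k X_s|^alpha is
    replaced by its empirical average."""
    n, d = X.shape
    sigma_hat = np.empty((d, d))
    for k in range(d):
        for s in range(k, d):  # fill the upper triangle, mirror to the lower
            prod = X[:, k] * X[:, s]
            moment = np.mean(np.abs(prod) ** alpha)
            tau = truncation_level(moment, n, d, delta, alpha)
            value = np.mean(np.clip(prod, -tau, tau))  # sign(x) * min(|x|, tau)
            sigma_hat[k, s] = sigma_hat[s, k] = value
    return sigma_hat


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.standard_t(4, size=(2000, 5))
    print(np.round(elementwise_robust_covariance(X, delta=0.05, alpha=2.0), 3))
```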

1.3 Proof of Lemma 3

Proof

For each $j \in [d]$, define $D_{j}:=I-2 e_{j}e_{j}^{T}$. Suppose that ${\widehat{\lambda }} \in \mathbb {R}$ and ${\widehat{v}} \in \mathbb {S}^{d-1}$ are an eigenvalue and a corresponding eigenvector of

$$\begin{aligned} {\widehat{\Sigma }}_{n}(\alpha , \tau )=\frac{1}{n} \sum _{i=1}^{n} \psi _{\tau }\left( \left\| X_{i}^{(\ell )}\right\| _{2}^{2}\right) \frac{X_{i}^{(\ell )} X_{i}^{(\ell )T}}{\left\| X_{i}^{(\ell )}\right\| _{2}^{2}} \end{aligned}$$

such that ${\widehat{\Sigma }}_{n}(\alpha , \tau ) {\hat{v}}={\widehat{\lambda }} {\hat{v}}$. Let $\Sigma ^{(\ell )}=V^{(\ell )} \Lambda ^{(\ell )} V^{T(\ell )}$ be the eigendecomposition of $\Sigma ^{(\ell )}$. For ease of notation, we drop the superscript $\ell $ and define $Z_{i}=\Lambda ^{-\frac{1}{2}} V^{T} X_{i}$ and ${\widehat{S}}=\frac{1}{n } \sum _{i=1}^{n} \psi _{\tau }\left( \left\| X_{i}\right\| _{2}^{2}\right) \frac{Z_{i} Z_{i}^{T}}{\left\| X_{i}\right\| _{2}^{2}}$, so that ${\widehat{\Sigma }}_{n}(\alpha , \tau )=V \Lambda ^{\frac{1}{2}} {\widehat{S}} {\Lambda }^{\frac{1}{2}} {V}^{T}$. Define ${\check{\Sigma }}:={V} {\Lambda }^{\frac{1}{2}} {D}_{j} \widehat{{S}} {D}_{j} {\Lambda }^{\frac{1}{2}} {V}^{T}$. Because $\left\{ X_{i}\right\} _{i=1}^{n}$ follow the symmetric innovation model, we have ${Z}_{i}{\mathop {=}\limits ^{d}}D_{j}{Z}_{i}=:{{Z}_{i}}^{*}$, and

$$\begin{aligned} {\check{\Sigma }}&= {V} {\Lambda }^{\frac{1}{2}}\left( \frac{1}{n } \sum _{i=1}^{n} \psi _{\tau }\left( \left\| X_{i}\right\| _{2}^{2}\right) \frac{{Z_{i}}^{*} {{Z_{i}}^{*}}^{T}}{\left\| X_{i}\right\| _{2}^{2}}\right) {\Lambda }^{\frac{1}{2}} {V}^{T} \\&= \frac{1}{n} \sum _{i=1}^{n} \psi _{\tau }\left( \left\| {V} {\Lambda }^{\frac{1}{2}}Z_{i}\right\| _{2}^{2}\right) \frac{{V} {\Lambda }^{\frac{1}{2}}{Z_{i}}^{*} {{Z_{i}}^{*}}^{T} {\Lambda }^{\frac{1}{2}} {V}^{T}}{\left\| {V} {\Lambda }^{\frac{1}{2}}Z_{i}\right\| _{2}^{2}} . \end{aligned}$$

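Since $V$ is orthogonal, $\Lambda $ is diagonal and $D_{j}$ only flips the sign of the $j$th coordinate, $\left\| {V} {\Lambda }^{\frac{1}{2}}Z_{i}\right\| _{2}^{2}=Z_{i}^{T}\Lambda Z_{i}=\left\| {V} {\Lambda }^{\frac{1}{2}}{Z_{i}}^{*}\right\| _{2}^{2}.$ Hence, we have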

$$\begin{aligned} {\check{\Sigma }}= & {} \frac{1}{n} \sum _{i=1}^{n} \psi _{\tau }\left( \left\| {V} {\Lambda }^{\frac{1}{2}}{Z_{i}}^{*}\right\| _{2}^{2}\right) \frac{{V} {\Lambda }^{\frac{1}{2}}{Z_{i}}^{*} {{Z_{i}}^{*}}^{T} {\Lambda }^{\frac{1}{2}} {V}^{T}}{\left\| {V} {\Lambda }^{\frac{1}{2}}{Z_{i}}^{*}\right\| _{2}^{2}},\\ {\widehat{\Sigma }}_{n}(\alpha ,\tau )= & {} \frac{1}{n} \sum _{i=1}^{n} \psi _{\tau }\left( \left\| X_{i}\right\| _{2}^{2}\right) \frac{X_{i} X_{i}^{T}}{\left\| X_{i}\right\| _{2}^{2}}\\= & {} \frac{1}{n} \sum _{i=1}^{n} \psi _{\tau }\left( \left\| {V} {\Lambda }^{\frac{1}{2}}Z_{i}\right\| _{2}^{2}\right) \frac{{V} {\Lambda }^{\frac{1}{2}}Z_{i} {Z_{i}}^{T} {\Lambda }^{\frac{1}{2}} {V}^{T}}{\left\| {V} {\Lambda }^{\frac{1}{2}}Z_{i}\right\| _{2}^{2}}. \end{aligned}$$

Therefore, ${\widehat{\Sigma }}_{n}(\alpha ,\tau )$ and ${\check{\Sigma }}$ are identically distributed. The rest of the proof follows that of Theorem 2 in Fan et al. (2019a). $\square $
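The proof rests on two deterministic facts: the reparametrization ${\widehat{\Sigma }}_{n}(\alpha ,\tau )=V\Lambda^{1/2}\widehat{S}\Lambda^{1/2}V^T$, and the invariance of $\|V\Lambda^{1/2}Z_i\|_2^2$ when $Z_i$ is replaced by $D_jZ_i$. The sketch below checks both numerically for a random orthogonal $V$, positive diagonal $\Lambda$ and Gaussian $Z_i$. It assumes $\psi_\tau(x)=\min(x,\tau)$ (the truncation used in the proof of Lemma 4) and checks only these identities, not the distributional symmetry of the symmetric innovation model; all names are hypothetical.

```python
import numpy as np


def truncated_weights(X, tau):
    """w_i = min(||X_i||^2, tau) / ||X_i||^2 (assumes psi_tau(x) = min(x, tau))."""
    sq = np.sum(X ** 2, axis=1)
    return np.minimum(sq, tau) / sq


def check_lemma3_identities(n=50, d=6, tau=4.0, j=2, seed=0):
    """Return (reparametrization holds, sign flip preserves ||X_i||^2)."""
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal V
    lam = rng.uniform(0.5, 3.0, size=d)               # diagonal entries of Lambda
    A = V @ np.diag(np.sqrt(lam))                     # A = V Lambda^{1/2}
    Z = rng.standard_normal((n, d))                   # row i is Z_i^T
    X = Z @ A.T                                       # row i is (A Z_i)^T = X_i^T

    # Sigma_hat from the X_i versus V Lambda^{1/2} S_hat Lambda^{1/2} V^T
    w = truncated_weights(X, tau)
    sigma_hat = (X * w[:, None]).T @ X / n
    S_hat = (Z * w[:, None]).T @ Z / n
    same_form = bool(np.allclose(sigma_hat, A @ S_hat @ A.T))

    # D_j = I - 2 e_j e_j^T flips the sign of coordinate j
    D = np.eye(d)
    D[j, j] = -1.0
    Z_star = Z @ D                                    # row i is (D_j Z_i)^T; D is symmetric
    norms = np.sum(X ** 2, axis=1)
    norms_star = np.sum((Z_star @ A.T) ** 2, axis=1)
    same_norms = bool(np.allclose(norms, norms_star))
    return same_form, same_norms


if __name__ == "__main__":
    print(check_lemma3_identities())
```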

1.4 Proof of Theorem 2

Proof

By Lemma 2 and $x<e^{x}$, for $\tau =O\left( \sigma \cdot \sqrt{n}\right) $ we have

$$\begin{aligned} {\text {P}}\left( \left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2} \ge t\right) \le C_{1} {\bar{d}}\left( 1+\frac{2\sigma }{t\sqrt{n}} +\frac{2\sigma ^{2}}{t^{2}n}\right) \exp \left( -\frac{t\sqrt{n}}{\sigma }\right) . \end{aligned}$$

By the moment characterization of sub-exponential random variables and the $\psi _{1}$-norm, we have

$$\begin{aligned}&\mathbb {E}\left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}^{k}\\&\quad = \int _{0}^{\infty } {\text {P}}\left( \left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}^{k}>s\right) \mathrm {d} s \\&\quad = \int _{0}^{\infty } {\text {P}}\left( \left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}>s^{1 / k}\right) \mathrm {d} s \\&\quad \le \int _{0}^{\infty } C_{1} {\bar{d}}\left( 1+\frac{2\sigma }{s^{1/k}\sqrt{n}} +\frac{2\sigma ^{2}}{s^{2/k}n}\right) \exp \left( -\frac{s^{1/k}\sqrt{n}}{\sigma }\right) \mathrm {d}s \\&\quad = C_{1}{\bar{d}} \left( \frac{\sigma }{\sqrt{n}}\right) ^{k} k \int _{0}^{\infty } e^{-u}\left( u^{k-1}+ 2 u^{k-2} + 2u^{k-3}\right) \mathrm {d} u \;\; \;\; \left( u:=\frac{s^{1/k}\sqrt{n}}{\sigma }\right) \\&\quad = C_{1}{\bar{d}}\left( \frac{\sigma }{\sqrt{n}}\right) ^{k} k \left( \Gamma (k)+2\Gamma (k-1)+2\Gamma (k-2)\right) . \end{aligned}$$

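The integrals above are finite for $k \ge 3$; for $k\in \{1,2\}$, Lyapunov's inequality gives $\left( \mathbb {E}\left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}^{k}\right) ^{1/k}\le \left( \mathbb {E}\left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}^{3}\right) ^{1/3}$, so these cases only affect the constant. Because $\Gamma (k) \le k^{k}$ and $k^{1 / k} \le e^{1 / e} \le 2$ for any $k \ge 1$, we have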

$$\begin{aligned} \left( \!C_{1}{\bar{d}}\left( \!\frac{\sigma }{\sqrt{n}}\!\right) ^{k} k \left( \!\Gamma (k)+2\Gamma (k-1)+2\Gamma (k-2)\!\right) \!\right) ^{1 / k} \le C\left( {\bar{d}}\right) ^{1/k}{\frac{\sigma }{\sqrt{n}}} k \le C{\bar{d}}{\frac{\sigma }{\sqrt{n}}} k. \end{aligned}$$

Hence, $\left\| \left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}\right\| _{\psi _{1}}=\sup _{k \ge 1}\left( \mathbb {E}\left\| {\widehat{\Sigma }}_{n}(2,\tau )-\Sigma \right\| _{2}^{k}\right) ^{1/k}/k\le C{\bar{d}}{\frac{\sigma }{\sqrt{n}}}.$
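The change of variables $u=s^{1/k}\sqrt{n}/\sigma$ reduces the moment bound to Gamma integrals. For $k\ge 3$, the identity $\int_0^\infty e^{-u}(u^{k-1}+2u^{k-2}+2u^{k-3})\,\mathrm{d}u=\Gamma(k)+2\Gamma(k-1)+2\Gamma(k-2)$ and the elementary facts $\Gamma(k)\le k^{k}$ and $k^{1/k}\le e^{1/e}\le 2$ can be sanity-checked numerically. The sketch below uses a composite Simpson rule on the truncated interval $[0,80]$, an assumption that is harmless for small $k$ because the integrand decays like $e^{-u}$; it is a check, not a proof, and the helper names are hypothetical.

```python
import math


def simpson(f, a, b, m=20000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def tail_integral(k, upper=80.0):
    """Numerically integrate e^{-u} (u^{k-1} + 2u^{k-2} + 2u^{k-3}) over [0, upper], k >= 3."""
    return simpson(
        lambda u: math.exp(-u) * (u ** (k - 1) + 2 * u ** (k - 2) + 2 * u ** (k - 3)),
        0.0,
        upper,
    )


def gamma_closed_form(k):
    """Gamma(k) + 2 Gamma(k-1) + 2 Gamma(k-2)."""
    return math.gamma(k) + 2 * math.gamma(k - 1) + 2 * math.gamma(k - 2)


if __name__ == "__main__":
    for k in range(3, 9):
        print(k, tail_integral(k), gamma_closed_form(k))
```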

By the Davis–Kahan theorem (Yu et al. 2014),

$$\begin{aligned} \left\| \rho \left( \widetilde{{V}}_{K}, {V}_{K}\right) \right\| _{\psi _{1}}\lesssim & {} \left\| \left\| \widetilde{{\Sigma }}-{V}_{K} {V}_{K}^{T}\right\| _{F}\right\| _{\psi _{1}} \\ \le & {} \left\| \left\| \widetilde{{\Sigma }}-{\Sigma }^{*}\right\| _{F}\right\| _{\psi _{1}}+\left\| {\Sigma }^{*}-{V}_{K} {V}_{K}^{T}\right\| _{F}. \end{aligned}\tag{3}$$

By the robust-covariance version of Lemma 1 and Theorem 2 in Fan et al. (2019a), if $\Vert \mathbb {E}\widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-{V}_{K} {V}_{K}^{T}\Vert _{2} \le 1 / 4$ for all $\ell \in [m]$, the first term in (3) can be bounded as

$$\begin{aligned}&\left\| \left\| \widetilde{{\Sigma }}-{\Sigma }^{*}\right\| _{F}\right\| _{\psi _{1}} \lesssim \frac{1}{m} \sqrt{\sum _{\ell =1}^{m}\left\| \left\| \widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-\mathbb {E}\widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}\right\| _{F}\right\| _{\psi _{1}}^{2}} \\&\quad \lesssim \frac{1}{m} \sqrt{\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )}\frac{\sqrt{K}}{\Delta _{(\ell )}} {\frac{\sigma _{(\ell )}}{\sqrt{n}}}\right) ^{2}}\lesssim \sqrt{\frac{1}{m}\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )} \frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2}} \sqrt{\frac{K}{N}}. \end{aligned}\tag{4}$$
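The last step in (4) is pure arithmetic once the total sample size is written as $N=mn$ (assumed here, following the setup of Fan et al. (2019a), with each of the $m$ servers holding $n$ observations). With $a_{\ell }:={\bar{d}}_{(\ell )}\sigma _{(\ell )}/\Delta _{(\ell )}$,

$$\begin{aligned} \frac{1}{m}\sqrt{\sum _{\ell =1}^{m}a_{\ell }^{2}\frac{K}{n}} =\sqrt{\frac{1}{m^{2}}\sum _{\ell =1}^{m}a_{\ell }^{2}\frac{K}{n}} =\sqrt{\frac{1}{m}\sum _{\ell =1}^{m}a_{\ell }^{2}}\,\sqrt{\frac{K}{mn}} =\sqrt{\frac{1}{m}\sum _{\ell =1}^{m}a_{\ell }^{2}}\,\sqrt{\frac{K}{N}}. \end{aligned}$$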

Since

$$\begin{aligned}&\left\| \mathbb {E}\widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-V_{K} V_{K}^{T}\right\| _{2}\le \mathbb {E}\left\| \widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-V_{K} V_{K}^{T}\right\| _{2} \\&\quad \le \mathbb {E}\left\| \widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-V_{K} V_{K}^{T}\right\| _{F} \\&\quad \le \frac{\sqrt{K}}{\Delta _{(\ell )}}\mathbb {E}\left\| {\widehat{\Sigma }}^{(\ell )}(2,\tau _{(\ell )})-\Sigma ^{(\ell )}\right\| _{2} \le \frac{\sqrt{K}}{\Delta _{(\ell )}}\left\| \left\| {\widehat{\Sigma }}^{(\ell )}(2,\tau _{(\ell )})-\Sigma ^{(\ell )}\right\| _{2}\right\| _{\psi _{1}} \\&\quad \lesssim {\bar{d}}_{(\ell )}\frac{\sqrt{K}}{\Delta _{(\ell )}}\frac{\sigma _{(\ell )}}{\sqrt{n}}, \end{aligned}\tag{5}$$

it follows that if $n \ge C_{1}K \max _{\ell \in [m]}\left( {\bar{d}}_{(\ell )}\frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^2$ for a sufficiently large constant $C_{1}$, then (5) implies $\Vert \mathbb {E}\widehat{{V}}_{K}^{(\ell )} \widehat{{V}}_{K}^{(\ell ) T}-{V}_{K} {V}_{K}^{T}\Vert _{2} \le 1 / 4< 1/2$ for all $\ell \in [m]$. Therefore, by (4) and Lemma 3, there is a constant $C_{2}$ such that

$$\begin{aligned} \left\| \rho \left( \widetilde{{V}}_{K}, {V}_{K}\right) \right\| _{\psi _{1}} \le C_{2}\sqrt{\frac{1}{m}\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )} \frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2}} \sqrt{\frac{K}{N}}. \end{aligned}$$

$\square $

1.5 Proof of Lemma 4

Proof

Since $|X_{i} X_{i}^{T}|^{2}=\left\| X_{i}\right\| _{2}^{2}X_{i} X_{i}^{T}$, we have $\left\| \mathbb {E}|X_{i} X_{i}^{T}|^{2}\right\| _{2}=\sup _{v \in {\mathcal {S}}^{d-1}}\mathbb {E}\left( \left\| X_{i}\right\| _{2}^{2}(v^{T}X_{i})^2\right) $, and for every $v \in {\mathcal {S}}^{d-1}$,

$$\begin{aligned} \mathbb {E}\left( \left\| X_{i}\right\| _{2}^{2}(v^{T}X_{i})^2\right) = & {} \sum _{j=1}^{d}\mathbb {E}\left( x_{ij}^{2}(v^{T}X_{i})^2\right) \\\le & {} \sum _{j=1}^{d}\left( \mathbb {E}|x_{ij}|^{3}\right) ^{\frac{2}{3}}\left( \mathbb {E}(v^{T}X_{i})^6\right) ^{\frac{1}{3}} \le \sum _{j=1}^{d}\left( \mathbb {E}x_{ij}^{6}\right) ^{\frac{1}{3}}\left( \mathbb {E}(v^{T}X_{i})^6\right) ^{\frac{1}{3}}\\\le & {} d R_{(1)}^{\prime \frac{2}{3}}<\infty . \end{aligned}$$

Now define ${\Omega }={\widehat{\Sigma }}_{n}^{(1)}(2, \tau _{(1)})-{\Sigma }^{(1)}$, ${\Gamma }={V}_{K} {V}_{K}^{T}$, $\widehat{{\Gamma }}=\widehat{{V}}_{K}^{(1)} \widehat{{V}}_{K}^{(1) T}$, ${\Theta }=f\left( {\Omega V}_{K}\right) {V}_{K}^{T}+{V}_{K} f\left( {\Omega V}_{K}\right) ^{T}$, where $f$ is the linear function defined in Lemma 2 of Fan et al. (2019a), ${\Phi } =\widehat{{\Gamma }}-{\Gamma }-{\Theta }$ and $\omega =\Vert {\Omega }\Vert _{2} / \Delta $. Since

$$\begin{aligned} \widehat{{\Gamma }}-{\Gamma }={\Phi } 1_{\{\omega \le 1 / 10\}}+(\widehat{{\Gamma }}-{\Gamma }) 1_{\{\omega>1 / 10\}}-{\Theta } 1_{\{\omega >1 / 10\}}+\Theta , \end{aligned}$$

we have

$$\begin{aligned} \Vert \mathbb {E} \widehat{{\Gamma }}-{\Gamma }\Vert _{F}\le & {} \mathbb {E}\left( \Vert {\Phi }\Vert _{F} 1_{\{\omega \le 1 / 10\}}\right) +\mathbb {E}\left( \Vert \widehat{{\Gamma }}-{\Gamma }\Vert _{F} 1_{\{\omega>1 / 10\}}\right) \\&+\mathbb {E}\left( \Vert {\Theta }\Vert _{F} 1_{\{\omega > 1 / 10\}}\right) +\Vert \mathbb {E}\left( \Theta \right) \Vert _{F}. \end{aligned}\tag{6}$$

By Theorem 3 in Fan et al. (2019a),

$$\begin{aligned} \mathbb {E}\left( \Vert {\Phi }\Vert _{F} 1_{\{\omega \le 1 / 10\}}\right) +\mathbb {E}\left( \Vert \widehat{{\Gamma }}-{\Gamma }\Vert _{F} 1_{\{\omega>1 / 10\}}\right) +\mathbb {E}\left( \Vert {\Theta }\Vert _{F} 1_{\{\omega > 1 / 10\}}\right) \lesssim \sqrt{K} \mathbb {E}\omega ^2. \end{aligned}\tag{7}$$

Since $\mathbb {E}(\Omega )=\mathbb {E}\Big (\psi _{\tau _{(1)}}\left( \left\| X_{i}\right\| _{2}^{2}\right) \frac{X_{i} X_{i}^{T}}{\left\| X_{i}\right\| _{2}^{2}}-X_{i} X_{i}^{T}\Big )=\mathbb {E}\big ((\psi _{\tau _{(1)}}(\Vert X_{i}\Vert _{2}^{2}) /\Vert X_{i}\Vert _{2}^{2}-1) X_{i} X_{i}^{T}\big )$, for every $v \in {\mathcal {S}}^{d-1}$ we have

$$\begin{aligned}&\left| \mathbb {E}\left( \left( \psi _{\tau _{(1)}}\left( \left\| X_{i}\right\| _{2}^{2}\right) /\left\| X_{i}\right\| _{2}^{2}-1\right) v^{T}X_{i} X_{i}^{T}v\right) \right| \\&\quad =\mathbb {E}\left( \frac{\left\| X_{i}\right\| _{2}^{2}-\tau _{(1)}}{\left\| X_{i}\right\| _{2}^{2}}(v^{T}X_{i})^2 1_{\{\left\| X_{i}\right\| _{2}^{2}>\tau _{(1)}\}}\right) \\&\quad \le \mathbb {E}\left( (v^{T}X_{i})^2 1_{\{\left\| X_{i}\right\| _{2}^{2}>\tau _{(1)}\}}\right) \le \left( \mathbb {E}(v^{T}X_{i})^6\right) ^{\frac{1}{3}}\left( \mathrm {P}\left( \left\| X_{i} \right\| _{2}^{2}>\tau _{(1)}\right) \right) ^{\frac{2}{3}}\\&\quad \le R_{(1)}^{\prime \frac{1}{3}}\left( \mathbb {E}\left\| X_{i}\right\| _{2}^{6}/\tau _{(1)}^3\right) ^{\frac{2}{3}} \lesssim R_{(1)}^{\prime }d^{2}/(\sigma _{(1)}^2 n) \end{aligned}$$

where the second and third inequalities follow from Hölder's and Markov's inequalities, respectively, and the last one from the $C_{r}$ inequality together with $\tau _{(1)}\asymp \sigma _{(1)}\sqrt{n}$. Hence, $\Vert \mathbb {E}\left( \Omega \right) \Vert _{2}\lesssim R_{(1)}^{\prime }d^{2}/\left( \sigma _{(1)}^2 n\right) $ and

$$\begin{aligned} \Vert \mathbb {E}\left( \Theta \right) \Vert _{F}\lesssim \left\| f\left( {\mathbb {E}\left( \Omega \right) V}_{K}\right) \right\| _{F}\le \sqrt{K}\Vert \mathbb {E}\left( \Omega \right) \Vert _{2} / \Delta _{(1)} \lesssim \frac{\sqrt{K}}{\Delta _{(1)}}\frac{R_{(1)}^{\prime }d^{2}}{\sigma _{(1)}^2 n}. \end{aligned}\tag{8}$$

Finally, combining (6)–(8), we obtain

$$\begin{aligned} \Vert \mathbb {E} \widehat{{\Gamma }}-{\Gamma }\Vert _{F} \lesssim&\sqrt{K} \mathbb {E} \omega ^{2}+\Vert \mathbb {E}\left( \Theta \right) \Vert _{F}\lesssim \sqrt{K} \Delta ^{-2}\left\| \Vert {\Omega }\Vert _{2}\right\| _{\psi _{1}}^{2}+\frac{\sqrt{K}}{\Delta _{(1)}}\frac{R_{(1)}^{\prime }d^{2}}{\sigma _{(1)}^2 n} \\ \lesssim&\left( {\bar{d}}_{(1)}\frac{\sigma _{(1)}}{\Delta _{(1)}}\right) ^2\frac{\sqrt{K}}{n}+ \frac{R_{(1)}^{\prime }d^{2}}{\sigma _{(1)}^2\Delta _{(1)}}\frac{\sqrt{K}}{n}. \end{aligned}$$

$\square $

1.6 Proof of Theorem 3

Proof

By Lemma 4 and (4), we obtain that when $n \ge C_{2}K \max _{\ell \in [m]}\left( {\bar{d}}_{(\ell )}\frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^2$,

$$\begin{aligned} \begin{aligned}&\left\| \rho \left( \widetilde{{V}}_{K}, {V}_{K}\right) \right\| _{\psi _{1}}\le \left\| \rho \left( \widetilde{{V}}_{K}, {V}_{K}^{\star }\right) \right\| _{\psi _{1}}+\rho \left( {V}_{K}^{\star }, {V}_{K}\right) \\&\lesssim \sqrt{\frac{1}{m}\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )} \frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2}} \sqrt{\frac{K}{N}}+ \frac{1}{m}\sum _{\ell =1}^{m}\left( \left( {\bar{d}}_{(\ell )}\frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2} +\frac{R_{(\ell )}^{\prime }d^{2}}{\sigma _{(\ell )}^2\Delta _{(\ell )}}\right) \frac{\sqrt{K}}{n}. \end{aligned} \end{aligned}$$

Therefore, when $m$ and $n$ satisfy the requirement in Theorem 3, the second term is dominated by the first, and for some constant $C_{3}$,

$$\begin{aligned} \left\| \rho \left( \widetilde{{V}}_{K}, {V}_{K}\right) \right\| _{\psi _{1}}\le C_{3}\sqrt{\frac{1}{m}\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )} \frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2}} \sqrt{\frac{K}{N}}. \end{aligned}$$

$\square $
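To see when the second term in the penultimate display is negligible, write $A:=\sqrt{\frac{1}{m}\sum _{\ell =1}^{m}\left( {\bar{d}}_{(\ell )}\frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2}}$ and $B:=\frac{1}{m}\sum _{\ell =1}^{m}\left( \left( {\bar{d}}_{(\ell )}\frac{\sigma _{(\ell )}}{\Delta _{(\ell )}}\right) ^{2}+\frac{R_{(\ell )}^{\prime }d^{2}}{\sigma _{(\ell )}^{2}\Delta _{(\ell )}}\right) $, and assume $N=mn$. A direct comparison (a sufficient condition derived here, not the exact statement of Theorem 3) gives

$$\begin{aligned} B\frac{\sqrt{K}}{n}\le A\sqrt{\frac{K}{N}} \iff \frac{B}{n}\le \frac{A}{\sqrt{mn}} \iff m\le \frac{A^{2}}{B^{2}}\,n, \end{aligned}$$

so, when $A$ and $B$ stay bounded, the number of servers may grow at most proportionally to the local sample size $n$ without degrading the rate $A\sqrt{K/N}$ beyond a constant factor.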

About this article

Cite this article

Li, K., Bao, H. & Zhang, L. Robust covariance estimation for distributed principal component analysis. Metrika 85, 707–732 (2022). https://doi.org/10.1007/s00184-021-00848-9

Received: 14 December 2020
Accepted: 05 October 2021
Published: 22 November 2021
Issue Date: August 2022
DOI: https://doi.org/10.1007/s00184-021-00848-9
