Robust covariance estimation for high-dimensional compositional data with application to microbial communities analysis
Statistics in Medicine (IF 2), Pub Date: 2021-04-11, DOI: 10.1002/sim.8979
Yong He, Pengfei Liu, Xinsheng Zhang, Wang Zhou

Microbial community analysis is drawing growing attention owing to the rapid development of high-throughput sequencing techniques. The observed data have the following typical characteristics: they are high-dimensional, compositional (lying in a simplex), and may even be leptokurtic and highly skewed due to the existence of overly abundant taxa, which makes conventional correlation analysis unsuitable for studying the co-occurrence and co-exclusion relationships between microbial taxa. In this article, we address the challenges of covariance estimation for this kind of data. Assuming that the basis covariance matrix lies in a well-recognized class of sparse covariance matrices, we adopt a proxy matrix known in the literature as the centered log-ratio covariance matrix. We construct a Median-of-Means (MOM) estimator for the centered log-ratio covariance matrix and propose a thresholding procedure that is adaptive to the variability of individual entries. By imposing a finite fourth moment condition, which is much weaker than the sub-Gaussianity condition in the literature, we derive the optimal rate of convergence under the spectral norm. We also provide theoretical guarantees on support recovery. The adaptive thresholding procedure for the MOM estimator is easy to implement and gains robustness when outliers or heavy tails are present. Thorough simulation studies are conducted to show the advantages of the proposed procedure over some state-of-the-art methods. Finally, we apply the proposed method to analyze a human gut microbiome dataset.
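
The pipeline outlined in the abstract (centered log-ratio transform, Median-of-Means covariance estimate, entry-adaptive thresholding) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the block-splitting scheme, the plain (non-robust) variance estimate `theta`, the threshold form `delta * sqrt(theta * log(p) / n)`, and the default `delta=2.0` are all assumptions chosen for illustration, loosely following the usual adaptive-thresholding recipe for sparse covariance matrices.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform; each row of x is a strictly positive composition."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def mom_covariance(y, n_blocks, rng):
    """Entrywise median of block-wise sample covariance matrices (one simple MOM variant)."""
    idx = rng.permutation(y.shape[0])
    blocks = np.array_split(idx, n_blocks)
    covs = np.stack([np.cov(y[b], rowvar=False) for b in blocks])
    med = np.median(covs, axis=0)
    return (med + med.T) / 2.0  # enforce exact symmetry


def adaptive_threshold(y, cov, delta=2.0):
    """Hard-threshold each off-diagonal entry at a level scaled by its own variability.

    theta[i, j] estimates the variance of the centered product y_i * y_j.
    This sketch uses a plain sample mean for theta, which is an assumption
    and is not itself robust. Memory use is O(n * p^2).
    """
    n, p = y.shape
    yc = y - np.median(y, axis=0)
    prod = yc[:, :, None] * yc[:, None, :]       # shape (n, p, p)
    theta = np.mean((prod - cov) ** 2, axis=0)   # shape (p, p)
    lam = delta * np.sqrt(theta * np.log(p) / n)
    out = np.where(np.abs(cov) >= lam, cov, 0.0)
    np.fill_diagonal(out, np.diag(cov))          # never threshold the variances
    return out


# Hypothetical usage on simulated heavy-tailed compositions.
rng = np.random.default_rng(0)
basis = rng.lognormal(size=(200, 15))            # latent positive abundances
comp = basis / basis.sum(axis=1, keepdims=True)  # rows lie in the simplex
y = clr(comp)
sigma_mom = mom_covariance(y, n_blocks=10, rng=rng)
sigma_sparse = adaptive_threshold(y, sigma_mom)
```

The number of blocks trades robustness against efficiency: more blocks tolerate more outlying samples, but each block covariance is then computed from fewer observations.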

Updated: 2021-06-05