Multisample estimation of bacterial composition matrices in metagenomics data
Biometrika (IF 2.7), Pub Date: 2019-12-06, DOI: 10.1093/biomet/asz062
Yuanpei Cao 1 , Anru Zhang 2 , Hongzhe Li 1

Metagenomics sequencing is routinely applied to quantify bacterial abundances in microbiome studies, where bacterial composition is estimated based on the sequencing read counts. Due to limited sequencing depth and DNA dropouts, many rare bacterial taxa might not be captured in the final sequencing reads, which results in many zero counts. Naive composition estimation using count normalization leads to many zero proportions, which tend to result in inaccurate estimates of bacterial abundance and diversity. This paper takes a multisample approach to estimation of bacterial abundances in order to borrow information across samples and across species. Empirical results from real datasets suggest that the composition matrix over multiple samples is approximately low rank, which motivates a regularized maximum likelihood estimation with a nuclear norm penalty. An efficient optimization algorithm using the generalized accelerated proximal gradient and Euclidean projection onto simplex space is developed. Theoretical upper bounds and the minimax lower bounds of the estimation errors, measured by the Kullback–Leibler divergence and the Frobenius norm, are established. Simulation studies demonstrate that the proposed estimator outperforms the naive estimators. The method is applied to an analysis of a human gut microbiome dataset.
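
As a rough illustration of the ingredients named in the abstract, the sketch below shows naive count normalization, Euclidean projection of a vector onto the probability simplex, and singular value soft-thresholding (the standard proximal operator of a nuclear norm penalty). This is not the authors' implementation: the function names, the sort-based projection routine, and the toy count matrix are assumptions chosen for illustration, and the full accelerated proximal gradient loop and likelihood are not reproduced here.

```python
import numpy as np


def naive_composition(counts):
    """Row-normalize a samples-by-taxa count matrix.

    Zero counts stay exactly zero, which is the weakness of the
    naive estimator described in the abstract.
    """
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def project_to_simplex(v):
    """Euclidean projection of a 1-D vector onto the probability simplex.

    Uses the common sort-based routine (an assumed choice, not taken
    from the paper): find the shift theta such that max(v - theta, 0)
    sums to one.
    """
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                  # entries in descending order
    css = np.cumsum(u) - 1.0              # cumulative sums minus the target total
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def singular_value_threshold(M, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


# Hypothetical toy data: 2 samples by 3 taxa, with some taxa unobserved.
toy_counts = np.array([[5, 0, 5],
                       [2, 2, 0]])
toy_naive = naive_composition(toy_counts)
```

In a proximal gradient scheme of the kind the abstract describes, a gradient step on the likelihood would typically be followed by steps like these two, so that the iterate stays approximately low rank and each sample's row remains a valid composition.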

Updated: 2020-04-17