High-dimensional log-error-in-variable regression with applications to microbial compositional data analysis
Biometrika (IF 2.4), Pub Date: 2021-03-29, DOI: 10.1093/biomet/asab020
Pixu Shi, Yuchen Zhou, Anru R Zhang

Summary: In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the classic log-contrast model is often used where read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain critical issues. We introduce a surprisingly simple, interpretable and efficient method for the estimation of compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method provides corrections on sequencing data with possible overdispersion and simultaneously avoids any subjective imputation of zero read counts. We provide theoretical justifications with matching upper and lower bounds for the estimation error. The merit of the procedure is illustrated through real data analysis and simulation studies.
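
As a rough illustration of the background the summary refers to, the sketch below normalizes read counts into compositions and evaluates a log-contrast linear predictor, sum_j beta_j log(x_j) with the coefficients constrained to sum to zero. It is not the estimator proposed in the paper. The count matrix, the coefficient vector `beta`, and the helper names `to_composition` and `log_contrast` are all hypothetical and chosen only for illustration. The example shows two things the summary mentions: the zero-sum constraint makes the predictor insensitive to sequencing depth, and a zero read count makes the logarithm undefined.

```python
import numpy as np

# Hypothetical read counts: 3 samples x 4 taxa (made-up numbers).
# Sample 2 is sample 1 sequenced four times deeper; sample 0 has a zero count.
counts = np.array([
    [120.0, 30.0,  0.0, 50.0],
    [ 60.0, 15.0,  5.0, 20.0],
    [240.0, 60.0, 20.0, 80.0],
])

# Hypothetical coefficients; the log-contrast model requires them to sum to zero.
beta = np.array([1.0, -0.5, 0.25, -0.75])


def to_composition(counts):
    """Normalize each row of read counts so that it sums to 1."""
    return counts / counts.sum(axis=1, keepdims=True)


def log_contrast(x, beta):
    """Linear predictor sum_j beta_j * log(x_j), computed row by row."""
    return np.sum(np.log(x) * beta, axis=-1)


comp = to_composition(counts)

# Depth invariance: with sum(beta) == 0, raw counts and compositions give
# the same predictor, so samples 1 and 2 agree despite different depths.
pred_counts = log_contrast(counts[1:], beta)
pred_comp = log_contrast(comp[1:], beta)
print(np.allclose(pred_counts, pred_comp))

# Zero read counts: log(0) is -inf, so the predictor for sample 0 is not finite.
with np.errstate(divide="ignore"):
    pred_zero = log_contrast(comp[:1], beta)
print(np.isfinite(pred_zero))
```

A common workaround is to replace zeros with a small pseudo-count before taking logarithms. The summary describes that kind of imputation as subjective and states that the proposed method avoids it.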

Updated: 2021-03-29