Log-ratio analysis of microbiome data with many zeroes is library size dependent
Molecular Ecology Resources (IF 5.5), Pub Date: 2021-03-24, DOI: 10.1111/1755-0998.13391
Dennis E Te Beest, Els H Nijhuis, Tim W R Möhlmann, Cajo J F Ter Braak

Microbiome composition data collected through amplicon sequencing are count data on taxa in which the total count per sample (the library size) is an artefact of the sequencing platform, and as a result, such data are compositional. To avoid library size dependency, one common way of analysing multivariate compositional data is to perform a principal component analysis (PCA) on data transformed with the centred log-ratio, hereafter called a log-ratio PCA. Two aspects typical of amplicon sequencing data are the large differences in library size and the large number of zeroes. In this study, we show on real data and by simulation that, applied to data that combine these two aspects, log-ratio PCA is nevertheless heavily dependent on the library size. This leads to a reduction in power when testing against any explanatory variable in log-ratio redundancy analysis. If there is additionally a correlation between the library size and the explanatory variable, then the type 1 error becomes inflated. We explore putative solutions to this problem.
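
The library size effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation: the sample and taxon counts, the Dirichlet parameter, the range of library sizes, and the pseudocount of 1 used to make the logarithm defined at zero counts are all assumptions chosen for illustration. Every sample is drawn from the same underlying composition, so any structure that a log-ratio PCA finds cannot be biological.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings, not taken from the paper.
n_samples, n_taxa = 60, 200

# One shared, skewed composition for all samples (many rare taxa).
true_props = rng.dirichlet(np.full(n_taxa, 0.3))

# Library sizes spanning two orders of magnitude (about 300 to 30,000 reads).
lib_sizes = np.round(10 ** rng.uniform(2.5, 4.5, n_samples)).astype(int)

# Multinomial sampling: rare taxa drop to zero in shallow samples.
counts = np.array([rng.multinomial(n, true_props) for n in lib_sizes])


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform per sample (row).

    The pseudocount is an assumption made here so that log(0) never occurs.
    """
    logx = np.log(counts + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


# Log-ratio PCA: column-centre the CLR data, then take the SVD.
z = clr(counts)
z_centered = z - z.mean(axis=0)
u, s, vt = np.linalg.svd(z_centered, full_matrices=False)
pc1 = u[:, 0] * s[0]  # sample scores on the first principal axis

zero_fraction = (counts == 0).mean()
r = np.corrcoef(pc1, np.log(lib_sizes))[0, 1]

print(f"fraction of zero counts: {zero_fraction:.2f}")
print(f"correlation of PC1 with log library size: {r:.2f}")
```

Under these assumptions the first principal axis tracks the library size even though the samples share one composition. The sign of the correlation is arbitrary because the SVD determines the axes only up to sign. Without zeroes the centred log-ratio would remove the library size exactly, because scaling a sample by a constant shifts all of its logarithms equally. With zeroes and a fixed pseudocount that shift no longer cancels.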

Updated: 2021-03-24