A framework for assessing 16S rRNA marker-gene survey data analysis methods using mixtures.
Microbiome (IF 13.8), Pub Date: 2020-03-13, DOI: 10.1186/s40168-020-00812-1
Nathan D Olson 1, 2, 3 , M Senthil Kumar 2, 3 , Shan Li 4 , Domenick J Braccia 2, 3 , Stephanie Hao 5 , Winston Timp 5 , Marc L Salit 6 , O Colin Stine 4 , Hector Corrada Bravo 2, 3, 7

BACKGROUND: A variety of bioinformatic pipelines and downstream analysis methods exist for analyzing 16S rRNA marker-gene surveys. However, there is limited guidance for choosing among them, so appropriate assessment datasets and metrics are needed. Mixtures of environmental samples are useful for assessing analysis methods because expected values can be calculated from the unmixed sample measurements and the mixture design, and methods can then be evaluated against those expected values. Previous studies have used mixtures of environmental samples to assess other sequencing methods such as RNAseq, but no studies have used mixtures of environmental samples to assess 16S rRNA sequencing.

RESULTS: We developed a framework for assessing 16S rRNA sequencing analysis methods that uses a novel two-sample titration mixture dataset and metrics to evaluate qualitative and quantitative characteristics of count tables. Our qualitative assessment evaluates feature presence/absence: for features present only in unmixed samples or only in titrations, it tests whether random sampling can account for their observed relative abundance. Our quantitative assessment evaluates feature relative and differential abundance by comparing observed and expected values. We demonstrated the framework by evaluating count tables generated with three commonly used bioinformatic pipelines: (i) DADA2, a sequence inference method; (ii) Mothur, a de novo clustering method; and (iii) QIIME, an open-reference clustering method. The qualitative assessment results indicated that the majority of Mothur and QIIME features present only in unmixed samples or only in titrations were accounted for by random sampling alone, but this was not the case for DADA2 features. Combined with count table sparsity (the proportion of zero-valued cells in a count table), these results indicate that DADA2 has a higher false-negative rate, whereas Mothur and QIIME have higher false-positive rates. The quantitative assessment results indicated that the observed relative abundance and differential abundance values were consistent with expected values for all three pipelines.

CONCLUSIONS: We developed a novel framework for assessing 16S rRNA marker-gene survey methods and demonstrated it by evaluating count tables generated with three bioinformatic pipelines. This framework is a valuable community resource for assessing 16S rRNA marker-gene survey bioinformatic methods and will help scientists identify appropriate analysis methods for their marker-gene surveys.
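The quantities named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration and not the authors' implementation: the linear mixing model, the parameter `theta` (the fraction of the mixture contributed by the second unmixed sample), the zero-count probability used as a stand-in for the random-sampling test, and all function names are hypothetical. Only the sparsity definition (proportion of zero-valued cells) comes directly from the text.

```python
def expected_relative_abundance(pre, post, theta):
    """Expected relative abundance of a feature in a two-sample mixture.

    Assumption: a simple linear mixing model, where `pre` and `post` are the
    feature's relative abundances in the two unmixed samples and `theta` is
    the fraction of the mixture that comes from the `post` sample.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must be between 0 and 1")
    return theta * post + (1.0 - theta) * pre


def sparsity(count_table):
    """Proportion of zero-valued cells in a count table (list of rows)."""
    cells = [count for row in count_table for count in row]
    if not cells:
        raise ValueError("count table is empty")
    return sum(1 for count in cells if count == 0) / len(cells)


def prob_unobserved_by_sampling(expected_proportion, total_reads):
    """Chance of drawing zero reads for a feature by random sampling alone.

    Assumption: reads are drawn independently, so the probability of seeing
    no reads of a feature with the given expected proportion is
    (1 - p) ** n. A large value means random sampling can plausibly explain
    the feature's absence; a small value suggests it cannot.
    """
    if not 0.0 <= expected_proportion <= 1.0:
        raise ValueError("expected_proportion must be between 0 and 1")
    if total_reads < 0:
        raise ValueError("total_reads must be non-negative")
    return (1.0 - expected_proportion) ** total_reads


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(expected_relative_abundance(pre=0.2, post=0.6, theta=0.25))
    print(sparsity([[0, 3, 0], [5, 0, 1]]))
    print(prob_unobserved_by_sampling(expected_proportion=0.001, total_reads=1000))
```

Under these assumptions, a feature that is absent from a sample despite a very small probability of being missed by sampling would count against the pipeline, which is the intuition behind attributing false negatives or false positives to a count table.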

Updated: 2020-04-22