In silico benchmarking of metagenomic tools for coding sequence detection reveals the limits of sensitivity and precision
BMC Bioinformatics (IF 3), Pub Date: 2020-10-15, DOI: 10.1186/s12859-020-03802-0
Jonathan Louis Golob, Samuel Schwartz Minot

High-throughput sequencing can establish the functional capacity of a microbial community by cataloging the protein-coding sequences (CDS) present in the metagenome of the community. The relative performance of different computational methods for identifying CDS from whole-genome shotgun sequencing is not fully established. Here we present an automated benchmarking workflow, using synthetic shotgun sequencing reads for which we know the true CDS content of the underlying communities, to determine the relative performance (sensitivity, positive predictive value or PPV, and computational efficiency) of different metagenome analysis tools for extracting the CDS content of a microbial community. Assembly-based methods are limited by coverage depth, with poor sensitivity for CDS at < 5X depth of sequencing, but have excellent PPV. Mapping-based techniques are more sensitive at low coverage depths, but can struggle with PPV. We additionally describe an iterative, expectation-maximization-based algorithmic approach, which we show successfully improves the PPV of a mapping-based technique while retaining improved sensitivity and computational efficiency. Our benchmarking approach reveals the trade-offs of assembly-based versus alignment-based approaches, and the relative performance of specific implementations, when one wishes to extract the protein-coding capacity of microbial communities.
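
The two accuracy metrics named in the abstract can be illustrated with a minimal sketch. It assumes that the true and detected CDS are each represented as a set of identifiers and that a detection counts as correct only when its identifier appears in the truth set; the function name, the identifiers, and that matching rule are hypothetical illustrations and are not taken from the paper's workflow.

```python
def sensitivity_and_ppv(true_cds, predicted_cds):
    """Return (sensitivity, PPV) for a set of predicted CDS identifiers.

    Assumption: exact identifier match defines a true positive.
    """
    true_cds = set(true_cds)
    predicted_cds = set(predicted_cds)

    tp = len(true_cds & predicted_cds)   # CDS present and detected
    fn = len(true_cds - predicted_cds)   # CDS present but missed
    fp = len(predicted_cds - true_cds)   # CDS reported but not present

    # Sensitivity: fraction of the true CDS that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # PPV: fraction of the reported CDS that are truly present.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical example: four CDS in the synthetic community, three reported.
truth = {"cdsA", "cdsB", "cdsC", "cdsD"}
calls = {"cdsA", "cdsB", "cdsX"}
sens, ppv = sensitivity_and_ppv(truth, calls)
print(f"sensitivity={sens:.2f} PPV={ppv:.2f}")  # sensitivity=0.50 PPV=0.67
```

In this toy case the tool misses half of the true CDS (low sensitivity) while two of its three calls are correct, which is the kind of trade-off the abstract describes between assembly-based and mapping-based methods.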

Updated: 2020-10-16