An open-sourced bioinformatic pipeline for the processing of Next-Generation Sequencing derived nucleotide reads: Identification and authentication of ancient metagenomic DNA

bioRxiv - Genomics. Pub Date: 2020-05-28, DOI: 10.1101/2020.04.20.050369
Thomas C. Collin, Konstantina Drosou, Jeremiah Daniel O’Riordan, Tengiz Meshveliani, Ron Pinhasi, Robin N. M. Feeney

Bioinformatic pipelines optimised for the processing and assessment of metagenomic ancient DNA (aDNA) are needed for studies that do not make use of high-yielding DNA capture techniques. Existing pipelines are traditionally optimised for broad aDNA purposes, are contingent on selection biases and are associated with high costs. Here we present a bioinformatic pipeline optimised for the identification and assessment of ancient metagenomic DNA without the use of expensive DNA capture techniques. Our pipeline actively conserves aDNA reads, allowing the application of a bioinformatic approach by identifying the shortest reads possible for analysis (22-28 bp). Processing time is drastically reduced through the use of a 10% segmented non-redundant sequence file (from 229 hours to 53 hours) and further reduced through the optimisation of BLAST parameters (from 53 hours to 48 hours). Additionally, the use of multi-alignment authentication in the identification of taxa increases overall confidence in the metagenomic results. DNA yields are further increased through the use of an optimal MAPQ setting (MAPQ 25) and the optimisation of the duplicate removal process using multiple sequence identifiers (4.35-6.88% better retention). Moreover, characteristic aDNA damage patterns are used to bioinformatically assess ancient vs. modern DNA origin throughout pipeline development. In addition, the pipeline uses open-source technologies, which increases its accessibility to the scientific community.
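To make the MAPQ and duplicate-removal steps more concrete, here is a minimal Python sketch. It is not the authors' implementation. The MAPQ threshold of 25 comes from the abstract, but the `Alignment` record, the function names, and above all the composite key used to define a duplicate (reference, start position, strand, read length) are assumptions chosen for illustration; the abstract says only that "multiple sequence identifiers" are used and does not name them.

```python
from typing import Iterable, Iterator, NamedTuple

MIN_MAPQ = 25  # threshold named in the abstract


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record; fields loosely mirror SAM columns."""
    read_id: str
    reference: str
    position: int   # 1-based leftmost mapping position
    reverse: bool   # True if mapped to the reverse strand
    mapq: int       # mapping quality
    sequence: str


def filter_by_mapq(alignments: Iterable[Alignment],
                   min_mapq: int = MIN_MAPQ) -> Iterator[Alignment]:
    """Keep only alignments whose mapping quality is at least min_mapq."""
    return (a for a in alignments if a.mapq >= min_mapq)


def remove_duplicates(alignments: Iterable[Alignment]) -> Iterator[Alignment]:
    """Yield the first alignment seen for each composite key.

    ASSUMPTION: a duplicate shares reference, start position, strand and
    read length. Using several identifiers rather than position alone
    means reads that merely start at the same coordinate are not discarded.
    """
    seen = set()
    for a in alignments:
        key = (a.reference, a.position, a.reverse, len(a.sequence))
        if key in seen:
            continue
        seen.add(key)
        yield a


# Toy input (fabricated purely for illustration, not data from the paper).
reads = [
    Alignment("r1", "taxonA", 100, False, 37, "ACGT" * 6),  # 24 bp, kept
    Alignment("r2", "taxonA", 100, False, 30, "ACGT" * 6),  # same key as r1
    Alignment("r3", "taxonA", 100, False, 30, "ACGT" * 7),  # 28 bp, kept
    Alignment("r4", "taxonB", 5, True, 12, "ACGT" * 6),     # MAPQ below 25
]

kept = list(remove_duplicates(filter_by_mapq(reads)))
print([a.read_id for a in kept])  # ['r1', 'r3']
```

In this sketch, r3 survives even though it starts at the same coordinate as r1, because its length differs. That is the sense in which a multi-identifier key can retain more reads than a position-only key, although the identifiers actually used by the pipeline may differ.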

Updated: 2020-05-28
