NGSpeciesID: DNA barcode and amplicon consensus generation from long‐read sequencing data
Ecology and Evolution (IF 2.6), Pub Date: 2021-01-11, DOI: 10.1002/ece3.7146
Kristoffer Sahlin, Marisa C. W. Lim, Stefan Prost

Third‐generation sequencing technologies, such as Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio), have gained popularity in recent years. These platforms can generate millions of long‐read sequences, which benefits not only genome sequencing projects but also amplicon‐based high‐throughput sequencing experiments such as DNA barcoding. However, the relatively high error rates associated with these technologies still make it challenging to generate high‐quality consensus sequences. Here, we present NGSpeciesID, a program that generates highly accurate consensus sequences from reads produced by long‐read amplicon sequencing technologies, including ONT and PacBio. The tool clusters the reads to help filter out contaminants and reads with high error rates, and it applies polishing strategies specific to the sequencing platform. We show that NGSpeciesID improves usability, by minimizing preprocessing and software installation, and scalability, by enabling rapid processing of hundreds to thousands of samples, while maintaining consensus accuracy similar to that of current pipelines.
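
The cluster-then-consensus idea described in the abstract can be illustrated with a toy sketch. This is not NGSpeciesID's implementation: the function names, the k-mer Jaccard similarity, the greedy clustering, and the similarity threshold are all assumptions chosen for illustration. The consensus step also assumes equal-length reads with substitution errors only, whereas real ONT and PacBio reads contain insertions and deletions and need alignment-based consensus followed by platform-specific polishing.

```python
from collections import Counter


def kmers(seq, k=4):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(reads, k=4, threshold=0.3):
    """Put each read in the first cluster whose representative is similar enough.

    The representative is the first read of each cluster (an assumption of
    this sketch); reads matching no representative start a new cluster.
    """
    clusters = []  # list of (representative k-mer set, member reads)
    for read in reads:
        read_kmers = kmers(read, k)
        for rep_kmers, members in clusters:
            if jaccard(read_kmers, rep_kmers) >= threshold:
                members.append(read)
                break
        else:
            clusters.append((read_kmers, [read]))
    return [members for _, members in clusters]


def majority_consensus(reads):
    """Column-wise majority vote; only valid for equal-length reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def toy_consensus(reads, k=4, threshold=0.3):
    """Keep the largest cluster (dropping likely contaminants) and vote."""
    largest = max(greedy_cluster(reads, k, threshold), key=len)
    return majority_consensus(largest)
```

Calling `toy_consensus` on a list of read strings returns one consensus string built from the largest cluster. Smaller clusters, which in this toy stand in for contaminants or very noisy reads, are ignored.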

Updated: 2021-02-05