User‐friendly bioinformatics pipeline gDAT (graphical downstream analysis tool) for analysing rDNA sequences
Molecular Ecology Resources (IF 5.5), Pub Date: 2021-02-01, DOI: 10.1111/1755-0998.13340
Martti Vasar 1, John Davison 1, Lena Neuenkamp 2, Siim-Kaarel Sepp 1, J Peter W Young 3, Mari Moora 1, Maarja Öpik 1

High‐throughput sequencing (HTS) of multiple organisms in parallel (metabarcoding) has become a routine and cost‐effective method for the analysis of microbial communities in environmental samples. However, careful data treatment is required to identify potential errors in HTS data, and the large volume of data generated by HTS requires in‐house experience with command line tools for downstream analysis. This paper introduces a pipeline that incorporates the most common command line tools into an easy‐to‐use graphical interface—gDAT. By using the Python scripting language, the pipeline is compatible with the latest Windows, macOS and Linux operating systems. The pipeline supports analysis of Sanger, 454, IonTorrent, Illumina and PacBio sequences, allows custom modification of quality filtering steps, and implements both open and closed‐reference operational taxonomic unit‐picking for sequence identification. Predefined parameters are optimized for analysis of small subunit (SSU) rRNA gene amplicons from arbuscular mycorrhizal fungi, but the pipeline is widely applicable to metabarcoding studies targeting a broad range of organisms. The pipeline was additionally tested with data using general eukaryotic primers from the SSU gene region and fungal primers from the internal transcribed spacer (ITS) marker region. We describe the pipeline design and evaluate its performance and speed by conducting analysis of example data sets using different marker regions sequenced on Illumina platforms. The graphical interface, with the option to use the command line if needed, provides an accessible tool for rapid data analysis with repeatability and logging capabilities. Keeping the software open‐source maximizes code accessibility, allowing scrutiny and bug fixes by the community.
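The difference between closed-reference and open-reference OTU picking mentioned in the abstract can be illustrated with a toy sketch. This is not gDAT's implementation (the abstract says gDAT wraps existing command line tools); the function names, the 0.97 identity threshold, and the use of `difflib` as a crude stand-in for alignment-based identity are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for real alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def pick_otus(reads, references, threshold=0.97, open_reference=False):
    """Assign each read to its best-matching reference at or above `threshold`.

    Closed-reference: reads without a reference hit are returned as unassigned.
    Open-reference: those reads are instead clustered greedily into new
    de novo OTUs, each seeded by the first read that matches no existing one.
    All names and the threshold are hypothetical, chosen for this sketch.
    """
    assignments = {}
    unassigned = []
    for read_id, seq in reads.items():
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in references.items():
            score = identity(seq, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            assignments[read_id] = best_id
        else:
            unassigned.append(read_id)

    if open_reference:
        centroids = {}  # de novo OTU id -> seed sequence
        for read_id in unassigned:
            seq = reads[read_id]
            for otu_id, centroid in centroids.items():
                if identity(seq, centroid) >= threshold:
                    assignments[read_id] = otu_id
                    break
            else:
                otu_id = f"denovo_{len(centroids) + 1}"
                centroids[otu_id] = seq
                assignments[read_id] = otu_id
        unassigned = []

    return assignments, unassigned


# Made-up example data: one reference and three reads.
references = {"ref_A": "ACGTACGTACGTACGTACGT"}
reads = {
    "r1": "ACGTACGTACGTACGTACGT",  # identical to ref_A
    "r2": "TTTTTTTTTTGGGGGGGGGG",  # no close reference
    "r3": "TTTTTTTTTTGGGGGGGGGG",  # same as r2
}

closed = pick_otus(reads, references)
opened = pick_otus(reads, references, open_reference=True)
```

In closed-reference mode the two reads with no close reference are simply left unassigned, whereas in open-reference mode they form a new cluster. Real pipelines compute identity from sequence alignments and handle far larger inputs, so this sketch only conveys the logic.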

Updated: 2021-04-12