


BLAST-QC: automated analysis of BLAST results
Environmental Microbiome (IF 6.2), Pub Date: 2020-08-12, DOI: 10.1186/s40793-020-00361-y
Behzad Torkian, Spencer Hann, Eva Preisner, R. Sean Norman

The Basic Local Alignment Search Tool (BLAST) from NCBI is the preferred utility for sequence alignment and identification in bioinformatics and genomics research. Among researchers using NCBI’s BLAST software, it is well known that analyzing the results of a large BLAST search can be tedious and time-consuming. Furthermore, given the recent discussions of how parameters such as ‘-max_target_seqs’ affect the BLAST heuristic search process, the use of these search options is questionable. This leaves a stand-alone parser as one of the few options for condensing these large datasets, and because few are available for download online, researchers are left to write specialized software whenever they need to analyze BLAST results. The need for a streamlined, fast script that solves these issues and can be easily integrated into a variety of bioinformatics and genomics workflows was the initial motivation for developing this software. In this study, we demonstrate the effectiveness of BLAST-QC for analysis of BLAST results and its advantages over the other available options. Using genetic sequence data from our bioinformatic workflows, we establish BLAST-QC’s superior runtime compared with existing parsers developed with the commonly used BioPerl and BioPython modules, as well as with C and Java implementations of the BLAST-QC program. We discuss the ‘-max_target_seqs’ parameter and the controversy around its use, and offer a solution by demonstrating that our software provides the functionality this parameter was assumed to produce, along with a variety of other parsing options. Executions of the script on example datasets are given, demonstrating the implemented functionality and providing test cases for the program. BLAST-QC is designed to be integrated into existing software, and we establish its effectiveness as a module of workflows or other processes.
BLAST-QC provides the community with a simple, lightweight and portable Python script that allows for easy quality control of BLAST results while avoiding the drawbacks of other options. These drawbacks include the uncertain results of applying the -max_target_seqs parameter and the cumbersome dependencies of other options such as BioPerl and Java, which add complexity and run time when processing large sequence datasets. BLAST-QC is ideal for use in the high-throughput workflows and pipelines common in bioinformatic and genomic research, and the script has been designed for portability and easy integration into whatever processes the user may be running.
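
To make the idea of post-search filtering concrete, the following is a minimal sketch of keeping the N best hits per query after the search has finished, rather than relying on -max_target_seqs during it. It is not the BLAST-QC implementation or its interface: the function name `top_hits`, the `max_evalue` threshold, the sample records, and the choice of BLAST's default tabular output (-outfmt 6) as input are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(handle, n=1, max_evalue=1e-5):
    """Hypothetical helper: keep at most n hits per query, best bit score first.

    Hits whose e-value exceeds max_evalue are discarded before ranking.
    """
    hits = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits[hit["qseqid"]].append(hit)
    return {
        query: sorted(found, key=lambda h: h["bitscore"], reverse=True)[:n]
        for query, found in hits.items()
    }


# Made-up records for illustration only (not real BLAST output).
SAMPLE = (
    "q1\ts1\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t150\n"
    "q1\ts2\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "q1\ts3\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t30\n"
    "q2\ts4\t90.0\t80\t8\t0\t1\t80\t1\t80\t1e-20\t100\n"
)

best = top_hits(io.StringIO(SAMPLE), n=1)
```

Because the ranking happens on the complete result set, the choice of N here cannot influence which alignments the search itself reports; in practice `handle` would be an open results file rather than an in-memory string.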


Updated: 2020-08-12
