High throughput BLAST algorithm using Spark and Cassandra
The Journal of Supercomputing (IF 3.3), Pub Date: 2020-05-28, DOI: 10.1007/s11227-020-03338-3
Fernando Cores, Fernando Guirado, Josep Lluis Lerida

The rise of high-resolution, high-throughput sequencing technologies has driven the emergence of new application fields such as precision medicine. It has also increased the storage and processing requirements of bioinformatics tools, requirements that can only be met by high-performance, massive data processing infrastructures. Such infrastructures allow the development of scalable, efficient and reliable bioinformatics tools. In this paper, a new implementation of the Basic Local Alignment Search Tool (BLAST) algorithm is presented. Our proposal, named Sparky-Blast, uses the Cassandra database to store the different reference datasets and the Apache Spark processing framework to calculate the indexes and process the queries. This approach avoids the bottleneck suffered by the original BLAST, which is limited to the resources of a single machine. Sparky-Blast can use the distributed resources of a Big-Data cluster to process queries in parallel, thereby improving both the response time and the system throughput. At the same time, the use of a distributed architecture like Hadoop lets the system scale in terms of both hardware infrastructure and performance.
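
The split between calculating indexes and processing queries can be illustrated with a minimal single-machine sketch of BLAST-style seeding. This is not the authors' implementation: the abstract does not describe the index layout, so the sketch assumes a plain k-mer (word) index held in memory, and the names `build_index`, `find_seeds` and `K` are hypothetical. In a design like the one described, the index would presumably be stored in Cassandra and built and queried through Spark rather than in a local dictionary.

```python
from collections import defaultdict

K = 3  # seed (word) length; hypothetical value chosen for illustration


def build_index(references, k=K):
    """Map each k-mer to the (reference id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for ref_id, seq in references.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((ref_id, pos))
    return index


def find_seeds(query, index, k=K):
    """Return (reference id, reference offset, query offset) for exact k-mer hits."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get avoids creating empty entries in the defaultdict on a miss
        for ref_id, rpos in index.get(query[qpos:qpos + k], []):
            hits.append((ref_id, rpos, qpos))
    return hits


# Toy reference set standing in for a real sequence database.
references = {"ref1": "ACGTACGT", "ref2": "TTGACA"}
index = build_index(references)
hits = find_seeds("GACG", index)
```

Each hit is only a seed. A full BLAST search would go on to extend and score the seeds, which this sketch omits. Because each query consults the index independently, lookups of this kind can be run in parallel across queries.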

Updated: 2020-05-28