Variant-Kudu: An Efficient Tool kit Leveraging Distributed Bitmap Index for Analysis of Massive Genetic Variation Datasets.
Journal of Computational Biology (IF 1.4), Pub Date: 2020-09-04, DOI: 10.1089/cmb.2019.0344
Jianye Fan, Shoubin Dong, Bo Wang

The storage and analysis of massive genetic variation datasets in variant call format (VCF) have become a great challenge with the rapid growth of genetic variation data in recent years. Traditional single-process tool kits are increasingly inefficient when analyzing such data. Although emerging distributed storage technologies such as Apache Kudu offer an attractive solution, a distributed storage tool kit for VCF datasets still needs to be developed. In this article, we present Variant-Kudu, an efficient genome tool kit for storing and analyzing massive genetic variation datasets. Based on a new distributed scheme, the genetic variation data are segmented and stored in Kudu across multiple nodes. With this scheme, data can be randomly accessed at low latency and scanned efficiently. To reduce query execution time, a distributed bitmap index strategy is proposed and a parallel query method is designed, which together expedite analyses of massive genetic variation data. Variant-Kudu is a scalable tool kit for analyzing massive genetic variation datasets, and our experiments demonstrate that Variant-Kudu achieves high performance on a multinode cluster.
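The abstract does not describe how the bitmap index or the parallel query is implemented. As a rough illustration of the general idea only, the sketch below builds one bitmap per distinct field value inside each data segment, answers a conjunctive query by AND-ing bitmaps, and runs the segments concurrently. It is not Variant-Kudu's code and does not use the Kudu API; the record fields (`chrom`, `pos`, `filter`, `type`), the function names, and the in-memory segments are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical indexed columns; the real tool kit's schema is not given in the abstract.
FIELDS = ("filter", "type")


def build_bitmap_index(records, field):
    """Map each distinct value of `field` to an int whose bit i is set
    when records[i][field] equals that value."""
    index = {}
    for row_id, record in enumerate(records):
        value = record[field]
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def make_segment(records):
    """Bundle one slice of the data with its per-field bitmap indexes."""
    return records, {f: build_bitmap_index(records, f) for f in FIELDS}


def set_bits(bitmap):
    """Return the positions of the set bits in ascending order."""
    rows, row_id = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


def query_segment(segment, predicates):
    """AND together the bitmaps of all (field, value) predicates in one segment."""
    records, indexes = segment
    result = (1 << len(records)) - 1  # start with every row selected
    for field, value in predicates:
        result &= indexes[field].get(value, 0)
    return [records[i] for i in set_bits(result)]


def parallel_query(segments, predicates):
    """Query every segment concurrently and concatenate results in segment order."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda seg: query_segment(seg, predicates), segments)
        return [record for part in parts for record in part]


# Made-up toy records, split into two segments as if stored on two nodes.
variants = [
    {"chrom": "1", "pos": 100, "filter": "PASS", "type": "SNP"},
    {"chrom": "1", "pos": 250, "filter": "LowQual", "type": "SNP"},
    {"chrom": "1", "pos": 300, "filter": "PASS", "type": "INDEL"},
    {"chrom": "2", "pos": 50, "filter": "PASS", "type": "SNP"},
]
segments = [make_segment(variants[:2]), make_segment(variants[2:])]
hits = parallel_query(segments, [("filter", "PASS"), ("type", "SNP")])
print([(v["chrom"], v["pos"]) for v in hits])
```

Because each segment carries its own index, a query only touches bitmaps local to that segment, which is the property that lets the per-segment work proceed independently. A real deployment would replace the thread pool and in-memory lists with scans against the storage layer.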

Updated: 2020-09-14