WBTC: a new approach for efficient storage of genomic data
International Journal of Information Technology, Pub Date: 2020-06-13, DOI: 10.1007/s41870-020-00472-2
Sanjeev Kumar, Suneeta Agarwal, Ranvijay

With improvements in high-throughput genome sequencing technology, huge amounts of genomic data are generated every day. These data are used in numerous applications, including sequence alignment, drug discovery, and personalized medicine. To handle genome data efficiently for storage, processing, and transmission, a dedicated genomic data compression approach is needed. In this paper, a hybrid approach, WBTC (Word Based Compression Technique), based on statistical and substitution models, is proposed for genome compression. WBTC supports genomic data in raw form as well as in FASTA/Multi-FASTA file formats. WBTC is a lossless genome compression algorithm that allows searching without full decompression. Experiments show that WBTC outperforms other state-of-the-art algorithms with respect to compression ratio, compression time, decompression time, compression memory, and decompression memory.

Updated: 2020-06-13