FQSqueezer: k-mer-based compression of sequencing data.
Scientific Reports (IF 4.6) Pub Date: 2020-01-17, DOI: 10.1038/s41598-020-57452-6
Sebastian Deorowicz

The amount of data produced by modern sequencing instruments that needs to be stored is huge. Therefore it is not surprising that a lot of work has been done in the field of specialized data compression of FASTQ files. The existing algorithms are, however, still imperfect and the best tools produce quite large archives. We present FQSqueezer, a novel compression algorithm for sequencing data able to process single- and paired-end reads of variable lengths. It is based on the ideas from the famous prediction by partial matching and dynamic Markov coder algorithms known from the general-purpose-compressors world. The compression ratios are often tens of percent better than offered by the state-of-the-art tools. The drawbacks of the proposed method are large memory and time requirements.




Updated: 2020-01-17