Efficient DNA sequence compression with neural networks
GigaScience (IF 11.8), Pub Date: 2020-11-11, DOI: 10.1093/gigascience/giaa119
Milton Silva, Diogo Pratas, Armando J Pinho

The increasing production of genomic data has led to an intensified need for models that can cope efficiently with the lossless compression of DNA sequences. Important applications include long-term storage and compression-based data analysis. In the literature, only a few recent articles propose the use of neural networks for DNA sequence compression. However, they fall short when compared with specific DNA compression tools, such as GeCo2. This limitation is due to the absence of models specifically designed for DNA sequences. In this work, we combine the power of neural networks with specific DNA models. For this purpose, we created GeCo3, a new genomic sequence compressor that uses neural networks for mixing multiple context and substitution-tolerant context models.
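To make the idea of mixing context models concrete, here is a minimal Python sketch. It is not GeCo3's implementation: the names `ContextModel` and `estimate_bits_per_base` are hypothetical, the mixer is a single softmax layer trained online by gradient descent (a simplification assumed here for brevity), substitution-tolerant models are omitted, and no arithmetic coder is included. It only estimates the ideal code length in bits per base.

```python
import math

ALPHABET = "ACGT"


class ContextModel:
    """Order-k finite-context model with additive smoothing (illustrative only)."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha
        self.counts = {}  # context string -> list of 4 symbol counts

    def _context(self, history):
        # history[-0:] would return the whole string, so order 0 is handled explicitly.
        return history[-self.order:] if self.order else ""

    def predict(self, history):
        c = self.counts.get(self._context(history), [0, 0, 0, 0])
        total = sum(c) + 4 * self.alpha
        return [(n + self.alpha) / total for n in c]

    def update(self, history, symbol):
        ctx = self._context(history)
        self.counts.setdefault(ctx, [0, 0, 0, 0])[ALPHABET.index(symbol)] += 1


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def estimate_bits_per_base(seq, orders=(1, 3), lr=0.05):
    """Mix the models' log-probabilities with learned weights; return bits per base."""
    models = [ContextModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    max_order = max(orders)
    total_bits = 0.0
    for i, sym in enumerate(seq):
        history = seq[max(0, i - max_order):i]
        logp = [[math.log(p) for p in m.predict(history)] for m in models]
        # Mixed logit for each symbol: weighted sum of the models' log-probabilities.
        z = [sum(w * lp[s] for w, lp in zip(weights, logp)) for s in range(4)]
        q = softmax(z)
        t = ALPHABET.index(sym)
        total_bits -= math.log2(q[t])
        # Online gradient step on the log loss of the mixed prediction.
        for j, lp in enumerate(logp):
            grad = sum(q[s] * lp[s] for s in range(4)) - lp[t]
            weights[j] -= lr * grad
        for m in models:
            m.update(history, sym)
    return total_bits / len(seq)


if __name__ == "__main__":
    print(estimate_bits_per_base("ACGT" * 200))
```

A uniform model costs 2 bits per base on a four-letter alphabet, so any value below 2 indicates that the mixed models have learned some structure in the sequence.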

Updated: 2020-11-12