AC: A Compression Tool for Amino Acid Sequences.
Interdisciplinary Sciences: Computational Life Sciences ( IF 3.9 ) Pub Date : 2019-02-06 , DOI: 10.1007/s12539-019-00322-1
Morteza Hosseini 1 , Diogo Pratas 1 , Armando J Pinho 1

Advances in protein sequencing technologies have produced a huge volume of data that must be stored and transmitted, a challenge that compression can address. In this paper, we propose AC, a state-of-the-art method for lossless compression of amino acid sequences. The method relies on cooperation between finite-context models and substitutional tolerant Markov models. Compared with several general-purpose and protein-specific compressors, AC achieves the best bit-rates. It also compresses the sequences nine times faster than its competitor, paq8l. In addition, using AC, we analyze the compressibility of a large number of sequences from different domains. The results show that viral sequences are the most difficult to compress, archaeal and bacterial sequences are the second most difficult, and eukaryotic sequences are the easiest.
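
To make the notion of a finite-context model concrete, the following is a minimal sketch and not AC's actual implementation. It assumes an adaptive order-k model over the 20 standard amino acid letters with additive smoothing, and it reports the ideal code length in bits per residue. The function name `fcm_bits_per_symbol`, the parameters `order` and `alpha`, and the alphabet constant are hypothetical choices made for illustration. The substitutional tolerant Markov models and the model mixing mentioned in the abstract are not covered here.

```python
import math
from collections import defaultdict

# Assumption: input contains only the 20 standard amino acid letters.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fcm_bits_per_symbol(seq, order=2, alpha=1.0):
    """Average ideal code length (bits per residue) of `seq` under an
    adaptive finite-context model of the given order.

    Each symbol is predicted from the `order` preceding symbols, then the
    counts are updated, so a decoder could reproduce the same estimates.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    n_symbols = len(AMINO_ACIDS)
    total_bits = 0.0
    for i, sym in enumerate(seq):
        # Shorter contexts are used for the first `order` symbols.
        ctx = seq[max(0, i - order):i]
        # Additive (Laplace-style) smoothing keeps every probability nonzero.
        p = (counts[ctx][sym] + alpha) / (totals[ctx] + alpha * n_symbols)
        total_bits += -math.log2(p)
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return total_bits / len(seq) if seq else 0.0
```

Under these assumptions, a highly repetitive sequence such as `"ACD" * 200` costs far fewer bits per residue than the roughly 4.32 bits (log2 of 20) needed when every residue is equally likely, which is the kind of redundancy a compressor exploits.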

Updated: 2019-11-01