Extraction of Long k-mers Using Spaced Seeds
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2021-09-16, DOI: 10.1109/tcbb.2021.3113131
Miika Leinonen, Leena Salmela

The extraction of $k$-mers from reads is an important task in many bioinformatics applications, such as all DNA sequence analysis methods based on de Bruijn graphs. These methods tend to be more accurate when the $k$-mers used are unique in the analyzed DNA, so longer $k$-mers are preferred. As the read lengths of short-read sequencing technologies increase, the error rate will become the determining factor for the largest possible value of $k$. Here we propose LoMeX, which uses spaced seeds to extract long $k$-mers accurately even in the presence of sequencing errors. Our experiments show that LoMeX can extract long $k$-mers from current Illumina reads with similar or higher recall than a standard $k$-mer counting tool. Furthermore, our experiments on simulated data show that when the read length increases further, enabling even longer $k$-mers, the performance of standard $k$-mer counters declines, whereas LoMeX still extracts long $k$-mers successfully.
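
To make the spaced-seed idea in the abstract concrete, the following is a minimal Python sketch of how a seed can tolerate sequencing errors. It is an illustration under stated assumptions, not the LoMeX implementation: the function names `spaced_key` and `extract_consensus_kmers`, the `min_count` threshold, the example mask, and the column-wise majority vote are all hypothetical choices made here. The mask is a string of `1` (position must match) and `0` (position is ignored); $k$-mers that agree on all `1` positions are grouped, and a consensus is taken over each group.

```python
from collections import Counter, defaultdict


def spaced_key(kmer, mask):
    """Return the characters of kmer at positions where mask is '1'."""
    return "".join(c for c, m in zip(kmer, mask) if m == "1")


def extract_consensus_kmers(reads, mask, min_count=2):
    """Group k-mers by their spaced key and return consensus k-mers.

    Hypothetical sketch: k is the mask length, and groups with fewer
    than min_count members are discarded as likely errors.
    """
    k = len(mask)
    groups = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            groups[spaced_key(kmer, mask)].append(kmer)

    result = {}
    for kmers in groups.values():
        if len(kmers) < min_count:
            continue
        # Majority vote per column; ties go to the base seen first.
        consensus = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*kmers)
        )
        result[consensus] = len(kmers)
    return result


# The third read has an error at a position the mask ignores,
# so it still supports the correct 8-mer.
reads = ["ACGTACGT", "ACGTACGT", "ACTTACGT"]
print(extract_consensus_kmers(reads, "11011011"))
# {'ACGTACGT': 3}
```

An exact $k$-mer counter would report the correct 8-mer only twice in this toy input and count the erroneous read separately. The sketch ignores reverse complements and errors that fall on `1` positions, both of which a real tool would have to handle.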

Updated: 2021-09-16