Extraction of Long k-mers Using Spaced Seeds, IEEE/ACM Transactions on Computational Biology and Bioinformatics


Extraction of Long k-mers Using Spaced Seeds
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2021-09-16, DOI: 10.1109/tcbb.2021.3113131
Miika Leinonen, Leena Salmela

The extraction of k-mers from reads is an important task in many bioinformatics applications, such as all DNA sequence analysis methods based on de Bruijn graphs. These methods tend to be more accurate when the used k-mers are unique in the analyzed DNA, and thus the use of longer k-mers is preferred. When the read lengths of short read sequencing technologies increase, the error rate will become the determining factor for the largest possible value of k. Here we propose LoMeX which uses spaced seeds to extract long k-mers accurately even in the presence of sequencing errors. Our experiments show that LoMeX can extract long k-mers from current Illumina reads with a similar or higher recall than a standard k-mer counting tool. Furthermore, our experiments on simulated data show that when the read length further increases enabling even longer k-mers, the performance of standard k-mer counters declines, whereas LoMeX still extracts long k-mers successfully.
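The abstract does not spell out how LoMeX works internally, so the following is only a minimal sketch of the general spaced-seed idea, not the authors' implementation. A spaced seed is written here as a string of fixed (`1`) and don't-care (`0`) positions; k-mers that agree on the fixed positions are grouped together, and a per-column majority vote then gives a consensus k-mer that can absorb an error at a don't-care position. The function names (`extract_kmers`, `apply_seed`, `consensus_kmers`), the seed pattern, and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def extract_kmers(read, k):
    """Return all contiguous k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def apply_seed(kmer, seed):
    """Keep only the characters at the fixed ('1') positions of the seed."""
    return "".join(c for c, s in zip(kmer, seed) if s == "1")


def consensus_kmers(reads, seed, min_count=2):
    """Group k-mers by their spaced-seed key and vote on each column.

    Hypothetical illustration: k-mers sharing the same fixed positions
    land in one group, so a mismatch at a don't-care position does not
    split the group. Groups with fewer than min_count members are dropped.
    """
    k = len(seed)
    groups = defaultdict(list)
    for read in reads:
        for kmer in extract_kmers(read, k):
            groups[apply_seed(kmer, seed)].append(kmer)

    result = []
    for kmers in groups.values():
        if len(kmers) < min_count:
            continue
        # Majority vote per column across all k-mers in the group.
        consensus = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*kmers)
        )
        result.append(consensus)
    return result


if __name__ == "__main__":
    # Toy reads: the third one has an error (G -> C) at index 2.
    reads = ["ACGTACGT", "ACGTACGT", "ACCTACGT"]
    seed = "11011011"  # index 2 and index 5 are don't-care positions
    print(consensus_kmers(reads, seed))
```

In this toy input, exact k-mer counting would see the erroneous read as a separate k-mer, whereas the seed-based grouping places all three k-mers in one group because the error falls on a don't-care position. A single seed cannot absorb an error at one of its fixed positions, so a practical tool would need more than this sketch shows.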


Updated: 2021-09-16
