Context-aware seeds for read mapping.
Algorithms for Molecular Biology (IF 1), Pub Date: 2020-05-23, DOI: 10.1186/s13015-020-00172-3
Hongyi Xin, Mingfu Shao, Carl Kingsford

Most modern seed-and-extend NGS read mappers employ a seeding scheme that requires extracting t + 1 non-overlapping seeds from each read in order to find all valid mappings under an edit-distance threshold of t. As t grows, this scheme forces mappers to use more and shorter seeds, which increases seed hits (seed frequencies) and therefore reduces mapper efficiency. We propose a novel seeding framework, context-aware seeds (CAS). CAS still guarantees finding all valid mappings but uses fewer (and longer) seeds, which reduces seed frequencies and increases mapper efficiency. CAS achieves this improvement by attaching a confidence radius to each seed in the reference. We prove that all valid mappings can be found if the sum of the seeds' confidence radii is greater than t. CAS generalizes the existing pigeonhole-principle-based seeding scheme, in which the confidence radius is implicitly always 1. Moreover, we design an efficient algorithm that constructs the confidence-radius database in linear time. We evaluate CAS on the E. coli genome and show that it significantly reduces seed frequencies compared with the state-of-the-art pigeonhole-principle-based seeding algorithm, the Optimal Seed Solver. Code: https://github.com/Kingsford-Group/CAS_code
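The seeding condition described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the CAS_code repository): `split_into_seeds`, `seed_frequency`, and `covers_threshold` are hypothetical helper names, seeds are cut into near-equal lengths for simplicity, and confidence radii are supplied by the caller rather than computed from a reference database.

```python
from typing import List, Tuple


def split_into_seeds(read: str, t: int) -> List[Tuple[int, str]]:
    """Cut a read into t + 1 non-overlapping seeds of near-equal length.

    Returns (offset_in_read, seed) pairs. Equal-length cutting is a
    simplifying assumption; optimized mappers choose boundaries
    adaptively to lower the total seed frequency.
    """
    n = t + 1  # pigeonhole: t edits cannot touch all t + 1 disjoint seeds
    if n > len(read):
        raise ValueError("read is too short for t + 1 non-empty seeds")
    base, extra = divmod(len(read), n)
    seeds, pos = [], 0
    for i in range(n):
        length = base + (1 if i < extra else 0)  # spread the remainder
        seeds.append((pos, read[pos:pos + length]))
        pos += length
    return seeds


def seed_frequency(reference: str, seed: str) -> int:
    """Count exact, possibly overlapping, occurrences of seed in reference."""
    count, start = 0, reference.find(seed)
    while start != -1:
        count += 1
        start = reference.find(seed, start + 1)
    return count


def covers_threshold(radii: List[int], t: int) -> bool:
    """CAS sufficiency condition from the abstract: sum of radii > t.

    Pigeonhole seeding is the special case where every radius is 1,
    so it needs t + 1 seeds. The radii here are caller-supplied,
    hypothetical values, not read from a real CAS database.
    """
    return sum(radii) > t


if __name__ == "__main__":
    reference = "ACGTTGCAACGTAGGCTTACGTACGTTGCA"  # toy reference
    read = "ACGTTGCAACGT"
    for t in (1, 2, 3):
        seeds = split_into_seeds(read, t)
        hits = [seed_frequency(reference, s) for _, s in seeds]
        guaranteed = covers_threshold([1] * len(seeds), t)
        print(f"t={t} seeds={[s for _, s in seeds]} hits={hits} guaranteed={guaranteed}")
```

Raising t shortens every seed, and shorter seeds generally match more reference positions, which is the efficiency loss the abstract describes. Under the CAS condition, hypothetical radii such as `[2, 2]` would already satisfy `t = 3` with two longer seeds, whereas pigeonhole seeding needs four.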

Updated: 2020-05-23