Kohdista: an efficient method to index and query possible Rmap alignments
Algorithms for Molecular Biology (IF 1), Pub Date: 2019-12-12, DOI: 10.1186/s13015-019-0160-9
Martin D Muggli, Simon J Puglisi, Christina Boucher
Background
Genome-wide optical maps are ordered, high-resolution restriction maps that give the positions of restriction cut sites for one or more restriction enzymes. They are assembled with an overlap-layout-consensus approach from raw optical map data, which are referred to as Rmaps. Because Rmap data have a high error rate, finding the overlap between Rmaps remains challenging.
Results
We present Kohdista, which is an index-based algorithm for finding pairwise alignments between single molecule maps (Rmaps). The novelty of our approach is the formulation of the alignment problem as automaton path matching, and the application of modern index-based data structures. In particular, we combine the use of the Generalized Compressed Suffix Array (GCSA) index with the wavelet tree in order to build Kohdista. We validate Kohdista on simulated E. coli data, showing the approach successfully finds alignments between Rmaps simulated from overlapping genomic regions.
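The idea of alignment as automaton path matching can be illustrated with a small sketch. It is not the authors' implementation: it walks the automaton by brute force instead of using a GCSA index or a wavelet tree, and the function names (`build_automaton`, `match_paths`), the skip-edge error model (a query fragment may span several target fragments when cut sites are missed), and the absolute size tolerance are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, float]  # (destination node, summed fragment size)


def build_automaton(fragments: List[float], max_skips: int = 1) -> Dict[int, List[Edge]]:
    """Build a toy automaton over one target Rmap (hypothetical helper).

    Nodes are cut sites 0..n. An edge i -> j is labelled with the summed size
    of fragments i..j-1, so an edge with j > i + 1 skips j - i - 1 cut sites.
    """
    n = len(fragments)
    edges: Dict[int, List[Edge]] = {i: [] for i in range(n + 1)}
    for i in range(n):
        total = 0.0
        for j in range(i + 1, min(n, i + 1 + max_skips) + 1):
            total += fragments[j - 1]
            edges[i].append((j, total))
    return edges


def match_paths(edges: Dict[int, List[Edge]], query: List[float],
                tolerance: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) node pairs of paths that spell out the query.

    A path matches when each edge label is within `tolerance` (absolute,
    same units as the fragment sizes) of the corresponding query fragment.
    """
    hits = set()

    def extend(node: int, k: int, start: int) -> None:
        if k == len(query):          # every query fragment has been consumed
            hits.add((start, node))
            return
        for dest, size in edges[node]:
            if abs(size - query[k]) <= tolerance:
                extend(dest, k + 1, start)

    for start in edges:              # try every cut site as a starting point
        extend(start, 0, start)
    return sorted(hits)


# Assumed toy data: fragment sizes in kbp.
target = [4.0, 7.5, 3.0, 9.0, 5.5]
query = [10.3, 9.2]                  # 10.3 is roughly 7.5 + 3.0 (one missed cut site)
print(match_paths(build_automaton(target), query))
```

In this toy run the query matches the path from cut site 1 to cut site 4, because the first query fragment is explained by a skip edge. Exhaustive traversal like this does not scale to large Rmap collections, which is the gap the abstract says the index-based data structures are meant to fill.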
Conclusion
We demonstrate that Kohdista is the only method capable of finding a significant number of high-quality pairwise Rmap alignments for large eukaryote organisms in reasonable time.
Updated: 2019-12-12