Using the longest run subsequence problem within homology-based scaffolding
Algorithms for Molecular Biology (IF 1), Pub Date: 2021-06-28, DOI: 10.1186/s13015-021-00191-8
Sven Schrinner 1 , Manish Goel 2, 3 , Michael Wulfert 4 , Philipp Spohr 1 , Korbinian Schneeberger 2, 3, 5 , Gunnar W Klau 1, 5

Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs into larger pseudo-chromosomes by means of a second, incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed as a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, solve large instances of LRS to provable optimality. We demonstrate the usefulness of these methods within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time. All data used in the experiments as well as our source code are freely available.
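
The abstract names LRS without defining it. As a minimal sketch, assume the usual formulation: given a string, find a longest subsequence in which every symbol occurs in at most one run (one contiguous block). In the scaffolding setting, each symbol would plausibly be the label of the best-matching contig for one bin, but that mapping is an assumption here. The function name `lrs_length` is hypothetical, and this exponential-state dynamic program only illustrates the problem; it is not the authors' implementation and includes none of their reduction rules.

```python
def lrs_length(s):
    """Length of a longest subsequence of s in which each symbol forms
    at most one run. Exact, but exponential in the alphabet size, so it
    is only suitable for small illustrative inputs."""
    # Map each distinct symbol to a bit position.
    index = {ch: i for i, ch in enumerate(sorted(set(s)))}

    # State: (bitmask of symbols whose run has started, symbol of the
    # currently open run or -1 if nothing chosen yet) -> best length.
    states = {(0, -1): 0}

    for ch in s:
        c = index[ch]
        bit = 1 << c
        nxt = dict(states)  # option 1: skip this character
        for (used, last), length in states.items():
            if last == c:
                key = (used, last)        # extend the open run
            elif not used & bit:
                key = (used | bit, c)     # start a new run for c
            else:
                continue                  # c's run is already closed
            if nxt.get(key, -1) < length + 1:
                nxt[key] = length + 1
        states = nxt

    return max(states.values())


if __name__ == "__main__":
    # "abab" cannot be kept whole: both symbols would need two runs.
    print(lrs_length("abab"))
```

For example, under this definition "abab" admits "aab" or "abb" but no valid subsequence of length 4. The number of states grows with 2^k for k distinct symbols, which is consistent with the problem being NP-hard as the abstract states.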

Updated: 2021-06-29