Structure of the space of taboo-free sequences.
Journal of Mathematical Biology ( IF 1.9 ) Pub Date : 2020-09-17 , DOI: 10.1007/s00285-020-01535-5
Cassius Manuel 1 , Arndt von Haeseler 1, 2

Models of sequence evolution typically assume that all sequences are possible. However, restriction enzymes that cut DNA at specific recognition sites provide an example where carrying a recognition site can be lethal. Motivated by this observation, we studied the set of strings over a finite alphabet with taboos, that is, with prohibited substrings. The taboo-set is referred to as \(\mathbb {T}\) and any allowed string as a taboo-free string. We consider the so-called Hamming graph \(\varGamma _n(\mathbb {T})\), whose vertices are taboo-free strings of length n and whose edges connect two taboo-free strings if their Hamming distance equals one. Any (random) walk on this graph describes the evolution of a DNA sequence that avoids taboos. We describe the construction of the vertex set of \(\varGamma _n(\mathbb {T})\). Then we state conditions under which \(\varGamma _n(\mathbb {T})\) and its suffix subgraphs are connected. Moreover, we provide an algorithm that determines if all these graphs are connected for an arbitrary \(\mathbb {T}\). As an application of the algorithm, we show that about \(87\%\) of bacteria listed in REBASE have a taboo-set that induces connected taboo-free Hamming graphs, because they have fewer than four type II restriction enzymes. On the other hand, four properly chosen taboos are enough to disconnect one suffix subgraph, and consequently connectivity of taboo-free Hamming graphs could change depending on the composition of restriction sites.
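
To make the definition of \(\varGamma _n(\mathbb {T})\) concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm from the paper, which handles arbitrary \(\mathbb {T}\) without enumerating strings. It simply lists all taboo-free strings of a small length n and checks connectivity with a breadth-first search. The function names `taboo_free_strings` and `is_connected` are hypothetical, and treating the empty graph as connected is an assumption made here for convenience.

```python
from collections import deque
from itertools import product


def taboo_free_strings(alphabet, taboos, n):
    """Return all length-n strings over `alphabet` that contain no taboo as a substring."""
    result = []
    for letters in product(alphabet, repeat=n):
        s = "".join(letters)
        if not any(t in s for t in taboos):
            result.append(s)
    return result


def is_connected(alphabet, taboos, n):
    """Check by BFS whether the taboo-free Hamming graph on length-n strings is connected.

    Two taboo-free strings are adjacent when they differ in exactly one position.
    Assumption: an empty vertex set is reported as connected.
    """
    vertices = set(taboo_free_strings(alphabet, taboos, n))
    if not vertices:
        return True
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        # Enumerate every string at Hamming distance one from s.
        for i in range(n):
            for c in alphabet:
                if c == s[i]:
                    continue
                neighbour = s[:i] + c + s[i + 1:]
                if neighbour in vertices and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return len(seen) == len(vertices)
```

For example, over the binary alphabet `"ab"` with taboos `["ab", "ba"]` and n = 2, the only taboo-free strings are `"aa"` and `"bb"`, which are at Hamming distance two, so the graph is disconnected. The enumeration grows as the alphabet size to the power n, so this sketch is only practical for short strings.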




Updated: 2020-09-18