Identification of Words in Biological Sequences Under the Semi-Markov Hypothesis.
Journal of Computational Biology (IF 1.4), Pub Date: 2020-05-07, DOI: 10.1089/cmb.2019.0253
Brenda Ivette Garcia-Maya, Nikolaos Limnios

Identifying a word (pattern) in a long sequence of letters is not an easy task. To this end, several models have been proposed under the assumption that the letter sequence is described by a Markov chain. The Markovian hypothesis restricts the distribution of the sojourn time in a state: in discrete time, that distribution must be geometric. This is the main drawback of applying Markov chains to real problems. Semi-Markov processes generalize Markov chains by allowing the sojourn time in a state to follow any distribution. The goal of this article is to compute the first hitting time (position) of a word (pattern) in a semi-Markov sequence. To do so, we use the auxiliary prefix and backward chains. As an application, the model is tested on a bacteriophage DNA sequence that lacks the enzyme SmaI. We compute the probability that a word occurs for the first time after n nucleotides in a DNA sequence, and we obtain the corresponding probability distribution, the mean waiting position, the variance, and the rate of occurrence of the word.
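
The first-hitting-position distribution described above can be illustrated with a small Python sketch. It is not the authors' algorithm: it runs an exact forward recursion on an augmented Markov chain whose state is (current letter, time already spent in that letter, length of the longest matched prefix of the word), one standard way to pair a backward-recurrence (sojourn-age) chain with a KMP-style prefix automaton. Assumptions, all hypothetical: the alphabet is ACGT; the sojourn law depends only on the current letter and has finite support; position 1 opens a fresh sojourn; T is the position of the last letter of the first occurrence; the function names and every numeric parameter are illustrative and not fitted to the bacteriophage data. The example word CCCGGG is the SmaI recognition site.

```python
from collections import defaultdict

ALPHABET = "ACGT"


def prefix_automaton(word):
    """delta[q][c]: length of the longest prefix of `word` that is a suffix of
    word[:q] + c (KMP automaton over ALPHABET, states q = 0..len(word)-1)."""
    m = len(word)
    fail = [0] * (m + 1)  # fail[q] = longest proper border of word[:q]
    k = 0
    for q in range(1, m):
        while k > 0 and word[q] != word[k]:
            k = fail[k]
        if word[q] == word[k]:
            k += 1
        fail[q + 1] = k
    delta = []
    for q in range(m):
        row = {}
        for c in ALPHABET:
            r = q
            while r > 0 and word[r] != c:
                r = fail[r]
            row[c] = r + 1 if word[r] == c else 0
        delta.append(row)
    return delta


def hazard(f, u):
    """P(sojourn = u | sojourn >= u) for a finite sojourn law f = {length: prob}."""
    tail = sum(p for k, p in f.items() if k >= u)
    return f.get(u, 0.0) / tail if tail > 0 else 1.0


def first_hit_distribution(word, alpha, P, f, tol=1e-12, n_max=200_000):
    """Return {n: P(T = n)}, T = position of the last letter of the first
    occurrence of `word`. Hypothetical simplification: f[i] depends only on
    the current letter i. Stops once P(T > n) < tol or n reaches n_max."""
    delta, m = prefix_automaton(word), len(word)
    h = {(i, u): hazard(f[i], u) for i in ALPHABET for u in range(1, max(f[i]) + 1)}
    pmf = defaultdict(float)
    # Transient states: (letter i, backward time u, matched prefix length q).
    dist = defaultdict(float)
    for i, a in alpha.items():  # position 1 opens a fresh sojourn (u = 1)
        q = delta[0][i]
        if q == m:
            pmf[1] += a
        else:
            dist[(i, 1, q)] += a
    n = 1
    while sum(dist.values()) >= tol and n < n_max:
        n += 1
        new = defaultdict(float)
        for (i, u, q), mass in dist.items():
            stay = 1.0 - h[(i, u)]
            moves = [(i, u + 1, mass * stay)] if stay > 0 else []  # sojourn goes on
            moves += [(j, 1, mass * h[(i, u)] * pij)  # sojourn ends: jump i -> j
                      for j, pij in P[i].items() if pij > 0]
            for j, v, p in moves:
                q2 = delta[q][j]
                if q2 == m:
                    pmf[n] += p  # word completed at position n
                else:
                    new[(j, v, q2)] += p
        dist = new
    return dict(pmf)


def summary(pmf):
    """Captured probability mass, mean and variance of T from the truncated pmf."""
    total = sum(pmf.values())
    mean = sum(n * p for n, p in pmf.items())
    var = sum(n * n * p for n, p in pmf.items()) - mean ** 2
    return total, mean, var


if __name__ == "__main__":
    # Hypothetical parameters for illustration only, not fitted to any DNA data.
    alpha = {c: 0.25 for c in ALPHABET}
    P = {i: {j: 0.0 if j == i else 1 / 3 for j in ALPHABET} for i in ALPHABET}
    f = {c: {1: 0.5, 2: 0.3, 3: 0.2} for c in ALPHABET}  # non-geometric sojourns
    total, mean, var = summary(first_hit_distribution("CCCGGG", alpha, P, f, tol=1e-9))
    print(f"captured mass={total:.9f}  E[T]={mean:.1f}  Var[T]={var:.1f}")
```

Setting every sojourn to length 1 and allowing self-jumps with probability 1/4 reduces the model to i.i.d. uniform letters, where the classical values E[T] = 4 for the word A and E[T] = 20 for AA give a quick sanity check. Because the recursion stops once P(T > n) falls below `tol`, the reported moments are accurate only up to that neglected tail mass.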

Updated: 2020-05-07