Exact probability of a fixed pattern occurring in a random sequence
Communications in Statistics - Simulation and Computation (IF 0.9), Pub Date: 2020-06-30, DOI: 10.1080/03610918.2020.1766500
Ke-Ning Sheng, Joseph I. Naus
Abstract
We derive a procedure for obtaining the exact probability that a specific pattern of letters occurs in a longer random sequence of letters. The procedure is generalized to find the exact probability that a single fixed (specific) pattern, or a union or intersection of multiple fixed patterns, occurs within a random sequence, for any distribution of the letter in each cell of the sequence. It can also handle patterns with uncertain letters (missing, blank, unclear, ambiguous, transposed, etc.). In addition, the procedure finds the probability that a randomly chosen pattern appears in a separate, longer random sequence of letters. These methods are particularly applicable to genetic sequence analysis, diagnostics, anthropology, clinical medicine, data mining, computational molecular biology, and pattern analysis and recognition.
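The abstract does not spell out the procedure itself. As a hedged illustration only, and not necessarily the method derived in the paper, one standard way to get an exact occurrence probability is to run a KMP pattern-matching automaton as a Markov chain over the cells. The state is the length of the longest suffix read so far that is also a prefix of the pattern, and reaching the full pattern length is absorbing. The sketch assumes independent cells, each allowed its own letter distribution. The names `failure_function`, `step`, and `prob_pattern_occurs` are hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from typing import Dict, List, Sequence

Dist = Dict[str, Fraction]  # letter -> probability for one cell (assumed independent cells)


def failure_function(pattern: str) -> List[int]:
    """KMP table: fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def step(pattern: str, fail: List[int], state: int, letter: str) -> int:
    """Next automaton state (valid for 0 <= state < len(pattern)) after reading `letter`."""
    while state > 0 and pattern[state] != letter:
        state = fail[state - 1]
    return state + 1 if pattern[state] == letter else state


def prob_pattern_occurs(pattern: str, cells: Sequence[Dist]) -> Fraction:
    """Exact P(pattern occurs at least once) when cell i is drawn independently from cells[i]."""
    m = len(pattern)
    fail = failure_function(pattern)
    dist = [Fraction(0)] * (m + 1)  # dist[s] = P(automaton is in state s)
    dist[0] = Fraction(1)
    for cell in cells:
        new = [Fraction(0)] * (m + 1)
        new[m] = dist[m]  # state m (pattern already seen) is absorbing
        for s in range(m):
            if dist[s]:
                for letter, p in cell.items():
                    new[step(pattern, fail, s, letter)] += dist[s] * p
        dist = new
    return dist[m]


fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
print(prob_pattern_occurs("HH", [fair] * 3))  # 3/8 (HHH, HHT, THH)
```

Because `cells` is a list with one distribution per position, non-identical cells (for example a cell known to be "H" followed by a fair cell) are handled the same way. Each cell costs at most m · |alphabet| transition evaluations, so the work grows linearly with the sequence length.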
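The generalizations listed in the abstract (unions, intersections, uncertain letters, and a randomly chosen pattern) can be sketched in the same hedged spirit, again as an assumed construction rather than the paper's own. Here an uncertain pattern is modeled, as an assumption, by one string of allowed letters per position (so `["A", "CG", "T"]` stands for ACT or AGT) and expanded into concrete strings. The union uses an Aho-Corasick-style state (the longest suffix that is a proper prefix of some string). The intersection uses inclusion-exclusion, P(all occur) = Σ_S (-1)^|S| P(no pattern in S occurs). The random pattern is assumed to have i.i.d. letters drawn independently of the sequence. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, FrozenSet, Iterable, List, Sequence, Set

Dist = Dict[str, Fraction]  # letter -> probability for one cell (assumed independent cells)
Pattern = Sequence[str]     # hypothetical encoding: allowed letters per position, e.g. ["A", "CG", "T"]


def expand(pattern: Pattern) -> FrozenSet[str]:
    """Uncertain pattern -> the set of concrete strings it can stand for."""
    return frozenset("".join(choice) for choice in product(*pattern))


def prob_none_occur(strings: Iterable[str], cells: Sequence[Dist]) -> Fraction:
    """Exact P(none of `strings` occurs) when cell i is drawn independently from cells[i]."""
    pats = set(strings)
    prefixes = {""} | {p[:i] for p in pats for i in range(len(p))}  # proper prefixes
    dist: Dict[str, Fraction] = {"": Fraction(1)}  # state = longest suffix that is a prefix
    for cell in cells:
        new: Dict[str, Fraction] = {}
        for state, ps in dist.items():
            for letter, p in cell.items():
                text = state + letter
                if any(text.endswith(q) for q in pats):
                    continue  # some string just occurred; drop this probability mass
                nxt = next(text[i:] for i in range(len(text) + 1) if text[i:] in prefixes)
                new[nxt] = new.get(nxt, Fraction(0)) + ps * p
        dist = new
    return sum(dist.values(), Fraction(0))


def prob_union(patterns: List[Pattern], cells: Sequence[Dist]) -> Fraction:
    """P(at least one pattern occurs)."""
    strings: Set[str] = set().union(*(expand(p) for p in patterns))
    return 1 - prob_none_occur(strings, cells)


def prob_intersection(patterns: List[Pattern], cells: Sequence[Dist]) -> Fraction:
    """P(every pattern occurs), by inclusion-exclusion over subsets S of the patterns."""
    groups = [expand(p) for p in patterns]
    total = Fraction(0)
    for mask in range(1 << len(groups)):
        chosen = [g for i, g in enumerate(groups) if (mask >> i) & 1]
        sign = -1 if len(chosen) % 2 else 1
        total += sign * prob_none_occur(set().union(*chosen), cells)
    return total


def prob_random_pattern(k: int, letter_dist: Dist, cells: Sequence[Dist]) -> Fraction:
    """P(a length-k pattern with i.i.d. letters from letter_dist occurs in an independent sequence)."""
    total = Fraction(0)
    for letters in product(letter_dist, repeat=k):
        weight = Fraction(1)
        for a in letters:
            weight *= letter_dist[a]
        total += weight * (1 - prob_none_occur({"".join(letters)}, cells))
    return total


fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
print(prob_union(["HH", "TT"], [fair] * 3))         # 3/4 (only HTH and THT avoid both)
print(prob_intersection(["HH", "TT"], [fair] * 4))  # 1/8 (HHTT and TTHH)
print(prob_union([["H", "HT"]], [fair] * 2))        # 1/2 (HH or HT)
```

Expanding uncertain letters and enumerating random patterns are both exponential in pattern length, and inclusion-exclusion is exponential in the number of patterns, so this sketch only suits short patterns and small pattern sets.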