Genes, information and sense: complexity and knowledge retrieval.
Theory in Biosciences (IF 1.3), Pub Date: 2008-04-29, DOI: 10.1007/s12064-008-0032-1
Michael G Sadovsky, Julia A Putintseva, Alexander S Shchepanovsky

The information capacity of a nucleotide sequence measures how unexpected the continuation of a given string of nucleotides is, and therefore bears on a variety of biological issues. A continuation is defined so as to maximize the entropy of the ensemble of such continuations. The capacity is defined as the mutual entropy of the real frequency dictionary of a sequence with respect to the dictionary bearing the most expected continuations; it does not depend on the length of the strings contained in a dictionary. Various genomes exhibit a multi-minima pattern in the dependence of information capacity on string length, reflecting order within a sequence. Strings whose expected frequency deviates significantly from the real one are the words of increased information value. Such words are distributed non-randomly along a sequence, which makes it possible to retrieve the correlation between a structure and a function encoded within the sequence.
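
The abstract gives no formulas, so the following is only a minimal sketch of how such a capacity might be computed, under stated assumptions. It assumes the sequence is closed into a ring, that the most expected frequency of a string of length q is reconstructed from the shorter dictionaries as f(w[:-1]) * f(w[1:]) / f(w[1:-1]), and that the capacity is the sum of f * ln(f / expected) over the observed strings. The function names, the ring closure, and the `threshold` parameter are illustrative choices, not taken from the paper.

```python
from collections import Counter
from math import log


def frequency_dictionary(seq, q):
    """Relative frequencies of all strings of length q in seq.

    Assumption: the sequence is closed into a ring, so every position
    starts exactly one string and dictionaries of different q agree.
    """
    n = len(seq)
    ring = seq + seq[:max(q - 1, 0)]
    counts = Counter(ring[i:i + q] for i in range(n))
    return {w: c / n for w, c in counts.items()}


def expected_frequencies(seq, q):
    """Expected frequency of each observed string of length q (q >= 2).

    Assumed form: f(w[:-1]) * f(w[1:]) / f(w[1:-1]), built from the
    dictionaries of lengths q-1 and q-2. For q == 2 the inner string
    is empty and its frequency is 1.
    """
    if q < 2:
        raise ValueError("q must be at least 2")
    real = frequency_dictionary(seq, q)
    shorter = frequency_dictionary(seq, q - 1)
    core = frequency_dictionary(seq, q - 2)
    return {w: shorter[w[:-1]] * shorter[w[1:]] / core[w[1:-1]] for w in real}


def information_capacity(seq, q):
    """Sum of f * ln(f / expected) over the strings observed in seq."""
    real = frequency_dictionary(seq, q)
    expected = expected_frequencies(seq, q)
    return sum(f * log(f / expected[w]) for w, f in real.items())


def informative_words(seq, q, threshold):
    """Observed strings whose |ln(real / expected)| exceeds threshold.

    The threshold is a hypothetical cut-off chosen by the caller.
    Strings that never occur in seq are not examined.
    """
    real = frequency_dictionary(seq, q)
    expected = expected_frequencies(seq, q)
    return sorted(w for w, f in real.items()
                  if abs(log(f / expected[w])) > threshold)


if __name__ == "__main__":
    toy = "ACGT" * 10
    for q in (2, 3):
        print(q, information_capacity(toy, q))
```

On the toy periodic ring `"ACGT" * 10`, pairs are far from what single-letter frequencies predict, so the capacity at q = 2 equals ln 4, while triplets are fully predicted by pairs, so the capacity at q = 3 is zero. Scanning q over a range on a real genome is how one would look for the multi-minima pattern mentioned above, again assuming this form of the reconstruction.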

Updated: 2019-11-01