Genes, information and sense: complexity and knowledge retrieval

Sadovsky, Michael G.; Putintseva, Julia A.; Shchepanovsky, Alexander S.

doi:10.1007/s12064-008-0032-1

Original Paper
Published: 29 April 2008

Theory in Biosciences, volume 127, pages 69–78 (2008)

Michael G. Sadovsky¹,
Julia A. Putintseva² &
Alexander S. Shchepanovsky¹

Abstract

The information capacity of a nucleotide sequence measures the unexpectedness of the continuation of a given string of nucleotides, and is thereby related to a variety of biological issues. A continuation is defined so as to maximize the entropy of the ensemble of such continuations. The capacity is defined as the mutual entropy of the real frequency dictionary of a sequence with respect to the dictionary bearing the most expected continuations; it does not depend on the length of strings contained in a dictionary. Various genomes exhibit a multi-minima pattern in the dependence of information capacity on string length, which reflects an order within a sequence. Strings whose expected frequency deviates significantly from the real one are the words of increased information value. Such words are distributed non-randomly along a sequence, which makes it possible to retrieve the correlation between structure and function encoded within a sequence.
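
As a rough illustration of the quantities named in the abstract, the following Python sketch computes a frequency dictionary, an estimate of the expected frequencies, and the mutual entropy between the two. The abstract gives no formulas, so the estimate used here (the product of the frequencies of the two overlapping strings one symbol shorter, divided by the frequency of their common core) is an assumption, as are all function names and the threshold parameter. Reading the sequence as a ring follows the second note below.

```python
from collections import Counter
from math import log


def frequency_dictionary(seq, q):
    """Frequencies of all strings of length q in seq, read as a ring."""
    n = len(seq)
    ring = seq + seq[:q - 1]  # close the sequence into a ring
    counts = Counter(ring[i:i + q] for i in range(n))
    return {w: c / n for w, c in counts.items()}


def expected_frequency(word, f_prev, f_core):
    """Assumed estimate of a word's frequency from its shorter substrings."""
    # For words of length 2 there is no common core, so divide by 1.
    core = f_core[word[1:-1]] if len(word) > 2 else 1.0
    return f_prev[word[:-1]] * f_prev[word[1:]] / core


def expected_dictionary(seq, q):
    """Real and expected frequencies for every word of length q (q >= 2)."""
    f_q = frequency_dictionary(seq, q)
    f_prev = frequency_dictionary(seq, q - 1)
    f_core = frequency_dictionary(seq, q - 2) if q > 2 else {}
    return {w: (f, expected_frequency(w, f_prev, f_core))
            for w, f in f_q.items()}


def information_capacity(seq, q):
    """Mutual entropy: sum of f * ln(f / f_expected) over words of length q."""
    return sum(f * log(f / e)
               for f, e in expected_dictionary(seq, q).values())


def deviating_words(seq, q, threshold=2.0):
    """Words whose real/expected ratio exceeds threshold or its inverse.

    The threshold value is a hypothetical choice for illustration only.
    """
    return sorted(w for w, (f, e) in expected_dictionary(seq, q).items()
                  if f / e > threshold or f / e < 1.0 / threshold)


if __name__ == "__main__":
    toy = "ACGT" * 10
    for q in (2, 3):
        print(q, information_capacity(toy, q), deviating_words(toy, q))
```

For the periodic toy sequence ACGT repeated ten times, this sketch gives a capacity of ln 4 at q = 2 and 0 at q = 3, since every triplet is fully determined by the pairs it contains.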

Notes

The theory and methodology described below are applicable to sequences over an arbitrary finite alphabet ℵ, for example amino acid sequences.
The equality of these two sums underlies the connection of a sequence into a ring.
Strictly speaking, information capacity is defined for a frequency dictionary, not for a sequence; we shall not distinguish between the two unless doing so would cause a misrepresentation.

Acknowledgments

We are thankful to Prof. Alexander N. Gorban of Leicester University for valuable discussions and inspiring ideas, and to Dr. Tatyana G. Popova of the Institute of Computational Modelling of RAS for stimulating interest in this work.

Author information

Authors and Affiliations

Institute of Computational Modelling of RAS, Akademgorodok, 660036, Krasnoyarsk, Russia
Michael G. Sadovsky & Alexander S. Shchepanovsky
Siberian Federal University, Svobodny prosp., 79, 660041, Krasnoyarsk, Russia
Julia A. Putintseva

Corresponding author

Correspondence to Michael G. Sadovsky.

Additional information

The results presented here were obtained in part with support from the Krasnoyarsk Science Foundation.

About this article

Cite this article

Sadovsky, M.G., Putintseva, J.A. & Shchepanovsky, A.S. Genes, information and sense: complexity and knowledge retrieval. Theory Biosci. 127, 69–78 (2008). https://doi.org/10.1007/s12064-008-0032-1

Received: 20 September 2007
Accepted: 26 March 2008
Published: 29 April 2008
Issue Date: May 2008
DOI: https://doi.org/10.1007/s12064-008-0032-1
