Information theoretic perspective on genome clustering
Saudi Journal of Biological Sciences (IF 4.4), Pub Date: 2020-12-31, DOI: 10.1016/j.sjbs.2020.12.039
Alaguraj Veluchamy, Preeti Mehta, K.V. Srividhya, Hirendra Vikram, M.K. Govind, Ramneek Gupta, Abdul Aziz Bin Dukhyil, Raed Abdullah Alharbi, Saleh Abdullah Aloyuni, Mohamed M. Hassan, S. Krishnaswamy

Shannon’s information-theoretic perspective on communication helps one understand how information is stored and processed in one-dimensional sequences. An information-theoretic analysis of 937 completely sequenced prokaryotic genomes and 238 eukaryotic chromosomes is presented. Information content (Id) values were used to cluster these chromosomes. Chargaff’s second parity rule, i.e., compositional self-complementarity, is an empirical fact observed in all the genomes except that of the proteobacterium Candidatus Hodgkinia cicadicola. High information content, arising from biased base composition, is found in all 14 chromosomes of Plasmodium falciparum and in two prokaryotic genomes, viz. Buchnera aphidicola str. Cc (Cinara cedri) and Candidatus Carsonella ruddii PV. Despite variations in size and composition, neither prokaryotic nor eukaryotic genomes deviate significantly from an equiprobable and random situation. The chromosomes of a given eukaryotic organism tend to have similar informational restraints, as seen when a simple distance-based method is used to cluster them. In eukaryotes, Id values are in certain cases also similar for the two arms (p and q) of a chromosome. The results of this study confirm that information content can provide insights into the clustering of genomes and the evolution of their messaging strategies. An efficient and robust standalone Perl CGI tool based on this information theory algorithm was created for whole-genome analysis and is available at https://github.com/AlagurajVeluchamy/InformationTheory.
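
The abstract does not define Id, so the following is only a minimal Python sketch under an assumption: that Id follows the common Gatlin-style decomposition Id = D1 + D2, where D1 is the divergence from equiprobability and D2 the divergence from independence (estimated here as the mutual information between adjacent bases). The function names are hypothetical and are not taken from the authors' Perl tool. A small helper also illustrates the kind of single-strand check that Chargaff's second parity rule implies (A ≈ T and G ≈ C).

```python
import math
from collections import Counter

BASES = "ACGT"

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def base_frequencies(seq):
    """Relative frequencies of A, C, G, T; other symbols (e.g. N) are ignored."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: counts[b] / total for b in BASES}

def d1(seq):
    """Divergence from equiprobability: log2(4) - H1 (assumed definition)."""
    return math.log2(len(BASES)) - entropy(base_frequencies(seq).values())

def d2(seq):
    """Divergence from independence, estimated as the mutual information
    between adjacent bases: H(X) + H(Y) - H(X, Y) (assumed definition)."""
    s = seq.upper()
    pairs = Counter((a, b) for a, b in zip(s, s[1:]) if a in BASES and b in BASES)
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    first, second = Counter(), Counter()
    for (a, b), n in pairs.items():
        first[a] += n
        second[b] += n
    h_joint = entropy(n / total for n in pairs.values())
    h_first = entropy(n / total for n in first.values())
    h_second = entropy(n / total for n in second.values())
    return h_first + h_second - h_joint

def information_content(seq):
    """Id = D1 + D2, an assumed Gatlin-style definition."""
    return d1(seq) + d2(seq)

def chargaff_deviation(seq):
    """Relative single-strand imbalance (|A-T|/(A+T), |G-C|/(G+C));
    values near zero are consistent with the second parity rule."""
    f = base_frequencies(seq)
    at, gc = f["A"] + f["T"], f["G"] + f["C"]
    at_dev = abs(f["A"] - f["T"]) / at if at > 0 else 0.0
    gc_dev = abs(f["G"] - f["C"]) / gc if gc > 0 else 0.0
    return at_dev, gc_dev
```

Under these assumptions, chromosomes could then be compared by the absolute difference of their `information_content` values, which is one simple reading of a distance-based clustering; the method actually used in the paper may differ.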




Updated: 2021-03-04