Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes
Molecular Omics (IF 3.0), Pub Date: 2018-06-27, DOI: 10.1039/c8mo00008e
Meenakshi S. Iyer 1, 2, 3 , Adwait G. Joshi 1, 2, 3, 4, 5 , Ramanathan Sowdhamini 1, 2, 3

Domains are the basic building blocks of proteins, and they can combine to give rise to different domain architectures. Annotating the domains in a sequence is the first step towards understanding its biological function. Since there are a limited number of folds and evolutionarily related proteins have similar structures, function can be inferred through remote homology. Computational sequence searches for remote homologues were performed on the genomes of around 160 000 different organisms, starting from nearly 11 000 superfamily queries of known structure. Case studies revealed that most of the associated domains are involved in the same biological process. Using all the proteins predicted to have at least one structural domain, a coverage of 61% of Pfam families was achieved, which is higher than that of existing methods (43.36% by SIFTS). Taxonomic analysis of the proteins revealed 493 superfamilies present in all the major kingdoms of life, as well as a few lateral gene transfers between viruses and cellular organisms. The distribution of remote homologues across different classes, folds and superfamilies was studied and revealed that sequences are unequally distributed across structural classes. Finally, domain architectures were computed for the homologues, and these data were compiled for each superfamily and organism.

Updated: 2018-12-01