Use of designed sequences in protein structure recognition.
Biology Direct (IF 5.5), Pub Date: 2018-05-09, DOI: 10.1186/s13062-018-0209-6
Gayatri Kumar, Richa Mudgal, Narayanaswamy Srinivasan, Sankaran Sandhya

BACKGROUND Knowledge of protein structure is a prerequisite for an improved understanding of molecular function. The gap in sequence-structure space has widened in the post-genomic era, and grouping related protein sequences into families can help narrow it. In the Pfam database, a structure description is available for part or all of the protein in 7726 families. For the remaining 52% of families, no information on 3-D structure is yet available. We use computationally designed sequences that are intermediately related to two protein domain families already known to share the same fold. These strategically designed sequences enable the detection of distant relationships, and here we employ them to recognise the structure of protein families whose structure is not yet known.

RESULTS We first measured the success rate of our approach on a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part or the full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation.

CONCLUSION The results indicate that knowledge-based filling of gaps in protein sequence space is a productive approach for structure recognition. The designed sequences assist in traversing protein sequence space and effectively function as 'linkers' where natural linkers between distant proteins are unavailable.

REVIEWERS This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.

Updated: 2020-04-22