Use of designed sequences in protein structure recognition (Biology Direct)


Biology Direct ( IF 5.7 ) Pub Date : 2018-05-09 , DOI: 10.1186/s13062-018-0209-6
Gayatri Kumar ₁, Richa Mudgal ₁,₂, Narayanaswamy Srinivasan ₁, Sankaran Sandhya ₁


BACKGROUND: Knowledge of protein structure is a prerequisite for an improved understanding of molecular function. The gap between known sequences and known structures has widened in the post-genomic era. Grouping related protein sequences into families can help narrow this gap. In the Pfam database, a structure description is available for part or all of the protein length in 7726 families. For the remaining 52% of families, no information on 3-D structure is available yet. We use computationally designed sequences that are intermediately related to two protein domain families already known to share the same fold. These strategically designed sequences enable the detection of distant relationships, and here we employ them for structure recognition of protein families of as yet unknown structure.

RESULTS: We first measured the success rate of our approach on a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of as yet unknown structure, we made structural assignments for part or all of the protein length. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation.

CONCLUSION: The results indicate that knowledge-based filling of gaps in protein sequence space is a fruitful approach for structure recognition. Such designed sequences assist in traversing protein sequence space and effectively function as 'linkers' where natural linkers between distant proteins are unavailable.

REVIEWERS: This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.
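The 'linker' idea in the abstract can be pictured as path-finding in a graph whose nodes are sequences or families and whose edges are detectable sequence relationships. The sketch below is only an illustration of that idea, not the authors' pipeline: the abstract does not describe the search method, and every name here (`find_link_path`, `query_family`, `designed_1`, and the toy hit lists) is hypothetical.

```python
from collections import deque


def find_link_path(edges, start, goal):
    """Breadth-first search for a chain of detectable relationships.

    `edges` is an iterable of (a, b) pairs, each meaning that a
    sequence search can detect a relationship between a and b.
    Returns the shortest chain from start to goal, or None.
    """
    # Build an undirected adjacency map from the hit pairs.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbours so the result is deterministic.
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical toy data: natural sequences alone leave a gap
# between the query family and the family of known fold.
natural_hits = [
    ("query_family", "seq_A"),
    ("seq_B", "known_fold_family"),
]
# A designed intermediate sequence (assumed) bridges seq_A and seq_B.
designed_hits = [
    ("seq_A", "designed_1"),
    ("designed_1", "seq_B"),
]

without_linker = find_link_path(
    natural_hits, "query_family", "known_fold_family"
)
with_linker = find_link_path(
    natural_hits + designed_hits, "query_family", "known_fold_family"
)
```

In this toy setup no chain exists through natural sequences alone, while adding the designed intermediate connects the query family to the family of known fold. A real application would derive the edges from a sequence search tool and a significance threshold, both of which are outside what the abstract specifies.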


Updated: 2020-04-22
