Artificial protein sequences enable recognition of vicinal and distant protein functional relationships.
Proteins: Structure, Function, and Bioinformatics (IF 2.9). Pub Date: 2020-07-29. DOI: 10.1002/prot.25986
Gayatri Kumar, Narayanaswamy Srinivasan, Sankaran Sandhya

High divergence among protein sequences makes it challenging to detect distant protein relationships with homology-based approaches. Grouping protein sequences into families, on the basis of similarity in either sequence or 3-D structure, improves the recognition of protein relationships. In addition, strategically designed protein-like sequences have been shown to bridge distant structural domain families by serving as artificial linkers. In this study, we augmented a search database of known protein domain families with such designed sequences, with the aim of providing functional clues for domain families of unknown structure. When assessed using representative query sequences from each family, the approach achieved a success rate of 94% for protein domain families of known structure. Further, we demonstrate that the augmented search space enabled fold recognition for 582 families for which no structural information was available a priori. Additionally, we were able to provide reliable functional relationships for 610 orphan families. We discuss the application of our method to predicting functional roles through selected examples: DUF4922, DUF5131, and DUF5085. Our approach also detects new associations between families that were not previously known to be related, as demonstrated by new sub-groups of the RNA polymerase domain among three distinct RNA viruses. Taken together, search databases augmented with designed sequences support the detection of meaningful relationships between distant protein families. In turn, they enable fold recognition and offer reliable pointers to potential functional sites that may be probed further through direct mutagenesis studies.
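The core idea, designed sequences acting as intermediates that connect families which do not hit each other directly, can be illustrated with a toy graph search. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes search hits have already been computed by some sequence or profile search tool and are supplied as (source, target, E-value) tuples. The node names, the `designed:` prefix marking artificial linker sequences, and the 1e-3 E-value cutoff are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical significance threshold; not taken from the paper.
E_VALUE_CUTOFF = 1e-3
# Hypothetical naming convention marking artificial (designed) linker sequences.
DESIGNED_PREFIX = "designed:"


def build_graph(hits, cutoff=E_VALUE_CUTOFF):
    """Build an undirected graph that keeps only hits at or below the cutoff."""
    graph = defaultdict(set)
    for source, target, e_value in hits:
        if e_value <= cutoff:
            graph[source].add(target)
            graph[target].add(source)
    return graph


def linked_families(graph, query_family):
    """Breadth-first search from a query family.

    Returns the other domain families reachable through any chain of
    significant hits, including chains that pass through designed sequences.
    Designed sequences themselves are excluded from the result.
    """
    seen = {query_family}
    queue = deque([query_family])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(
        n for n in seen
        if n != query_family and not n.startswith(DESIGNED_PREFIX)
    )


# Toy, invented hit list: DUF_X has no significant direct hit to Fam_known,
# but both hit the same designed sequence.
hits = [
    ("DUF_X", "designed:d1", 1e-6),
    ("designed:d1", "Fam_known", 1e-5),
    ("DUF_X", "Fam_other", 0.5),  # not significant, so it is dropped
]

print(linked_families(build_graph(hits), "DUF_X"))  # ['Fam_known']
```

Here `DUF_X` is linked to `Fam_known` only through the designed sequence `designed:d1`, while the weak direct hit to `Fam_other` is filtered out. The unbounded breadth-first search is a simplification; a practical workflow would more likely cap the number of intermediate hops and check alignment coverage so that spurious links do not propagate along long chains.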

Updated: 2020-07-29