A protein sequence fitness function for identifying natural and nonnatural proteins.
Proteins: Structure, Function, and Bioinformatics (IF 3.2), Pub Date: 2020-05-16, DOI: 10.1002/prot.25900
Rahul Kaushik, Kam Y J Zhang

The infinitesimally small sequence space naturally scouted in the millions of years of evolution suggests that natural proteins are constrained by some functional prerequisites and should differ from randomly generated sequences. We have developed a protein sequence fitness scoring function that uses sequence and corresponding secondary structural information at the tripeptide level to differentiate natural and nonnatural proteins. The proposed fitness function is extensively validated on a dataset of about 210 000 natural and nonnatural protein sequences and benchmarked against existing methods for differentiating natural and nonnatural proteins. The high sensitivity, specificity, and percentage accuracy (0.81, 0.95, and 91%, respectively) of the fitness function demonstrate its potential application for sampling protein sequences with a higher probability of mimicking natural proteins. Moreover, the four major classes of proteins (α proteins, β proteins, α/β proteins, and α + β proteins) are analyzed separately, and β proteins are found to score slightly lower than the other classes. Further, an analysis of about 250 designed proteins (adopted from previously reported cases) helped to define the boundaries for sampling ideal protein sequences. The protein sequence characterization aided by the proposed fitness function could facilitate the exploration of new perspectives in the design of novel functional proteins.
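
For readers less familiar with the three validation measures quoted above, the following is a minimal sketch of how sensitivity, specificity, and accuracy are conventionally computed from a binary confusion matrix (here, natural = positive, nonnatural = negative). The class name `ConfusionCounts`, the function `classification_metrics`, and the example counts are all hypothetical illustrations; they are not taken from the paper and do not reproduce its data or its scoring function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary classification outcomes."""
    tp: int  # natural sequences classified as natural
    fn: int  # natural sequences classified as nonnatural
    tn: int  # nonnatural sequences classified as nonnatural
    fp: int  # nonnatural sequences classified as natural


def classification_metrics(c: ConfusionCounts) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) as fractions in [0, 1]."""
    sensitivity = c.tp / (c.tp + c.fn)   # true-positive rate
    specificity = c.tn / (c.tn + c.fp)   # true-negative rate
    accuracy = (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    return sensitivity, specificity, accuracy


# Made-up counts, for illustration only (not results from the paper).
example = ConfusionCounts(tp=40, fn=10, tn=45, fp=5)
sens, spec, acc = classification_metrics(example)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.0%}")
```

Sensitivity and specificity are fractions of the positive and negative classes respectively, while accuracy is computed over the whole dataset, so when the two classes differ in size the accuracy is a weighted average of the other two values.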

Updated: 2020-05-16