DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet.
Journal of Chemical Information and Modeling (IF 5.6) · Pub Date: 2020-03-03 · DOI: 10.1021/acs.jcim.0c00043
Yifei Qi (1, 2), John Z. H. Zhang (1, 2, 3)

Computational protein design remains a challenging task despite its remarkable success over the past few decades. With the rapid progress of deep-learning techniques and the accumulation of three-dimensional protein structures, it is becoming increasingly feasible to use deep neural networks to learn the relationship between protein sequences and structures, and then to automatically design a protein sequence for a given backbone structure. In this study, we developed a deep neural network named DenseCPD that considers the three-dimensional density distribution of protein backbone atoms and predicts, for each residue in a protein, the probability of each of the 20 natural amino acids. The accuracy of DenseCPD was 53.24 ± 0.17% in 5-fold cross-validation on the training set, and 55.53% and 50.71% on two independent test sets, more than 10% higher than that of previous state-of-the-art methods. Two approaches for using DenseCPD predictions in computational protein design were analyzed. The approach based on a cutoff on cumulative probability produced a smaller sequence search space than the approach that simply uses the top-k predictions, and therefore enabled higher sequence identity when redesigning three proteins with Rosetta. The network and the data sets are available on a web server at http://protein.org.cn/densecpd.html. The results of this study may benefit the further development of computational protein design methods.
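The two ways of turning per-residue probabilities into a design search space can be illustrated with a minimal sketch. It assumes each residue comes with a 20-element probability vector (the values below are made up, not DenseCPD output), uses a hypothetical one-letter amino acid ordering, and picks illustrative values k = 3 and cutoff = 0.8 that are not taken from the paper. The search space is counted as the product of candidate-set sizes over all positions; how the candidates are actually handed to Rosetta is not shown.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical ordering of the 20 natural amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _ranked(probs: Sequence[float]) -> List[int]:
    """Indices sorted by descending probability (stable, so ties keep input order)."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k_candidates(probs: Sequence[float], k: int) -> List[str]:
    """Top-k approach: always keep the k most probable amino acids."""
    return [AMINO_ACIDS[i] for i in _ranked(probs)[:k]]


def cumulative_cutoff_candidates(probs: Sequence[float], cutoff: float) -> List[str]:
    """Cutoff approach: keep the smallest set of top-ranked amino acids
    whose summed probability reaches `cutoff`."""
    chosen: List[str] = []
    total = 0.0
    for i in _ranked(probs):
        chosen.append(AMINO_ACIDS[i])
        total += probs[i]
        if total >= cutoff:
            break
    return chosen


def search_space_size(candidates_per_position: List[List[str]]) -> int:
    """Number of sequences = product of candidate-set sizes over positions."""
    return math.prod(len(c) for c in candidates_per_position)


def make_probs(peaks: Dict[str, float]) -> List[float]:
    """Illustrative helper: assign the given probabilities and spread the
    remaining mass uniformly over the other amino acids."""
    rest = [aa for aa in AMINO_ACIDS if aa not in peaks]
    leftover = (1.0 - sum(peaks.values())) / len(rest)
    return [peaks.get(aa, leftover) for aa in AMINO_ACIDS]


# A confidently predicted position and an uncertain one (made-up numbers).
pos1 = make_probs({"A": 0.9})
pos2 = make_probs({"L": 0.4, "I": 0.3, "V": 0.2})

topk = [top_k_candidates(p, 3) for p in (pos1, pos2)]
cut = [cumulative_cutoff_candidates(p, 0.8) for p in (pos1, pos2)]
print(search_space_size(topk), search_space_size(cut))  # 9 3
```

In this toy case the cutoff keeps a single candidate at the confident position but three at the uncertain one, while top-k spends three slots everywhere. That adaptive sizing is what shrinks the search space relative to top-k.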

Updated: 2020-04-24