ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence.
Journal of Molecular Biology ( IF 4.7 ) Pub Date : 2020-03-03 , DOI: 10.1016/j.jmb.2020.02.026
Jiajun Qiu 1 , Michael Bernhofer 1 , Michael Heinzinger 1 , Sofie Kemper 2 , Tomas Norambuena 3 , Francisco Melo 4 , Burkhard Rost 5

The intricate details of how proteins bind to other proteins, DNA, and RNA are crucial for understanding almost all biological processes. Disease-causing sequence variants often affect binding residues. Here, we describe a new, comprehensive system of in silico methods that take only protein sequence as input to predict the binding of proteins to DNA, RNA, and other proteins. First, we developed several new methods to predict whether or not a protein binds (per-protein prediction). Second, we developed independent methods that predict which residues bind (per-residue prediction). Without requiring three-dimensional information, the system can predict the actual binding residues. For per-protein predictions, the system combines homology-based inference with machine learning, and motif-based profile-kernel approaches with word-based (ProtVec) solutions. This achieved an overall non-exclusive three-state accuracy of 77% ± 1% (± one standard error), corresponding to a 1.8-fold improvement over random (best classification for protein-protein binding, with F1 = 91 ± 0.8%). For per-residue predictions, standard neural networks performed best for DNA-binding (Q2 = 81 ± 0.9%), followed by RNA-binding (Q2 = 80 ± 1%), and worst for protein-protein binding (Q2 = 69 ± 0.8%). The new method, dubbed ProNA2020, is available as code through GitHub (https://github.com/Rostlab/ProNA2020.git) and through PredictProtein (www.predictprotein.org).
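
The abstract reports per-residue performance as Q2 and per-protein performance as F1 without defining either. The following is a minimal sketch of how such scores are commonly computed from binary labels. It assumes Q2 means two-state accuracy (the percentage of residues correctly assigned to binding or non-binding) and F1 means the harmonic mean of precision and recall for the binding class. The paper's exact definitions and evaluation protocol may differ. The function names and the toy labels are hypothetical and are not taken from the ProNA2020 code.

```python
from typing import Sequence, Tuple


def confusion_counts(observed: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = binding and 0 = non-binding."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            tp += 1  # binding residue predicted as binding
        elif pred:
            fp += 1  # non-binding residue predicted as binding
        elif obs:
            fn += 1  # binding residue predicted as non-binding
        else:
            tn += 1  # non-binding residue predicted as non-binding
    return tp, fp, tn, fn


def q2(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Two-state accuracy in percent (assumed definition of Q2)."""
    tp, fp, tn, fn = confusion_counts(observed, predicted)
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no residues to evaluate")
    return 100.0 * (tp + tn) / total


def f1(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """F1 in percent for the binding class: 2*TP / (2*TP + FP + FN)."""
    tp, fp, _tn, fn = confusion_counts(observed, predicted)
    denominator = 2 * tp + fp + fn
    return 100.0 * 2 * tp / denominator if denominator else 0.0


# Toy labels for a hypothetical 10-residue protein (illustrative only, not real data).
observed = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]

print(f"Q2 = {q2(observed, predicted):.1f}%")
print(f"F1 = {f1(observed, predicted):.1f}%")
```

Because non-binding residues usually far outnumber binding residues, a two-state accuracy of this kind can look high even for a weak predictor, so comparing it with a class-specific score such as F1 is often informative.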

Updated: 2020-03-04