Protein structure search to support the development of protein structure prediction methods
bioRxiv - Bioinformatics Pub Date : 2020-06-04 , DOI: 10.1101/2020.06.03.131821
Ronald Ayoub , Yugyung Lee

Protein structure prediction is a long-standing unsolved problem in molecular biology that has seen renewed interest following the success of deep learning with AlphaFold at CASP13. While developing and evaluating protein structure prediction methods, researchers may want to identify the known structures most similar to their predicted structures. These predicted structures often have low sequence and structure similarity to known structures. We show how RUPEE, a purely geometric protein structure search, is able to identify the structures most similar to structure predictions regardless of how they vary from known structures, something existing protein structure searches struggle with. RUPEE accomplishes this through a novel linear encoding of protein structures as a sequence of residue descriptors. Using a fast Needleman-Wunsch algorithm, RUPEE aligns the residue descriptor sequences of every available structure. This is followed by a series of increasingly accurate structure alignments, from TM-align alignments initialized with the Needleman-Wunsch residue descriptor alignments to standard TM-align alignments of the final results. By using alignment normalization effectively at each stage, RUPEE can also execute containment searches, in addition to full-length searches, to identify structural motifs within proteins. We compare the results of RUPEE to mTM-align, SSM, CATHEDRAL and VAST using a benchmark derived from the protein structure predictions submitted to CASP13. RUPEE identifies better alignments on average with respect to RMSD and TM-score, as well as Q-score and SSAP-score, which are scores specific to SSM and CATHEDRAL, respectively. Finally, we show a sample of the top-scoring alignments identified by RUPEE that none of the other protein structure searches in our comparison were able to identify.
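
The descriptor-alignment step can be illustrated with a minimal sketch. It assumes each residue descriptor is a small integer and uses a simple match/mismatch/gap scoring. The abstract does not specify RUPEE's actual descriptor encoding, scoring scheme, or normalization, so the names `nw_score` and `normalized_score` and all parameter values below are hypothetical. Dividing by the query length versus the longer of the two lengths is one plausible way to distinguish containment from full-length normalization, not necessarily the one RUPEE uses.

```python
from typing import Sequence


def nw_score(a: Sequence[int], b: Sequence[int],
             match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score of two descriptor sequences.

    Only two rows of the dynamic-programming table are kept, so memory
    is O(len(b)) while time remains O(len(a) * len(b)).
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]


def normalized_score(query: Sequence[int], target: Sequence[int],
                     mode: str = "full_length",
                     match: int = 1, mismatch: int = -1, gap: int = 0) -> float:
    """Scale the alignment score by a length that depends on the search mode.

    Hypothetical normalization (an assumption, not RUPEE's documented rule):
    - "containment": divide by the query length, so a query fully embedded
      in a longer target can still reach 1.0.
    - "full_length": divide by the longer length, so size differences
      between query and target lower the score.
    """
    if not query or not target:
        raise ValueError("both sequences must be non-empty")
    if mode == "containment":
        denom = len(query)
    elif mode == "full_length":
        denom = max(len(query), len(target))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    score = nw_score(query, target, match, mismatch, gap)
    return score / (match * denom)


if __name__ == "__main__":
    query = [1, 2, 3]          # made-up descriptor codes
    target = [9, 1, 2, 3, 9]   # the query embedded in a longer chain
    print(normalized_score(query, target, mode="containment"))
    print(normalized_score(query, target, mode="full_length"))
```

With free gaps (`gap=0`), the three-residue query embedded in the five-residue target scores 1.0 under containment normalization and 0.6 under full-length normalization, which shows how the choice of denominator alone can switch a search between motif matching and whole-structure matching.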

Updated: 2020-06-04