FoldRec-C2C: protein fold recognition by combining cluster-to-cluster model and protein similarity network.
Briefings in Bioinformatics (IF 9.5), Pub Date: 2020-07-20, DOI: 10.1093/bib/bbaa144
Jiangyi Shao¹, Ke Yan², Bin Liu¹

As a key step in studying protein structures, protein fold recognition plays an important role in predicting protein structures, including those associated with COVID-19. However, existing computational predictors focus only on pairwise protein similarity or on the similarity between two groups of proteins belonging to two folds, whereas the homology relationships among proteins form a hierarchical structure. Taking the global protein similarity network into account should therefore improve predictive performance. In this study, we proposed a predictor called FoldRec-C2C that globally incorporates the interactions among proteins into the prediction. FoldRec-C2C treats protein fold recognition as an information retrieval task, as in natural language processing. Initial ranking results were generated by a supervised ranking algorithm, Learning to Rank, and three re-ranking algorithms were then applied to the ranking lists to adjust the results globally based on the protein similarity network: a sequence-to-sequence (seq-to-seq) model, a sequence-to-cluster (seq-to-cluster) model and a cluster-to-cluster (C2C) model. When tested on the widely used and rigorous LINDAHL benchmark dataset, FoldRec-C2C outperforms 34 other state-of-the-art methods in this field. The source code and data of FoldRec-C2C can be downloaded from http://bliulab.net/FoldRec-C2C/download.
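The abstract does not give the re-ranking formulas, so the following is only a rough, hypothetical illustration of the general idea of adjusting a first-pass ranking with a protein similarity network. It assumes each template's score for a query is blended linearly with the mean score of its network neighbours, and that the query is assigned the fold of the top-ranked template. The function names, the linear blend and the weight `alpha` are assumptions for this sketch, not the FoldRec-C2C method.

```python
"""Hypothetical sketch of network-based re-ranking for fold recognition.

NOT the FoldRec-C2C implementation: the linear blend, the parameter
`alpha` and the function names are assumptions made for illustration.
"""
from typing import Dict, List, Set


def rerank_with_network(
    initial_scores: Dict[str, float],
    neighbors: Dict[str, Set[str]],
    alpha: float = 0.5,
) -> List[str]:
    """Re-rank templates for one query using a protein similarity network.

    initial_scores: query-vs-template scores from a first-pass ranker
        (for example a Learning to Rank model); higher means more similar.
    neighbors: adjacency of the (hypothetical) similarity network among templates.
    alpha: assumed weight on a template's own score versus the mean score
        of its network neighbours.
    """
    adjusted: Dict[str, float] = {}
    for template, own_score in initial_scores.items():
        # Only neighbours that the query was also scored against contribute.
        nbr_scores = [
            initial_scores[n]
            for n in neighbors.get(template, set())
            if n in initial_scores
        ]
        # Fall back to the template's own score if it has no scored neighbours.
        neighbour_mean = sum(nbr_scores) / len(nbr_scores) if nbr_scores else own_score
        adjusted[template] = alpha * own_score + (1 - alpha) * neighbour_mean
    # Best first; ties are broken by template name so the order is deterministic.
    return sorted(adjusted, key=lambda t: (-adjusted[t], t))


def predict_fold(ranking: List[str], fold_of: Dict[str, str]) -> str:
    """Assign the query the fold of its top-ranked template."""
    return fold_of[ranking[0]]


if __name__ == "__main__":
    scores = {"t1": 0.80, "t2": 0.75, "t3": 0.70, "t4": 0.20}
    # t2 and t3 are similar to each other; t1 is linked only to the weak t4.
    network = {"t1": {"t4"}, "t2": {"t3"}, "t3": {"t2"}, "t4": {"t1"}}
    folds = {"t1": "fold_A", "t2": "fold_B", "t3": "fold_B", "t4": "fold_A"}
    ranking = rerank_with_network(scores, network, alpha=0.5)
    print(ranking, predict_fold(ranking, folds))  # ['t2', 't3', 't1', 't4'] fold_B
```

In this toy example, t1 ranks first on its own score but is connected only to a low-scoring template, so after re-ranking the mutually similar t2 and t3 move ahead and the predicted fold changes from fold_A to fold_B. Setting `alpha=1.0` leaves the first-pass ranking unchanged.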

Updated: 2020-07-20