Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding
Frontiers in Genetics (IF 2.8), Pub Date: 2021-09-22, DOI: 10.3389/fgene.2021.744334
Yuanyuan Zhang 1, 2 , Ziqi Wang 1 , Shudong Wang 2 , Junliang Shang 3

The study of protein–protein interactions and the determination of protein functions are important parts of proteomics. Computational methods based on Gene Ontology (GO) are used to study the similarity between proteins in order to explore their functions and possible interactions. GO is a set of standardized terms that describe gene products in terms of molecular function, biological process, and cellular component. Previous studies measured protein similarity primarily through the Information Content (IC) of GO terms. However, these methods tend to ignore the structural information between GO terms. Taking this structural information into account, we systematically analyze the performance of the GO graph and the GO Annotation (GOA) graph in calculating protein similarity using different graph embedding methods. On real Human and Yeast datasets, feature vectors of GO terms and proteins are learned with different graph embedding methods. Because proteins are annotated with different numbers of GO terms, we used Dynamic Time Warping (DTW) to calculate protein similarity in the GO graph and cosine similarity in the GOA graph. Link prediction experiments were then performed to evaluate the reliability of the protein similarity networks constructed by the different methods. The results show that graph embedding methods have clear advantages over the traditional IC-based methods. Random-walk-based graph embedding methods, in particular, performed very well in calculating protein similarity. A comparison of the link prediction results for the GO(DTW) and GOA(cosine) methods shows that GO(DTW) features provide highly effective information for analyzing the similarity among proteins.
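
To make the two similarity measures named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each protein in the GO graph setting is represented as an ordered list of GO term embedding vectors (possibly of different lengths), and each protein in the GOA graph setting as a single embedding vector. The function names, the use of cosine distance as the local DTW cost, the conversion `1 / (1 + distance)` from distance to similarity, and the toy two-dimensional embeddings are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dtw_distance(seq_a, seq_b):
    """Classic DTW distance between two sequences of vectors.

    Assumption: the local cost between two GO term vectors is the
    cosine distance, 1 - cosine_similarity.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 1.0 - cosine_similarity(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_b[j-1] is matched to more than one element of seq_a
                cost[i][j - 1],      # seq_a[i-1] is matched to more than one element of seq_b
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


def dtw_similarity(seq_a, seq_b):
    """Assumption: map a DTW distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + dtw_distance(seq_a, seq_b))


# Hypothetical toy embeddings: protein A has 2 GO terms, protein B has 3.
protein_a_terms = [[1.0, 0.0], [0.0, 1.0]]
protein_b_terms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
go_dtw_score = dtw_similarity(protein_a_terms, protein_b_terms)

# Hypothetical protein-level embeddings, as would come from a GOA graph.
protein_a_vec = [0.5, 0.5]
protein_b_vec = [1.0, 0.0]
goa_cosine_score = cosine_similarity(protein_a_vec, protein_b_vec)
```

DTW is a natural fit here because it aligns sequences of unequal length, whereas cosine similarity requires two vectors of the same dimension. Note that DTW depends on the order of the GO term vectors, so any real use would need a consistent ordering rule, which this sketch leaves open.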




Updated: 2021-09-22