A network embedding model for pathogenic genes prediction by multi-path random walking on heterogeneous network.
BMC Medical Genomics ( IF 2.1 ) Pub Date : 2019-12-23 , DOI: 10.1186/s12920-019-0627-z
Bo Xu 1, 2 , Yu Liu 1 , Shuo Yu 1 , Lei Wang 3 , Jie Dong 3 , Hongfei Lin 3 , Zhihao Yang 3 , Jian Wang 3 , Feng Xia 1, 2

BACKGROUND: Predicting pathogenic genes is crucial for disease prevention, diagnosis, and treatment, but traditional genetic localization methods are often technically difficult and time-consuming. With the development of computer science, computational biology has gradually become one of the main approaches for finding candidate pathogenic genes.

METHODS: We propose Multipath2vec, a pathogenic gene prediction method based on network embedding. First, we construct a heterogeneous network, called the GP-network, from three kinds of relationships between genes and phenotypes: correlations between phenotypes, interactions between genes, and known gene-phenotype pairs. Then, to embed the network more effectively, we design a multi-path to guide random walks in the GP-network. The multi-path comprises multiple paths between genes and phenotypes and can capture the complex structural information of the heterogeneous network. Finally, we use the learned vector representation of each phenotype and protein to calculate similarities, and rank candidate genes by their similarity to the target phenotype.

RESULTS: We implemented Multipath2vec and four baseline approaches (CATAPULT, PRINCE, Deepwalk, and Metapath2vec) and evaluated them on many-genes gene-phenotype data, single-gene gene-phenotype data, and whole gene-phenotype data. Experimental results show that Multipath2vec outperformed the state-of-the-art baselines on the pathogenic gene prediction task.

CONCLUSIONS: Multipath2vec can be used to predict pathogenic genes, and the experimental results show that it achieves higher prediction accuracy than the baselines.
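The pipeline described under METHODS (path-guided random walks on a gene-phenotype network, followed by similarity ranking) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy network, the node names, the four type patterns in `PATHS`, and the helper names do not come from the paper, and the embedding step (typically skip-gram training on the generated walks) is left out, so `rank_candidates` simply takes vectors that are presumed to have been learned already. Cosine similarity is likewise an assumed choice of similarity measure.

```python
import math
import random

# Toy GP-network (hypothetical data): node -> neighbours.
# The first letter of a node name is its type: G = gene, P = phenotype.
GP_NETWORK = {
    "G1": ["G2", "P1"],
    "G2": ["G1", "P1", "P2"],
    "G3": ["P2"],
    "P1": ["G1", "G2", "P2"],
    "P2": ["G2", "G3", "P1"],
}

# Assumed type patterns standing in for the "multi-path".
# Each pattern ends on the type it starts with, so it can be repeated.
PATHS = [
    ["G", "P", "G"],
    ["G", "G", "P", "G"],
    ["P", "G", "P"],
    ["P", "P", "G", "P"],
]


def node_type(node):
    return node[0]


def guided_walk(graph, start, path, length, rng):
    """Random walk from `start` whose node types follow `path` cyclically."""
    walk = [start]
    cycle = len(path) - 1  # the last type coincides with the first
    step = 0
    while len(walk) < length:
        step += 1
        wanted = path[step % cycle]
        options = [n for n in graph[walk[-1]] if node_type(n) == wanted]
        if not options:  # dead end: no neighbour of the required type
            break
        walk.append(rng.choice(options))
    return walk


def generate_walks(graph, paths, walks_per_node=5, length=8, seed=0):
    """Start several walks at every node, each guided by a randomly chosen path."""
    rng = random.Random(seed)
    walks = []
    for node in graph:
        usable = [p for p in paths if p[0] == node_type(node)]
        for _ in range(walks_per_node):
            walks.append(guided_walk(graph, node, rng.choice(usable), length, rng))
    return walks


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(vectors, phenotype, candidates):
    """Order candidate genes by cosine similarity to the target phenotype."""
    scores = {g: cosine(vectors[g], vectors[phenotype]) for g in candidates}
    return sorted(candidates, key=lambda g: scores[g], reverse=True)
```

In a fuller implementation the walks would be passed to a skip-gram model to obtain one vector per node, and those vectors would then be supplied to `rank_candidates`. Choosing among several type patterns per walk, rather than fixing a single meta-path, is what lets the walks cover gene-gene, phenotype-phenotype, and gene-phenotype edges alike.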

Updated: 2019-12-23