Ensemble disease gene prediction by clinical sample-based networks.
BMC Bioinformatics (IF 3), Pub Date: 2020-03-11, DOI: 10.1186/s12859-020-3346-8
Ping Luo, Li-Ping Tian, Bolin Chen, Qianghua Xiao, Fang-Xiang Wu

Disease gene prediction is a critical and challenging task. Many computational methods have been developed to predict disease genes, which can reduce the cost and time of experimental validation. Since proteins (the products of genes) usually work together to achieve a specific function, biomolecular networks such as the protein-protein interaction (PPI) network and gene co-expression networks are widely used to predict disease genes by analyzing the relationships between known disease genes and other genes in the networks. However, existing methods commonly use a universal static PPI network, which ignores the fact that PPIs are dynamic and that PPIs should also differ across patients. To address these issues, we develop an ensemble algorithm that predicts disease genes from clinical sample-based networks (EdgCSN).

The algorithm first constructs a single sample-based network for each case sample of the disease under study. These single sample-based networks are then merged into several fused networks according to the results of clustering the samples. Next, logistic models are trained with centrality features extracted from the fused networks, and an ensemble strategy is used to predict the final probability that each gene is disease-associated. Evaluated on breast cancer (BC), thyroid cancer (TC) and Alzheimer’s disease (AD), EdgCSN obtains AUC values of 0.970, 0.971 and 0.966, respectively, which are much higher than those of the competing algorithms. Subsequent de novo validations also demonstrate the ability of EdgCSN to predict new disease genes.

In this study, we propose EdgCSN, an ensemble learning algorithm that predicts disease genes with models trained on centrality features extracted from clinical sample-based networks. Results of leave-one-out cross-validation show that EdgCSN performs much better than the competing algorithms in predicting BC-, TC- and AD-associated genes. De novo validations also show that EdgCSN is valuable for identifying new disease genes.
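The abstract describes EdgCSN only at the level of its stages. The following is therefore a minimal, stdlib-only Python sketch of that kind of pipeline under loudly stated assumptions, not the authors' implementation. Every specific choice below is hypothetical:

- the rule for building a single-sample network (keep a reference PPI edge when both genes are expressed above a threshold in that sample);
- the fusion weighting (fraction of samples in a cluster that contain the edge);
- the two centrality features (degree and weighted degree);
- the gradient-descent logistic model;
- the probability-averaging ensemble.

The cluster labels are taken as given rather than computed, and all function names are illustrative.

```python
import math
from collections import defaultdict

# NOTE: every rule below is a hypothetical stand-in; the abstract does not
# specify EdgCSN's network-construction, clustering, feature or ensemble details.


def single_sample_network(ppi_edges, expression, threshold=1.0):
    """Hypothetical rule: keep a PPI edge if both genes exceed threshold in this sample."""
    return {
        (u, v)
        for u, v in ppi_edges
        if expression.get(u, 0.0) > threshold and expression.get(v, 0.0) > threshold
    }


def fuse_networks(sample_networks, cluster_labels):
    """Merge the single-sample networks of each sample cluster into one weighted network.

    Edge weight = fraction of samples in the cluster whose network contains the edge.
    """
    groups = defaultdict(list)
    for network, label in zip(sample_networks, cluster_labels):
        groups[label].append(network)
    fused = {}
    for label, networks in groups.items():
        counts = defaultdict(int)
        for network in networks:
            for edge in network:
                counts[edge] += 1
        fused[label] = {edge: c / len(networks) for edge, c in counts.items()}
    return fused


def centrality_features(weighted_edges, genes):
    """Two simple centralities per gene: degree and weighted degree (strength)."""
    degree, strength = defaultdict(float), defaultdict(float)
    for (u, v), weight in weighted_edges.items():
        for gene in (u, v):
            degree[gene] += 1.0
            strength[gene] += weight
    return {gene: [degree[gene], strength[gene]] for gene in genes}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights and bias by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def ensemble_predict(models, feature_tables, gene):
    """Assumed ensemble rule: average the probabilities of the per-network models."""
    probs = [
        sigmoid(sum(wj * xj for wj, xj in zip(w, feats[gene])) + b)
        for (w, b), feats in zip(models, feature_tables)
    ]
    return sum(probs) / len(probs)


if __name__ == "__main__":
    genes = ["A", "B", "C", "D", "E"]
    ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
    samples = [
        {"A": 5, "B": 5, "C": 5},
        {"A": 5, "B": 5, "C": 5, "D": 5},
        {"C": 5, "D": 5, "E": 5},
    ]
    clusters = [0, 0, 1]  # assumed output of some sample-clustering step
    nets = [single_sample_network(ppi, s) for s in samples]
    fused = fuse_networks(nets, clusters)
    tables = [centrality_features(fused[k], genes) for k in sorted(fused)]
    known = {"A": 1, "B": 1, "D": 0, "E": 0}  # toy labels; "C" is held out
    models = [train_logistic([t[g] for g in known], list(known.values())) for t in tables]
    print("P(C is disease-associated) =", round(ensemble_predict(models, tables, "C"), 3))
```

Running the file builds three toy single-sample networks and fuses them into two cluster-level networks. It then trains one logistic model per fused network on four labelled genes and prints the averaged probability for the held-out gene C. Averaging is only one plausible ensemble rule. Weighted voting or stacking could replace the body of `ensemble_predict` without changing the earlier stages.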

Updated: 2020-03-16