Identifying digenic disease genes via machine learning in the Undiagnosed Diseases Network
American Journal of Human Genetics (IF 8.1), Pub Date: 2021-09-15, DOI: 10.1016/j.ajhg.2021.08.010
Souhrid Mukherjee, Joy D. Cogan, John H. Newman, John A. Phillips, Rizwan Hamid, Jens Meiler, John A. Capra

Rare diseases affect millions of people worldwide, and discovering their genetic causes is challenging. More than half of the individuals analyzed by the Undiagnosed Diseases Network (UDN) remain undiagnosed. The central hypothesis of this work is that many of these rare genetic disorders are caused by multiple variants in more than one gene. However, given the large number of variants in each individual genome, experimentally evaluating combinations of variants for potential to cause disease is currently infeasible. To address this challenge, we developed the digenic predictor (DiGePred), a random forest classifier for identifying candidate digenic disease gene pairs by features derived from biological networks, genomics, evolutionary history, and functional annotations. We trained the DiGePred classifier by using DIDA, the largest available database of known digenic-disease-causing gene pairs, and several sets of non-digenic gene pairs, including variant pairs derived from unaffected relatives of UDN individuals. DiGePred achieved high precision and recall in cross-validation and on a held-out test set (PR area under the curve > 77%), and we further demonstrate its utility by using digenic pairs from the recent literature. In contrast to other approaches, DiGePred also appropriately controls the number of false positives when applied in realistic clinical settings. Finally, to enable the rapid screening of variant gene pairs for digenic disease potential, we freely provide the predictions of DiGePred on all human gene pairs. Our work enables the discovery of genetic causes for rare non-monogenic diseases by providing a means to rapidly evaluate variant gene pairs for the potential to cause digenic disease.
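The screening step described above, looking up precomputed pair scores for the genes that carry variants in one individual, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code or file format: the function name `screen_gene_pairs`, the gene labels, the scores, and the 0.5 cutoff are all hypothetical placeholders chosen for the example.

```python
from itertools import combinations


def screen_gene_pairs(variant_genes, pair_scores, threshold=0.5):
    """Return (gene_a, gene_b, score) tuples for pairs scoring at or above
    the threshold, sorted from highest to lowest score.

    variant_genes: genes carrying candidate variants in one individual.
    pair_scores:   mapping from frozenset({gene_a, gene_b}) to a precomputed
                   score (hypothetical layout, assumed for this sketch).
    threshold:     illustrative cutoff, not a value taken from the paper.
    """
    hits = []
    # Enumerate every unordered pair of distinct genes exactly once.
    for gene_a, gene_b in combinations(sorted(set(variant_genes)), 2):
        # frozenset keys make the lookup independent of gene order.
        score = pair_scores.get(frozenset((gene_a, gene_b)))
        if score is not None and score >= threshold:
            hits.append((gene_a, gene_b, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Made-up scores for placeholder genes; real scores would come from the
# published predictions.
pair_scores = {
    frozenset(("GENE_A", "GENE_B")): 0.91,
    frozenset(("GENE_A", "GENE_C")): 0.12,
    frozenset(("GENE_B", "GENE_D")): 0.67,
}
variant_genes = ["GENE_D", "GENE_A", "GENE_B", "GENE_C"]

candidates = screen_gene_pairs(variant_genes, pair_scores)
for gene_a, gene_b, score in candidates:
    print(f"{gene_a} / {gene_b}: {score:.2f}")
```

Because n variant genes yield n(n-1)/2 pairs, a table lookup of this kind stays cheap where experimental evaluation of every combination would not. Pairs missing from the table are skipped here rather than assigned a default score.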




Updated: 2021-10-09