Prioritization of disease genes from GWAS using ensemble-based positive-unlabeled learning
European Journal of Human Genetics (IF 3.7), Pub Date: 2021-07-19, DOI: 10.1038/s41431-021-00930-w
Nikita Kolosov 1, 2, 3 , Mark J Daly 3, 4, 5 , Mykyta Artomov 1, 2, 3, 4, 5

A primary challenge in understanding disease biology from genome-wide association studies (GWAS) arises from the inability to directly implicate causal genes from association data. Integration of multi-omics data sources potentially provides important functional links between associated variants and candidate genes. Machine learning is well positioned to take advantage of a variety of such data and provide a solution for the prioritization of disease genes. Yet classical positive-negative classifiers impose strong limitations on the gene prioritization procedure, such as the lack of reliable non-causal genes for training. Here, we developed a novel gene prioritization tool, Gene Prioritizer (GPrior). It is an ensemble of five positive-unlabeled bagging classifiers (Logistic Regression, Support Vector Machine, Random Forest, Decision Tree, Adaptive Boosting) that treats all genes of unknown relevance as an unlabeled set. GPrior selects an optimal composition of algorithms to tune the model for each specific phenotype. Altogether, GPrior fills an important niche of methods for GWAS data post-processing, significantly improving the ability to pinpoint disease genes compared to existing solutions.
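
The positive-unlabeled bagging idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the generic scheme in which each round draws a bootstrap sample of unlabeled genes, treats them as provisional negatives, trains a base learner against the known positives, and scores only the out-of-bag unlabeled genes. Whether GPrior follows exactly this resampling scheme is not stated in the abstract. The function names `fit_logistic` and `pu_bagging_scores`, the two-feature toy data, and all parameter values are hypothetical and are not taken from GPrior.

```python
import math
import random


def _sigmoid(z):
    # Clamp the input so math.exp never overflows.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Tiny logistic regression trained by batch gradient descent.

    Hypothetical stand-in base learner; returns a function that maps a
    feature vector to an estimated probability of the positive class.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return lambda x: _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def pu_bagging_scores(X, is_positive, n_rounds=50, seed=0):
    """Average out-of-bag scores for every unlabeled example."""
    rng = random.Random(seed)
    pos = [i for i, flag in enumerate(is_positive) if flag]
    unl = [i for i, flag in enumerate(is_positive) if not flag]
    sums = {i: 0.0 for i in unl}
    counts = {i: 0 for i in unl}
    for _ in range(n_rounds):
        # Bootstrap unlabeled genes and treat them as provisional negatives.
        bag = [rng.choice(unl) for _ in pos]
        in_bag = set(bag)
        train = pos + bag
        labels = [1] * len(pos) + [0] * len(bag)
        model = fit_logistic([X[i] for i in train], labels)
        # Score only the unlabeled genes that were left out of this bag.
        for i in unl:
            if i not in in_bag:
                sums[i] += model(X[i])
                counts[i] += 1
    return {i: sums[i] / counts[i] for i in unl if counts[i] > 0}


# Toy data (made up): rows 0-3 are known disease genes, rows 4-5 are
# unlabeled genes that resemble them, rows 6-13 are unlabeled background.
features = [
    [2.0, 2.1], [1.8, 2.2], [2.2, 1.9], [2.1, 2.0],
    [1.9, 2.0], [2.0, 1.8],
    [0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2],
    [-0.2, 0.1], [0.0, -0.2], [0.1, 0.0], [0.3, 0.1],
]
is_known_positive = [True] * 4 + [False] * 10

scores = pu_bagging_scores(features, is_known_positive)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:2])  # indices of the two highest-ranked unlabeled genes
```

In a fuller sketch the base learner would be swappable, so the same loop could wrap each of the five classifier types named in the abstract before their scores are combined. That combination step is not shown here because the abstract does not describe it.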




Updated: 2021-07-19