Chances and challenges of machine learning-based disease classification in genetic association studies illustrated on age-related macular degeneration (Genetic Epidemiology)



Genetic Epidemiology (IF 1.7), Pub Date: 2020-08-02, DOI: 10.1002/gepi.22336
Felix Guenther (1, 2), Caroline Brandl (1, 3), Thomas W Winkler (1), Veronika Wanner (1), Klaus Stark (1), Helmut Kuechenhoff (2), Iris M Heid (1)


Imaging technology and machine learning algorithms for disease classification set the stage for high-throughput phenotyping and promising new avenues for genome-wide association studies (GWAS). Despite emerging algorithms, there has been no successful application in GWAS so far. We establish machine learning-based phenotyping in genetic association analysis as a misclassification problem. To evaluate chances and challenges, we performed a GWAS based on automatically classified age-related macular degeneration (AMD) in UK Biobank (images from 135,500 eyes; 68,400 persons). We quantified misclassification of automatically derived AMD in internal validation data (4,001 eyes; 2,013 persons) and developed a maximum likelihood approach (MLA) to account for it when estimating genetic association. We demonstrate in simulation studies that our MLA guards against bias and artifacts. By combining a GWAS on automatically derived AMD and our MLA in UK Biobank data, we were able to separate true associations (ARMS2/HTRA1, CFH) from artifacts (near HERC2) and identified eye color as associated with the misclassification. With this example, we provide a proof-of-concept that a GWAS using machine learning-derived disease classification yields relevant results and that misclassification needs to be considered in the analysis. These findings generalize to other phenotypes and emphasize the utility of genetic data for understanding the misclassification structure of machine learning algorithms.
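To make the idea of a misclassification-aware likelihood concrete, here is a minimal sketch. It is not the authors' MLA: it assumes a single variant, a logistic disease model, and a classifier with known sensitivity and specificity that do not depend on any covariate (the abstract reports that eye color was associated with misclassification, which this simplification ignores). Under those assumptions the observed label has probability q = (1 - spec) + (sens + spec - 1) * p, where p is the true disease probability. All function names, parameter values, and the simulated data are hypothetical.

```python
import numpy as np


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_misclassified_logistic(X, y_obs, sens=1.0, spec=1.0, n_iter=50):
    """Fisher scoring for logistic regression on an error-prone binary outcome.

    Assumption: sensitivity and specificity are known and identical for
    everyone. With sens = spec = 1 this reduces to ordinary logistic regression.
    """
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    slope = sens + spec - 1.0
    for _ in range(n_iter):
        p = expit(X @ beta)                    # P(true disease | x)
        q = np.clip((1.0 - spec) + slope * p, 1e-9, 1.0 - 1e-9)  # P(observed label = 1 | x)
        dq = slope * p * (1.0 - p)             # derivative of q w.r.t. the linear predictor
        score = X.T @ ((y_obs - q) / (q * (1.0 - q)) * dq)
        info = (X * (dq ** 2 / (q * (1.0 - q)))[:, None]).T @ X  # expected information
        beta = beta + np.linalg.solve(info, score)
    return beta


def demo(n=50_000, beta0=-1.0, beta1=0.5, sens=0.8, spec=0.9, seed=1):
    """Simulate a variant, true disease status, and misclassified labels."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)      # genotype coded 0/1/2
    y_true = rng.random(n) < expit(beta0 + beta1 * g)   # true disease status
    u = rng.random(n)
    # Cases are detected with probability sens; controls are flagged with 1 - spec.
    y_obs = np.where(y_true, u < sens, u < 1.0 - spec).astype(float)
    naive = fit_misclassified_logistic(g, y_obs)[1]     # ignores misclassification
    adjusted = fit_misclassified_logistic(g, y_obs, sens, spec)[1]
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = demo()
    print(f"naive log-odds ratio:    {naive:.3f}")
    print(f"adjusted log-odds ratio: {adjusted:.3f}")
```

In this simulated setting the naive estimate is attenuated toward zero, while the adjusted estimate lands near the simulated effect. In practice the error rates would have to be estimated, for example from validation data such as the internal validation set mentioned above, and their uncertainty carried into the analysis.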


Updated: 2020-09-11
