Ordered multinomial regression for genetic association analysis of ordinal phenotypes at Biobank scale.
Genetic Epidemiology (IF 2.1), Pub Date: 2019-12-26, DOI: 10.1002/gepi.22276
Christopher A German 1, Janet S Sinsheimer 1,2,3, Yann C Klimentidis 4, Hua Zhou 1, Jin J Zhou 4

Logistic regression is the primary analysis tool for binary traits in genome‐wide association studies (GWAS). Multinomial regression extends logistic regression to multiple categories. However, many phenotypes more naturally take ordered, discrete values. Examples include (a) subtypes defined from multiple sources of clinical information and (b) derived phenotypes generated by specific phenotyping algorithms for electronic health records (EHR). GWAS of ordinal traits have been problematic. Dichotomizing can lead to a range of arbitrary cutoff values, generating inconsistent, hard-to-interpret results. Using multinomial regression ignores the ordering of trait values and can lose power. Treating ordinal data as quantitative can lead to misleading inference. To address these issues, we analyze ordinal traits with an ordered multinomial model. This approach increases power and leads to more interpretable results. We derive efficient algorithms for computing test statistics, making ordinal trait GWAS computationally practical for Biobank-scale data. Our method is available as the Julia package OrdinalGWAS.jl. Application to a COPDGene study confirms signals previously found using binary case–control status, but with stronger statistical significance. Additionally, we demonstrate that our package can run on UK Biobank data by analyzing hypertension as an ordinal trait.
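To make the ordered multinomial model concrete, here is a minimal Python sketch. It is not the OrdinalGWAS.jl implementation, and all function names are hypothetical. The sketch makes several assumptions. It uses a cumulative-logit (proportional-odds) link, P(Y ≤ j | g) = logistic(θ_j − gγ). It takes a single SNP dosage g as the only predictor, with no covariates. It also requires every category 1..J to be observed. Under these assumptions, it fits the null thresholds θ_j and computes a one-degree-of-freedom score test of γ = 0, with the thresholds treated as nuisance parameters.

```python
import math
from typing import List, Sequence, Tuple


def logistic(t: float) -> float:
    """Inverse logit link F(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def category_probs(thetas: Sequence[float], eta: float) -> List[float]:
    """P(Y = j | eta) for j = 1..J, assuming P(Y <= j | eta) = F(theta_j - eta)."""
    cum = [0.0] + [logistic(t - eta) for t in thetas] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]


def fit_null_thresholds(y: Sequence[int], n_categories: int) -> List[float]:
    """Null-model MLE of the J-1 thresholds (no predictors): the logits of the
    empirical cumulative proportions. Requires every category 1..J to occur."""
    counts = [0] * n_categories
    for yi in y:
        counts[yi - 1] += 1
    if min(counts) == 0:
        raise ValueError("every category must be observed at least once")
    n = len(y)
    thetas, c = [], 0.0
    for j in range(n_categories - 1):
        c += counts[j] / n  # c_j = empirical P(Y <= j)
        thetas.append(math.log(c / (1.0 - c)))
    return thetas


def score_test(y: Sequence[int], g: Sequence[float],
               n_categories: int) -> Tuple[float, float]:
    """Score test of H0: gamma = 0 for one SNP dosage g (hypothetical helper).
    Returns (chi-square statistic, 1-df p-value). g must not be constant."""
    thetas = fit_null_thresholds(y, n_categories)
    # Logistic density f = F(1 - F) at each threshold; f = 0 at theta_0 = -inf
    # and at theta_J = +inf.
    f = [0.0] + [logistic(t) * (1.0 - logistic(t)) for t in thetas] + [0.0]
    probs = category_probs(thetas, 0.0)
    # s[j] = d log P(Y = j+1) / d eta, evaluated under the null (eta = 0).
    s = [(f[j] - f[j + 1]) / probs[j] for j in range(n_categories)]
    n = len(y)
    gbar = sum(g) / n
    # Score for gamma. Centering g leaves it unchanged because the s values
    # sum to zero at the null MLE of the thresholds.
    u = sum((gi - gbar) * s[yi - 1] for gi, yi in zip(g, y))
    # Per-observation information for eta, then the information for gamma
    # after adjusting for the estimated thresholds.
    k = sum(p * sj ** 2 for p, sj in zip(probs, s))
    v = k * sum((gi - gbar) ** 2 for gi in g)
    stat = u * u / v
    return stat, math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
```

On the toy data `score_test([1, 1, 2, 2, 3, 3], [0, 0, 1, 1, 2, 2], 3)`, the statistic is 6 and p ≈ 0.014. The derivation simplifies in this covariate-free setting because the thresholds and the genetic effect enter the model only through θ_j − gγ. As a result, the per-category score reduces to c_{j−1} + c_j − 1, a centered midrank of the cumulative proportions. The threshold adjustment likewise reduces to centering the dosages. With covariates, the null model must be fitted numerically, and the adjustment involves the full information matrix.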

Updated: 2019-12-26