A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank.
American Journal of Human Genetics (IF 9.8), Pub Date: 2020-06-25, DOI: 10.1016/j.ajhg.2020.06.003
Wenjian Bi, Lars G Fritsche, Bhramar Mukherjee, Sehee Kim, Seunggeun Lee

With increasing biobanking efforts connecting electronic health records and national registries to germline genetics, time-to-event data analysis has attracted increasing attention in genetic studies of human diseases. In time-to-event data analysis, the Cox proportional hazards (PH) regression model is one of the most widely used approaches. However, existing methods and tools do not scale to a large biobank with hundreds of thousands of samples and endpoints, and they are not accurate when testing low-frequency and rare variants. Here, we propose a scalable and accurate method, SPACox (a saddlepoint approximation implementation based on the Cox PH regression model), that is applicable to genome-wide time-to-event data analysis. SPACox requires fitting a Cox PH regression model only once across the genome-wide analysis and then uses a saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that SPACox is 76–252 times faster than existing alternatives such as gwasurvivr, 185–511 times faster than the standard Wald test, and more than 6,000 times faster than the Firth correction. They also show that SPACox controls type I error rates at the genome-wide significance level regardless of minor allele frequency. Through the analysis of UK Biobank inpatient data from 282,871 samples of white British European ancestry, we show that SPACox can efficiently analyze large sample sizes and accurately control type I error rates. We identified 611 loci associated with time-to-event phenotypes of 12 common diseases, of which 38 loci would be missed within a logistic regression framework with a binary phenotype defined as event occurrence status during the follow-up period.
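
To make the calibration step more concrete, the following is a minimal sketch of a generic saddlepoint tail approximation (the Lugannani-Rice formula in its Barndorff-Nielsen form) applied to a score-type statistic. It is not the SPACox implementation, and the abstract does not specify how SPACox builds its cumulant generating function (CGF). The sketch assumes, purely for illustration, a statistic S = sum_i r_i * G_i with fixed residuals r_i from a null model fitted once and independent genotypes G_i ~ Binomial(2, maf). All function names are hypothetical.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def solve_saddlepoint(k1, s, lo=-50.0, hi=50.0, tol=1e-12):
    """Solve K'(t) = s by bisection. K' is increasing because a CGF is convex.
    The bracket [-50, 50] is an assumption that suits the small examples here."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if k1(mid) < s:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def spa_upper_tail(k0, k1, k2, s):
    """Approximate P(S > s) from the CGF K (k0) and its first two derivatives."""
    t = solve_saddlepoint(k1, s)
    if abs(t) < 1e-6:
        # Near the mean the formula is numerically unstable; use the normal tail.
        return normal_upper_tail((s - k1(0.0)) / math.sqrt(k2(0.0)))
    w = math.copysign(math.sqrt(max(2.0 * (t * s - k0(t)), 0.0)), t)
    v = t * math.sqrt(k2(t))
    return normal_upper_tail(w + math.log(v / w) / w)

def genotype_score_cgf(residuals, maf):
    """CGF of S = sum_i r_i * G_i with G_i ~ Binomial(2, maf), r_i held fixed.
    This choice of CGF is an illustrative assumption, not taken from the paper."""
    q = 1.0 - maf

    def k0(t):
        return sum(2.0 * math.log(q + maf * math.exp(r * t)) for r in residuals)

    def k1(t):
        return sum(2.0 * maf * r * math.exp(r * t) / (q + maf * math.exp(r * t))
                   for r in residuals)

    def k2(t):
        return sum(2.0 * maf * q * r * r * math.exp(r * t)
                   / (q + maf * math.exp(r * t)) ** 2 for r in residuals)

    return k0, k1, k2

# Toy data: few events (large positive residuals) among many censored subjects,
# which makes the score statistic strongly skewed for a rare variant.
residuals = [0.95] * 20 + [-0.01] * 1900
k0, k1, k2 = genotype_score_cgf(residuals, maf=0.01)
s_obs = 1.5
p_spa = spa_upper_tail(k0, k1, k2, s_obs)
p_normal = normal_upper_tail((s_obs - k1(0.0)) / math.sqrt(k2(0.0)))
print("saddlepoint tail:", p_spa)
print("normal tail:     ", p_normal)
```

In this toy setting the residuals are computed once and reused for every variant, so the per-variant cost reduces to a one-dimensional root search. The saddlepoint tail comes out larger than the normal tail here, which illustrates why a normal approximation can be miscalibrated for rare variants under heavy censoring.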

Updated: 2020-08-06