UK Biobank Whole-Exome Sequence Binary Phenome Analysis with Robust Region-Based Rare-Variant Test.
American Journal of Human Genetics (IF 8.1), Pub Date: 2019-12-19, DOI: 10.1016/j.ajhg.2019.11.012
Zhangchen Zhao 1 , Wenjian Bi 1 , Wei Zhou 2 , Peter VandeHaar 1 , Lars G Fritsche 1 , Seunggeun Lee 1

In biobank data analysis, most binary phenotypes have unbalanced case-control ratios, and this can cause inflation of type I error rates. Recently, a saddle point approximation (SPA) based single-variant test has been developed to provide an accurate and scalable method to test for associations of such phenotypes. For gene- or region-based multiple-variant tests, a few methods exist that can adjust for unbalanced case-control ratios; however, these methods are either less accurate when case-control ratios are extremely unbalanced or not scalable for large data analyses. To address these problems, we propose SKAT- and SKAT-O-type region-based tests; in these tests, the single-variant score statistic is calibrated based on SPA and efficient resampling (ER). Through simulation studies, we show that the proposed method provides well-calibrated p values. In contrast, when the case-control ratio is 1:99, the unadjusted approach has greatly inflated type I error rates (90 times that of exome-wide sequencing α = 2.5 × 10⁻⁶). Additionally, the proposed method has similar computation time to the unadjusted approaches and is scalable for large sample data. In our application, the UK Biobank whole-exome sequence data analysis of 45,596 unrelated European samples and 791 PheCode phenotypes identified 10 rare-variant associations with p value < 10⁻⁷, including the associations between JAK2 and myeloproliferative disease, HOXB13 and cancer of prostate, and F11 and congenital coagulation defects. All analysis summary results are publicly available through a web-based visual server, and this availability can help facilitate the identification of the genetic basis of complex diseases.
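
To illustrate why a normal approximation can be anti-conservative at a 1:99 case-control ratio, the following is a minimal sketch of a saddlepoint tail approximation for a single-variant score statistic. It is not the authors' implementation: it assumes no covariates, a known null case probability `mu`, and a right-tailed test only. The function names (`cgf_terms`, `spa_upper_tail`, `normal_upper_tail`) and the toy numbers are hypothetical, chosen for illustration. The tail formula is the standard Barndorff-Nielsen form of the saddlepoint approximation.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def cgf_terms(t, genotypes, mu):
    """K(t), K'(t), K''(t) for S = sum_i g_i * (Y_i - mu), Y_i ~ Bernoulli(mu).

    Assumption for this sketch: every sample shares the same null case
    probability mu (no covariates).
    """
    k0 = k1 = k2 = 0.0
    for g in genotypes:
        if g == 0:
            continue  # non-carriers contribute nothing to S
        e = mu * math.exp(g * t)
        denom = 1.0 - mu + e
        p = e / denom  # tilted case probability for this sample
        k0 += math.log(denom) - t * g * mu
        k1 += g * p - g * mu
        k2 += g * g * p * (1.0 - p)
    return k0, k1, k2


def spa_upper_tail(s, genotypes, mu):
    """Saddlepoint approximation to P(S > s) for 0 < s < max(S)."""
    s_max = sum(genotypes) * (1.0 - mu)
    if not 0.0 < s < s_max:
        raise ValueError("this sketch handles only 0 < s < max(S)")
    # K'(t) is increasing in t, so bracket the root of K'(t) = s and bisect.
    lo, hi = 0.0, 1.0
    while cgf_terms(hi, genotypes, mu)[1] < s:
        hi *= 2.0
        if hi > 256.0:
            raise ValueError("saddlepoint not bracketed; s too close to max(S)")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cgf_terms(mid, genotypes, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    k0, _, k2 = cgf_terms(t, genotypes, mu)
    w = math.sqrt(2.0 * (t * s - k0))
    v = t * math.sqrt(k2)
    return normal_upper_tail(w + math.log(v / w) / w)


if __name__ == "__main__":
    # Toy data (hypothetical): 10,000 samples, 100 carriers of a rare allele,
    # null case probability 0.01, and 6 of the carriers observed as cases.
    genotypes = [1] * 100 + [0] * 9900
    mu = 0.01
    s = 6 - 100 * mu  # observed score statistic
    variance = cgf_terms(0.0, genotypes, mu)[2]  # K''(0) = Var(S)
    print("normal approximation p:", normal_upper_tail(s / math.sqrt(variance)))
    print("saddlepoint p:", spa_upper_tail(s, genotypes, mu))
```

In this toy setting the normal approximation reports a much smaller p value than the saddlepoint approximation, because the null distribution of the score is strongly right-skewed when cases are rare. That is the kind of miscalibration the abstract attributes to unadjusted tests. The region-based SKAT and SKAT-O calibration and the efficient resampling step described in the abstract are not reproduced here.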

Updated: 2019-12-19