Bayesian copy number detection and association in large-scale studies.
BMC Cancer ( IF 3.8 ) Pub Date : 2020-09-07 , DOI: 10.1186/s12885-020-07304-3
Stephen Cristiano 1 , David McKean 2 , Jacob Carey 1 , Paige Bracci 3 , Paul Brennan 4 , Michael Chou 5 , Mengmeng Du 6 , Steven Gallinger 7 , Michael G Goggins 8, 9 , Manal M Hassan 10 , Rayjean J Hung 7 , Robert C Kurtz 11 , Donghui Li 12 , Lingeng Lu 13 , Rachel Neale 14 , Sara Olson 6 , Gloria Petersen 15 , Kari G Rabe 15 , Jack Fu 1 , Harvey Risch 13 , Gary L Rosner 1, 10 , Ingo Ruczinski 1 , Alison P Klein 2, 5, 9 , Robert B Scharpf 1, 2

Germline copy number variants (CNVs) increase risk for many diseases, yet detecting CNVs and quantifying their contribution to disease risk in large-scale studies is challenging because biological and technical sources of heterogeneity vary across the genome, both within and between samples. We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty into the association model for disease. Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found that CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not to the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated into a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases.
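
The idea of batch-specific probabilistic copy number estimates can be illustrated with a small sketch. This is not the CNPBayes implementation: it assumes a simple Gaussian mixture in which each integer copy number state has a batch-specific mean log ratio, and it treats the means, standard deviation, and mixture weights as known, whereas the paper estimates such quantities hierarchically within a Bayesian model. The function names `copy_number_posterior` and `expected_copy_number`, and all numeric values, are hypothetical.

```python
import numpy as np


def copy_number_posterior(log_ratio, batch, means, sd, weights):
    """Posterior probability of each copy number state for each sample.

    Assumed (hypothetical) inputs:
      log_ratio : (n,) summarized log ratios at one CNV region
      batch     : (n,) integer batch label for each sample
      means     : (n_batches, K) batch-specific mean log ratio per state
      sd        : scalar standard deviation shared by all components
      weights   : (K,) mixture weights (prior state frequencies)
    """
    log_ratio = np.asarray(log_ratio, dtype=float)
    batch = np.asarray(batch, dtype=int)
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)

    mu = means[batch]  # (n, K): each sample uses the means of its own batch
    # Gaussian log-likelihood up to an additive constant that cancels below.
    log_lik = -0.5 * ((log_ratio[:, None] - mu) / sd) ** 2
    log_post = np.log(weights)[None, :] + log_lik
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def expected_copy_number(posterior, states):
    """Posterior mean copy number, one value per sample."""
    return posterior @ np.asarray(states, dtype=float)


# The same log ratio observed in two batches whose means are shifted.
means = [[-0.5, 0.0],   # batch 0: hemizygous deletion, diploid
         [-0.4, 0.1]]   # batch 1: same states, shifted by a batch effect
post = copy_number_posterior([-0.2, -0.2], [0, 1], means, sd=0.1,
                             weights=[0.5, 0.5])
print(post)
print(expected_copy_number(post, states=[1, 2]))
```

In this toy setting the identical log ratio favors the diploid state in batch 0 and the deletion state in batch 1, which is why ignoring batch can mislead copy number calls. Carrying the full posterior (or its mean) into a downstream risk model, rather than a hard call, is one simple way to reflect copy number uncertainty; the paper's Bayesian regression integrates that uncertainty more fully.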

Updated: 2020-09-08