Integrating genome-wide polygenic risk scores and non-genetic risk to predict colorectal cancer diagnosis using UK Biobank data: population based cohort study
The BMJ (IF 93.6), Pub Date: 2022-11-09, DOI: 10.1136/bmj-2022-071707
Sarah E W Briggs 1 , Philip Law 2 , James E East 3, 4 , Sarah Wordsworth 4, 5 , Malcolm Dunlop 6 , Richard Houlston 2 , Julia Hippisley-Cox 7 , Ian Tomlinson 8

Objective To evaluate the benefit of combining polygenic risk scores with the QCancer-10 (colorectal cancer) prediction model for non-genetic risk to identify people at highest risk of colorectal cancer.

Design Population based cohort study.

Setting Data from the UK Biobank study, collected between March 2006 and July 2010.

Participants 434 587 individuals with complete data for genetics and QCancer-10 predictions were included in the QCancer-10 plus polygenic risk score modelling and validation cohorts.

Main outcome measures Prediction of colorectal cancer diagnosis by genetic, non-genetic, and combined risk models. Using data from UK Biobank, six different polygenic risk scores for colorectal cancer were developed using LDpred2 polygenic risk score software, clumping, and thresholding approaches, and a model based on genome-wide significant polymorphisms. The top performing genome-wide polygenic risk score and the score containing genome-wide significant polymorphisms were combined with QCancer-10 and performance was compared with QCancer-10 alone. Case-control (logistic regression) and time-to-event (Cox proportional hazards) analyses were used to evaluate risk model performance in men and women.

Results Polygenic risk scores derived using the LDpred2 program performed best, with an odds ratio per standard deviation of 1.584 (95% confidence interval 1.536 to 1.633), and top age and sex adjusted C statistic of 0.733 (95% confidence interval 0.710 to 0.753) in logistic regression models in the validation cohort. Integrated QCancer-10 plus polygenic risk score models out-performed QCancer-10 alone. In men, the integrated LDpred2 model produced a C statistic of 0.730 (0.720 to 0.741) and explained variation of 28.2% (26.3 to 30.1), compared with 0.693 (0.682 to 0.704) and 21.0% (18.9 to 23.1) for QCancer-10 alone. In women, the C statistic for the integrated LDpred2 model was 0.687 (0.673 to 0.702) and explained variation was 21.0% (18.7 to 23.7), compared with 0.645 (0.631 to 0.659) and 12.4% (10.3 to 14.6) for QCancer-10 alone. In the top 20% of individuals at highest absolute risk, the sensitivity and specificity of the integrated LDpred2 models for predicting colorectal cancer diagnosis were 47.8% and 80.3% respectively in men, and 42.7% and 80.1% respectively in women, with increases in absolute risk in the top 5% of risk in men of 3.47-fold and in women of 2.77-fold compared with the median. Illustrative decision curve analysis indicated a small incremental improvement in net benefit with QCancer-10 plus polygenic risk score models compared with QCancer-10 alone.

Conclusions Integrating polygenic risk scores with QCancer-10 modestly improves risk prediction over use of QCancer-10 alone. Given that QCancer-10 data can be obtained relatively easily from health records, use of polygenic risk score in risk stratified population screening for colorectal cancer currently has no clear justification. The added benefit, cost effectiveness, and acceptability of polygenic risk scores should be carefully evaluated in a real life screening setting before implementation in the general population.

UK Biobank data can be obtained through . Genotype data are available in the European Genome-phenome Archive under accession numbers EGAS00001005412, EGAS00001005421, or from the Edinburgh University DataShare Repository (). Finnish cohort samples can be requested from the THL Biobank . PRS single nucleotide polymorphism inclusion lists and model specifications will be deposited in the PGS catalogue repository (). Risk scores for UK Biobank study participants will be returned to UK Biobank for use by approved researchers.
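
For readers unfamiliar with the performance measures quoted in the abstract, the following is a minimal Python sketch of three of them for a binary outcome: the C statistic (concordance), sensitivity and specificity when the top fraction of individuals by predicted risk is flagged, and the net benefit used in decision curve analysis. It is not the authors' code. The function names and the toy data are hypothetical and chosen for illustration only, and the sketch uses the simple case-control form of the C statistic rather than the time-to-event version that a Cox model analysis would need.

```python
from typing import Sequence, Tuple


def c_statistic(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random case has a higher predicted risk than a
    random non-case; tied pairs count as half."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def top_fraction_sens_spec(
    risks: Sequence[float], outcomes: Sequence[int], fraction: float = 0.20
) -> Tuple[float, float]:
    """Flag the top `fraction` of individuals by predicted risk as high risk
    and return (sensitivity, specificity) of that rule."""
    n_flagged = max(1, round(fraction * len(risks)))
    order = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    flagged = set(order[:n_flagged])
    true_pos = sum(1 for i in flagged if outcomes[i] == 1)
    false_pos = len(flagged) - true_pos
    n_cases = sum(outcomes)
    n_controls = len(outcomes) - n_cases
    return true_pos / n_cases, (n_controls - false_pos) / n_controls


def net_benefit(
    risks: Sequence[float], outcomes: Sequence[int], threshold: float
) -> float:
    """Decision-curve net benefit at a risk threshold pt:
    TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(risks)
    true_pos = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    false_pos = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


if __name__ == "__main__":
    # Invented toy data, not taken from UK Biobank or the study.
    toy_risks = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    toy_outcomes = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
    print("C statistic:", c_statistic(toy_risks, toy_outcomes))
    print("Top 20% (sens, spec):", top_fraction_sens_spec(toy_risks, toy_outcomes))
    print("Net benefit at 0.25:", net_benefit(toy_risks, toy_outcomes, 0.25))
```

Comparing two models in this framing means computing the same quantities from each model's predicted risks on the same individuals. A decision curve plots `net_benefit` across a range of thresholds for each model, alongside the "screen everyone" and "screen no one" strategies.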

Updated: 2022-11-09