Improving lung cancer risk stratification leveraging whole transcriptome RNA sequencing and machine learning across multiple cohorts
BMC Medical Genomics (IF 2.1), Pub Date: 2020-10-22, DOI: 10.1186/s12920-020-00782-1
Yoonha Choi 1 , Jianghan Qu 1 , Shuyang Wu 1 , Yangyang Hao 1 , Jiarui Zhang 2 , Jianchang Ning 1 , Xinwu Yang 1 , Lori Lofaro 1 , Daniel G Pankratz 1 , Joshua Babiarz 1 , P Sean Walsh 1 , Ehab Billatos 2 , Marc E Lenburg 2 , Giulia C Kennedy 1 , Jon McAuliffe 3 , Jing Huang 1

Bronchoscopy for suspected lung cancer has low diagnostic sensitivity, so many results are inconclusive. The Bronchial Genomic Classifier (BGC) was developed to help with patient management by identifying those at low risk of lung cancer when bronchoscopy is inconclusive. The BGC was trained and validated on patients in the Airway Epithelial Gene Expression in the Diagnosis of Lung Cancer (AEGIS) trials. A modern patient cohort, the BGC Registry, differed from the AEGIS cohorts in key clinical factors, with less smoking history, smaller nodules and older age. Additionally, we discovered interfering factors (inhaled medication and sample collection timing) that affected gene expression and potentially masked genomic cancer signals. In this study, we leveraged multiple cohorts and next-generation sequencing technology to develop a robust Genomic Sequencing Classifier (GSC). To address the shift in demographic composition and the interfering factors, we combined three algorithmic strategies: 1) an ensemble of clinical-dominant and genomic-dominant models; 2) hierarchical regression models in which the main effects of clinical variables were regressed out before the genomic effects were fitted; and 3) targeted placement of genomic-by-clinical interaction terms to stabilize the effect of interfering factors. The final GSC model uses 1232 genes and four clinical covariates: age, pack-years, inhaled medication use, and specimen collection timing. In the validation set (N = 412), the GSC down-classified low and intermediate pre-test risk subjects to very low and low post-test risk with a specificity of 45% (95% CI 37–53%) and a sensitivity of 91% (95% CI 81–97%), resulting in a negative predictive value of 95% (95% CI 89–98%). Twelve percent of intermediate pre-test risk subjects were up-classified to high post-test risk with a positive predictive value of 65% (95% CI 44–82%), and 27% of high pre-test risk subjects were up-classified to very high post-test risk with a positive predictive value of 91% (95% CI 78–97%). The GSC overcame the impact of interfering factors and achieved consistent performance across multiple cohorts. It demonstrated diagnostic accuracy in both down- and up-classification of cancer risk, providing physicians with actionable information for many patients with inconclusive bronchoscopy.
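
The reported negative predictive value depends on sensitivity and specificity, and also on how common cancer is in the group being down-classified. The following is a minimal Python sketch of that relationship using Bayes' rule. The function names `npv_from_rates` and `implied_prevalence` are hypothetical and not from the paper, and the abstract does not state the cancer prevalence in the low and intermediate pre-test risk group. The prevalence computed here is only a back-of-envelope value implied by the rounded figures above, not a reported result.

```python
def npv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from sensitivity, specificity and prevalence (Bayes' rule)."""
    true_neg = specificity * (1.0 - prevalence)    # P(test negative and no cancer)
    false_neg = (1.0 - sensitivity) * prevalence   # P(test negative and cancer)
    return true_neg / (true_neg + false_neg)


def implied_prevalence(sensitivity: float, specificity: float, npv: float) -> float:
    """Prevalence that makes the three rates mutually consistent.

    Obtained by solving the NPV formula above for prevalence.
    """
    a = specificity * (1.0 - npv)
    b = npv * (1.0 - sensitivity)
    return a / (a + b)


if __name__ == "__main__":
    # Rounded point estimates quoted in the abstract for down-classification.
    sens, spec, npv = 0.91, 0.45, 0.95

    # ASSUMPTION: prevalence is inferred here, not reported in the abstract.
    prev = implied_prevalence(sens, spec, npv)
    print(f"implied prevalence: {prev:.3f}")  # about 0.21 under these rounded inputs
    print(f"NPV recomputed:     {npv_from_rates(sens, spec, prev):.3f}")
```

Because the inputs are rounded and each carries a wide confidence interval, the implied prevalence should be read as a rough consistency check and not as a cohort statistic.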

Updated: 2020-10-26