Comparison of pathway and gene-level models for cancer prognosis prediction.
BMC Bioinformatics ( IF 3 ) Pub Date : 2020-02-28 , DOI: 10.1186/s12859-020-3423-z
Xingyu Zheng 1 , Christopher I Amos 1, 2 , H Robert Frost 1

BACKGROUND Cancer prognosis prediction is valuable for patients and clinicians because it allows them to appropriately manage care. A promising direction for improving the performance and interpretation of expression-based predictive models involves the aggregation of gene-level data into biological pathways. While many studies have used pathway-level predictors for cancer survival analysis, a comprehensive comparison of pathway-level and gene-level prognostic models has not been performed. To address this gap, we characterized the performance of penalized Cox proportional hazard models built using either pathway- or gene-level predictors for the cancers profiled in The Cancer Genome Atlas (TCGA) and pathways from the Molecular Signatures Database (MSigDB).
RESULTS When analyzing TCGA data, we found that pathway-level models are more parsimonious, more robust, more computationally efficient and easier to interpret than gene-level models with similar predictive performance. For example, both pathway-level and gene-level models have an average Cox concordance index of ~0.85 for the TCGA glioma cohort; however, the gene-level model has twice as many predictors on average, its predictor composition is less stable across cross-validation folds, and its estimation takes 40 times as long as that of the pathway-level model. When the complex correlation structure of the data is broken by permutation, the pathway-level model has greater predictive performance while still retaining superior interpretative power, robustness, parsimony and computational efficiency relative to the gene-level models. For example, for the TCGA glioma cohort with survival times simulated from uncorrelated gene expression data, the average concordance index of the pathway-level model increases to 0.88 while that of the gene-level model falls to 0.56.
CONCLUSION The results of this study show that when the correlations among gene expression values are low, pathway-level analyses can yield better predictive performance, greater interpretative power, more robust models and lower computational cost than a gene-level model. When correlations among genes are high, a pathway-level analysis provides predictive power equivalent to that of a gene-level analysis while retaining the advantages of interpretability, robustness and computational efficiency.
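
The two ideas the abstract relies on, collapsing gene-level expression into pathway-level predictors and scoring a survival model by its concordance index, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say how genes are aggregated into pathways (a plain per-sample mean over member genes is used here), the function names `pathway_scores` and `concordance_index` are hypothetical, and the concordance index shown is the common Harrell-style pairwise definition rather than necessarily the exact estimator used in the paper.

```python
from typing import Dict, List, Sequence


def pathway_scores(
    expr: Dict[str, Sequence[float]],
    pathways: Dict[str, Sequence[str]],
) -> Dict[str, List[float]]:
    """Aggregate gene-level expression into one score per pathway per sample.

    ASSUMPTION: the score is the mean expression of the pathway's member
    genes that are present in `expr`; the paper may use a different method.
    """
    scores: Dict[str, List[float]] = {}
    for name, genes in pathways.items():
        members = [expr[g] for g in genes if g in expr]
        if not members:
            continue  # no measured genes for this pathway, so skip it
        n_samples = len(members[0])
        scores[name] = [
            sum(gene[s] for gene in members) / len(members)
            for s in range(n_samples)
        ]
    return scores


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk: Sequence[float],
) -> float:
    """Harrell-style concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when the subject who failed earlier has the higher predicted risk;
    tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Under this definition, a value of 0.5 corresponds to a risk score with no ordering information and 1.0 to a perfect ranking, which gives a reference scale for the ~0.85, 0.88 and 0.56 figures quoted in the abstract. In a pathway-level model, the output of `pathway_scores` would replace the individual genes as predictors in the penalized Cox fit.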

Updated: 2020-02-28