Leveraging TCGA gene expression data to build predictive models for cancer drug response
BMC Bioinformatics (IF 2.9), Pub Date: 2020-09-30, DOI: 10.1186/s12859-020-03690-4
Evan A Clayton 1, Toyya A Pujol 2, John F McDonald 1, Peng Qiu 3

Machine learning has been used to predict cancer drug response from multi-omics data derived from the sensitivities of cancer cell lines to different therapeutic compounds. Here, we build machine learning models using gene expression data from patients’ primary tumor tissues to predict whether a patient will respond positively or negatively to two chemotherapeutics: 5-Fluorouracil and Gemcitabine. We focused on these two drugs because, under our exclusion criteria, they have the largest numbers of patients in TCGA. Normalized gene expression data were clustered and used as the input features for the study. We used matching clinical trial data to ascertain the response of these patients, and compared multiple clustering and classification methods for accuracy in predicting drug response. Clara and random forest were found to be the best clustering and classification methods, respectively. Our models predict response with up to 86% accuracy, despite the study’s limited sample size. We also found that the genes most informative for predicting drug response were enriched in well-known cancer signaling pathways, highlighting their potential significance in chemotherapy prognosis. Primary tumor gene expression is a good predictor of cancer drug response. Investment in larger datasets containing both patient gene expression and drug response is needed to support future work on machine learning models. Ultimately, such predictive models may help oncologists make critical treatment decisions.
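
The workflow summarized in the abstract (cluster normalized expression data, use the clusters as input features, then classify drug response) can be illustrated with a short Python sketch. This is not the authors' code. It runs on synthetic data, scikit-learn's KMeans stands in for Clara, and the way clusters are turned into features (averaging the member genes of each cluster) is an assumption. The function name `cluster_features`, the cluster count, and the planted signal are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def cluster_features(expr, n_clusters, seed=0):
    """Group genes into clusters and return one averaged feature per cluster.

    expr: array of shape (n_patients, n_genes), assumed already normalized.
    NOTE: KMeans is a stand-in here; the abstract names Clara instead.
    """
    # Each gene (a column of expr) is treated as one sample to be clustered.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(expr.T)
    # Assumption: summarize each cluster by the mean of its member genes.
    feats = np.column_stack(
        [expr[:, labels == k].mean(axis=1) for k in np.unique(labels)]
    )
    return feats, labels


# Synthetic stand-in for TCGA data: 60 patients, 200 genes, binary response.
rng = np.random.default_rng(0)
n_patients, n_genes = 60, 200
response = np.repeat([0, 1], n_patients // 2)
expr = rng.normal(size=(n_patients, n_genes))
expr[:, :20] += 1.5 * response[:, None]  # planted signal in the first 20 genes

features, gene_labels = cluster_features(expr, n_clusters=10)

# Random forest on the cluster-level features, scored by 5-fold cross-validation.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, features, response, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```

The accuracy printed here reflects only the planted synthetic signal and says nothing about the 86% figure reported in the abstract. With a small cohort, cross-validation is one reasonable way to estimate accuracy, though the abstract does not state which evaluation scheme was used.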

Updated: 2020-09-30