Development of an absolute assignment predictor for triple-negative breast cancer subtyping using machine learning approaches
bioRxiv - Bioinformatics. Pub Date: 2020-06-03, DOI: 10.1101/2020.06.02.129544
Fadoua Ben Azzouz, Bertrand Michel, Hamza Lasla, Wilfried Gouraud, Anne-Flore François, Fabien Girka, Théo Lecointre, Catherine Guérin-Charbonnel, Philippe P. Juin, Mario Campone, Pascal Jézéquel

Triple-negative breast cancer (TNBC) heterogeneity is one of the main impediments to precision medicine for this disease. Recent concordant transcriptomic studies have shown that TNBC can be split into at least three subtypes with potential therapeutic implications. Although a few studies have predicted TNBC subtype from transcriptomic data, the subtyping was only partially sensitive and was limited by batch effects and by dependence on a given dataset, which may hinder the transition to routine diagnostic testing. We therefore sought to build an absolute predictor (i.e. intra-patient diagnosis) based on a machine learning algorithm and a limited number of probes. To this end, we first introduced binary comparisons of probes for each patient, termed indicators, and based the predictive analysis on this transformed data. Probe selection was first performed by combining filter and wrapper methods for variable selection, using cross-validation. We then tested three prediction models (random forest, gradient boosting [GB] and extreme gradient boosting) using this optimal subset of indicators as inputs. Nested cross-validation allowed us to consistently choose the best model. The 50 selected indicators highlighted biological characteristics associated with each TNBC subtype, and the GB model based on this subset of indicators performed better than the other models.
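
The abstract does not define the indicators precisely. As a minimal sketch, assume an indicator is 1 when one probe's intensity exceeds another's within the same patient, and 0 otherwise. The function name, probe names and values below are hypothetical and for illustration only; this is not the authors' implementation.

```python
from itertools import combinations


def pairwise_indicators(expression):
    """Turn one patient's probe intensities into binary indicators.

    expression: dict mapping probe name -> intensity for a single patient.
    Returns a dict mapping (probe_a, probe_b) -> 1 if probe_a > probe_b, else 0.
    ASSUMPTION: this "a > b" definition is a guess at what the abstract
    calls "probe binary comparison"; the paper may define it differently.
    """
    names = sorted(expression)  # fixed order so indicator keys are stable
    return {
        (a, b): int(expression[a] > expression[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical probe names and intensities for one patient.
patient = {"probe_A": 7.2, "probe_B": 5.1, "probe_C": 9.4}
indicators = pairwise_indicators(patient)

# The same sample after a strictly increasing distortion (scale and shift),
# used here as a crude stand-in for a batch effect.
shifted = {name: 1.5 * value + 2.0 for name, value in patient.items()}

# The indicators depend only on the within-patient ordering of probes,
# so they are unchanged by the distortion.
assert pairwise_indicators(shifted) == indicators
```

Under this assumed definition, the features for a patient are computed from that patient's values alone, which is one plausible reading of "absolute" (intra-patient) prediction: no reference cohort is needed to normalize a new sample before the indicators are formed.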

Updated: 2020-06-03