Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods
Frontiers in Genetics (IF 2.8), Pub Date: 2020-12-30, DOI: 10.3389/fgene.2020.632901
Zongzhen He, Junying Zhang, Xiguo Yuan, Yuanyuan Zhang

Breast cancer is the most common malignancy in women and has a high mortality rate, so there is an urgent need for computational methods that improve the accuracy of breast cancer survival prediction models. Although multi-omics data such as gene expression have been used extensively in recent studies, accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and their effect on breast cancer prognosis remains to be fully explored. Meanwhile, these omics datasets are high-dimensional and redundant. We therefore adopted multiple kernel learning (MKL) to efficiently integrate somatic mutations with the molecular data currently in use, namely gene expression, copy number variation (CNV), methylation, and protein expression data, to predict breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was applied to each data type to select features that are highly relevant to survival and have low redundancy among themselves. The experimental results showed that the proposed method achieved the best performance and that prediction performance improved markedly when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival prediction. Moreover, mRMR outperformed the other feature selection methods used in previous studies, and MKL outperformed other traditional classifiers for multi-omics data integration. Our analysis indicates that by using promising omics data such as somatic mutations, together with appropriate feature selection methods and effective integration frameworks, the accuracy of breast cancer survival prediction can be further increased, thereby enabling better clinical diagnosis and more effective treatment for breast cancer patients.
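The abstract names two steps, per-omics mRMR feature selection followed by kernel-based integration, but does not give the exact mRMR variant, kernels, or MKL solver. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' implementation: mRMR uses the difference (MID) criterion with counting-based mutual information on tertile-discretized features; each omics block gets an RBF kernel; kernel weights come from kernel-target alignment, a simple heuristic standing in for a learned MKL optimizer; and a kernel ridge classifier on ±1 survival labels replaces the SVM commonly paired with MKL. All function names (`mrmr`, `fit_predict`, etc.), parameters (`k`, `gamma`, `lam`), and the synthetic data are hypothetical.

```python
# Hypothetical sketch: mRMR (MID form) per omics block + alignment-weighted
# sum of RBF kernels + kernel ridge classifier. Not the paper's exact method.
import numpy as np


def discretize(x, n_bins=3):
    """Integer-code one feature for counting-based mutual information."""
    values = np.unique(x)
    if values.size <= n_bins:  # already categorical, e.g. 0/1 mutation calls
        return np.searchsorted(values, x)
    edges = np.quantile(x, [i / n_bins for i in range(1, n_bins)])
    return np.digitize(x, edges)  # quantile bins 0 .. n_bins-1


def mutual_info(a, b):
    """Plug-in mutual information (nats) between two integer-coded vectors."""
    mi = 0.0
    for va in np.unique(a):
        pa = np.mean(a == va)
        for vb in np.unique(b):
            pab = np.mean((a == va) & (b == vb))
            if pab > 0:
                mi += pab * np.log(pab / (pa * np.mean(b == vb)))
    return mi


def mrmr(X, y, k):
    """Greedy mRMR, difference form: argmax I(f; y) - mean_{s in S} I(f; s)."""
    Xd = np.column_stack([discretize(X[:, j]) for j in range(X.shape[1])])
    relevance = np.array([mutual_info(Xd[:, j], y) for j in range(Xd.shape[1])])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    while len(selected) < min(k, Xd.shape[1]):
        candidates = [j for j in range(Xd.shape[1]) if j not in selected]
        scores = [relevance[j] - np.mean([mutual_info(Xd[:, j], Xd[:, s]) for s in selected])
                  for j in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected


def rbf_kernel(A, B, gamma):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def alignment(K, y):
    """Alignment of the centered kernel with y y^T, used as a heuristic weight."""
    n = len(y)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    Y = np.outer(y, y)
    return np.sum(Kc * Y) / (np.linalg.norm(Kc) * np.linalg.norm(Y))


def fit_predict(train_blocks, test_blocks, y, k=3, gamma=0.5, lam=1.0):
    """One (samples x features) block per omics type; y holds labels in {-1, +1}."""
    y01 = (y > 0).astype(int)
    K_train, K_test, weights = [], [], []
    for Xtr, Xte in zip(train_blocks, test_blocks):
        cols = mrmr(Xtr, y01, k)  # feature selection within this omics type
        mu, sd = Xtr[:, cols].mean(0), Xtr[:, cols].std(0)
        sd[sd == 0] = 1.0
        Ztr, Zte = (Xtr[:, cols] - mu) / sd, (Xte[:, cols] - mu) / sd
        K_train.append(rbf_kernel(Ztr, Ztr, gamma))
        K_test.append(rbf_kernel(Zte, Ztr, gamma))
        weights.append(max(alignment(K_train[-1], y), 0.0))
    w = np.array(weights)
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    K = sum(wi * Ki for wi, Ki in zip(w, K_train))  # weighted sum of base kernels
    Kt = sum(wi * Ki for wi, Ki in zip(w, K_test))
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y.astype(float))  # kernel ridge
    return np.sign(Kt @ alpha), w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 120
    y = rng.choice([-1, 1], size=n)  # synthetic survival labels, not real data
    expr = rng.normal(size=(n, 20))
    expr[:, 0] = y + 0.3 * rng.normal(size=n)  # one informative expression feature
    mut = (rng.random((n, 15)) < 0.2).astype(float)
    mut[:, 3] = ((y > 0) ^ (rng.random(n) < 0.1)).astype(float)  # informative mutation
    pred, w = fit_predict([expr[:80], mut[:80]], [expr[80:], mut[80:]], y[:80])
    print("kernel weights:", w, "test accuracy:", np.mean(pred == y[80:]))
```

Selection runs separately inside each omics block before its kernel is built, mirroring the order described above. A true MKL solver would learn the kernel weights jointly with the classifier instead of fixing them from alignment scores.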

Updated: 2021-01-18