Robust edge-based biomarker discovery improves prediction of breast cancer metastasis
BMC Bioinformatics (IF 2.9), Pub Date: 2020-09-30, DOI: 10.1186/s12859-020-03692-2
Nahim Adnan, Chengwei Lei, Jianhua Ruan

The abundance of molecular profiling data from breast cancer tissues has prompted active research on molecular marker-based early diagnosis of metastasis. Recently, there has been surging interest in combining gene expression with gene networks, such as protein-protein interaction (PPI) networks, gene co-expression (CE) networks, and pathway information, to identify robust and accurate biomarkers for metastasis prediction, reflecting the common belief that cancer is a systems biology disease. However, controversy exists in the literature over whether network markers are indeed better features than genes alone for predicting, as well as understanding, metastasis. We believe many of the existing results may have been biased by overly complicated prediction algorithms, unfair evaluation, and a lack of rigorous statistics. In this study, we propose a simple approach that uses network edges as features, based on each of two types of networks, and compare their predictive power using three classification algorithms and a rigorous statistical procedure on one of the largest datasets available. To detect biomarkers that are significant for the prediction and to compare the robustness of different feature types, we propose a novel, unbiased procedure for measuring feature importance that eliminates potential bias from factors such as differing sample size, number of features, and class distribution. Experimental results reveal that the edge-based feature types consistently outperformed the gene-based feature type in random forest and logistic regression models under all performance evaluation metrics, while the prediction accuracy of the edge-based support vector machine (SVM) model was poorer, owing to the larger number of edge features compared with gene features and the lack of feature selection in the SVM model.
Experimental results also show that edge features are much more robust than gene features, and that the top biomarkers from the edge feature types are more significantly enriched in biological processes well known to be related to breast cancer metastasis. Overall, this study validates the utility of edge features as biomarkers, and also highlights the importance of carefully designed experimental procedures for achieving statistically reliable comparison results.
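
To make the idea of "network edges as features" concrete, here is a minimal sketch in Python. The abstract does not say how an edge is scored, so the scoring rule used here (the product of the z-scored expression of the two endpoint genes) is purely an assumption for illustration, as are the function name `edge_features` and the toy data; it should not be read as the authors' method.

```python
import numpy as np

def edge_features(expr, edges):
    """Build one feature per network edge from a samples x genes matrix.

    ASSUMPTION (not stated in the abstract): edge (i, j) is scored as the
    product of the z-scored expression of its two endpoint genes.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=0)
    std = expr.std(axis=0)
    std[std == 0] = 1.0  # constant genes: avoid division by zero
    z = (expr - mean) / std
    left = [i for i, _ in edges]   # first endpoint of every edge
    right = [j for _, j in edges]  # second endpoint of every edge
    return z[:, left] * z[:, right]

# Hypothetical toy data: 6 samples, 4 genes, 3 edges (e.g. from a PPI or CE network).
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 4))
edges = [(0, 1), (1, 2), (2, 3)]
X_edge = edge_features(expr, edges)
print(X_edge.shape)  # (6, 3): one column per edge
```

The resulting matrix has one column per edge rather than per gene, so it can be passed to a classifier in place of the gene expression matrix. Because a network typically has many more edges than genes, the feature count grows, which is consistent with the abstract's remark about the SVM model lacking feature selection.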

Updated: 2020-09-30