Comparative evaluation of network features for the prediction of breast cancer metastasis
BMC Medical Genomics (IF 2.7), Pub Date: 2020-04-03, DOI: 10.1186/s12920-020-0676-3
Nahim Adnan, Zhijie Liu, Tim H.M. Huang, Jianhua Ruan

Discovering a highly accurate and robust gene signature that predicts breast cancer metastasis from the gene expression profiles of primary tumors is one of the most challenging tasks in reducing the number of deaths among women. Because gene-based features have had only limited success in achieving satisfactory prediction accuracy, many methods proposed in recent years derive network-based features by integrating network information with gene expression. However, evaluations of these features have produced inconsistent results about their effectiveness, because many confounding factors are involved in the classification model learning process, such as data normalization, dimension reduction, and feature selection. An unbiased comparative evaluation is therefore essential for uncovering the actual strength of network-based features. In this study, we compared several types of network-based features, obtained by applying different mathematical operators (Mean, Maximum, Minimum, Median, Variance) to genesets (i.e., a gene and its neighbors in the network) in a protein-protein interaction network and a gene co-expression network, for their ability to predict breast cancer metastasis using gene expression data from more than 10 patient cohorts. Although network-based features are usually more statistically significant than gene-based features, a consistent improvement in prediction performance from network-based features requires a substantial number of patients in the dataset. Contrary to many previous reports, we found no evidence supporting the robustness of network-based features, and we argue that some of the reported robustness may be due to an inherent bias associated with node degree in the network. In addition, different types of network features seem to cover different pathways and are complementary to each other.
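The geneset features described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes an already-normalized expression matrix with samples as rows and genes as columns, represents the network (PPI or co-expression) as a hypothetical adjacency dict of gene indices, and applies each operator across the genes of a geneset separately for every sample. The names `network_features` and `OPERATORS` are illustrative only.

```python
import numpy as np

# The five operators named in the abstract, each applied across the genes of a geneset.
OPERATORS = {
    "mean": np.mean,
    "max": np.max,
    "min": np.min,
    "median": np.median,
    "variance": np.var,
}


def network_features(expr, neighbors, op="mean"):
    """Compute one network-based feature per gene (hypothetical helper).

    expr: array of shape (n_samples, n_genes), assumed already normalized.
    neighbors: dict mapping a gene index to an iterable of neighbor gene indices
        (assumed adjacency-list form of an undirected network).
    op: one of the keys in OPERATORS.
    Returns an array of shape (n_samples, n_genes).
    """
    agg = OPERATORS[op]
    n_samples, n_genes = expr.shape
    out = np.empty((n_samples, n_genes))
    for g in range(n_genes):
        # Geneset = the gene itself plus its network neighbors.
        geneset = sorted({g, *neighbors.get(g, ())})
        # Aggregate across the geneset's genes, once per sample (row).
        out[:, g] = agg(expr[:, geneset], axis=1)
    return out


if __name__ == "__main__":
    expr = np.array([[1.0, 2.0, 3.0],
                     [4.0, 6.0, 8.0]])
    neighbors = {0: [1, 2], 1: [0], 2: [0]}  # symmetric toy network
    feats = network_features(expr, neighbors, "mean")
    # Column 0 averages genes {0, 1, 2} in each sample: [2.0, 6.0]
    print(feats)
```

Changing `op` yields one feature matrix per operator, each with one column per gene, so every feature type lines up with the gene-based features it is compared against. Because a geneset holds the gene plus all of its neighbors, high-degree nodes are aggregated over many more genes than low-degree ones, which is one way node degree can enter these features.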
Building on this complementarity, an ensemble classifier combining different network features was proposed and was found to significantly outperform classifiers based on gene-based features or on any single type of network-based feature. Network-based features and their combination show promise for improving the prediction of breast cancer metastasis but may require a large amount of training data. Claims about the robustness of network-based features need to be re-examined with network node degree and other confounding factors taken into consideration.
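The abstract does not say how the different network feature types are combined. One plausible reading, sketched here purely under that assumption, is soft voting: fit one classifier per feature type and average the predicted metastasis probabilities. To stay self-contained, the sketch uses a plain NumPy logistic regression fitted by gradient descent rather than any particular library; the function names and hyperparameters (`lr`, `n_iter`, `l2`) are hypothetical choices, not settings from the study.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=500, l2=1e-3):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    # Standardize with training statistics so one learning rate suits every feature type.
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8
    Z = (X - mu) / sd
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return mu, sd, w, b


def predict_logistic(model, X):
    """Probability of the positive class (metastasis) for each sample."""
    mu, sd, w, b = model
    return 1.0 / (1.0 + np.exp(-(((X - mu) / sd) @ w + b)))


def fit_ensemble(feature_sets, y):
    """Fit one base model per feature type; feature_sets maps name -> (n_samples, n_features)."""
    return {name: fit_logistic(X, y) for name, X in feature_sets.items()}


def predict_ensemble(models, feature_sets):
    """Soft voting: average the base models' predicted probabilities."""
    probs = [predict_logistic(models[name], X) for name, X in feature_sets.items()]
    return np.mean(probs, axis=0)


if __name__ == "__main__":
    # Synthetic toy data: each feature type carries signal in a different column.
    rng = np.random.default_rng(0)
    y = np.array([0] * 20 + [1] * 20, dtype=float)
    A = rng.normal(size=(40, 5))
    A[:, 0] += 4 * y
    B = rng.normal(size=(40, 5))
    B[:, 1] += 4 * y
    sets = {"mean": A, "max": B}
    p = predict_ensemble(fit_ensemble(sets, y), sets)
    print("training accuracy:", np.mean((p > 0.5) == (y == 1)))
```

In a real setting, `feature_sets` would hold, for example, the Mean, Maximum, and Variance feature matrices of the same patients. Scoring on the training data, as this toy demo does, overstates performance; a fair comparison would evaluate on held-out patients or cohorts.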

Updated: 2020-04-22