Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival.
Biology Direct (IF 5.5), Pub Date: 2018-09-20, DOI: 10.1186/s13062-018-0222-9
Aneta Polewko-Klim 1, Wojciech Lesiński 1, Krzysztof Mnich 2, Radosław Piliszek 2, Witold R Rudnicki 1, 2, 3

BACKGROUND Modern experimental techniques deliver data sets containing profiles of tens of thousands of potential molecular and genetic markers that can be used to improve medical diagnostics. Previous studies performed with three different experimental methods on the same set of neuroblastoma patients create an opportunity to examine whether augmenting gene expression profiles with information on copy number variation can lead to improved predictions of patient survival. We propose a methodology based on a comprehensive cross-validation protocol that includes feature selection within the cross-validation loop and classification using machine learning. We also test the dependence of the results on the feature selection process using four different feature selection methods.

RESULTS The models utilising features selected based on information entropy are slightly, but significantly, better than those using features obtained with the t-test. Synergy between data on genetic variation and gene expression is possible, but not confirmed. A slight, but statistically significant, increase in the predictive power of machine learning models has been observed for models built on combined data sets. This increase was found both with the out-of-bag estimate and in cross-validation performed on a single set of variables. However, the improvement was smaller and non-significant when models were built within the full cross-validation procedure that included feature selection within the cross-validation loop. A good correlation between the performance of the models in the internal and external cross-validation was observed, confirming the robustness of the proposed protocol and results.

CONCLUSIONS We have developed a protocol for building predictive machine learning models. The protocol can provide robust estimates of the model performance on unseen data. It is particularly well-suited for small data sets. We have applied this protocol to develop prognostic models for neuroblastoma, using data on copy number variation and gene expression. We have shown that combining these two sources of information may increase the quality of the models. Nevertheless, the increase is small, and larger samples are required to reduce the noise and bias arising from overfitting.

REVIEWERS This article was reviewed by Lan Hu, Tim Beissbarth and Dimitar Vassilev.
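
The central idea of the protocol, feature selection repeated inside every cross-validation fold, can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes a Welch t-statistic filter and a nearest-centroid classifier as simple stand-ins for the feature selection methods and machine learning models used in the paper, and all function names (`select_features`, `cross_validate`, and so on) are hypothetical. The data are pure noise, so any accuracy clearly above chance reflects selection bias rather than signal.

```python
import random
import statistics

def welch_t(a, b):
    """Absolute Welch t-statistic for two samples (0.0 if both variances are zero)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    denom = (va / len(a) + vb / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / denom if denom else 0.0

def select_features(X, y, k):
    """Rank features by |t| using only the rows passed in; return top-k indices."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((welch_t(pos, neg), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def fit_centroids(X, y, feats):
    """Per-class mean vector restricted to the selected features."""
    cents = {}
    for c in (0, 1):
        rows = [[row[j] for j in feats] for row, label in zip(X, y) if label == c]
        cents[c] = [statistics.mean(col) for col in zip(*rows)]
    return cents

def predict(cents, row, feats):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    v = [row[j] for j in feats]
    dist = {c: sum((a - b) ** 2 for a, b in zip(v, m)) for c, m in cents.items()}
    return min(dist, key=dist.get)

def cross_validate(X, y, k_features, n_folds, seed, select_inside):
    """Return CV accuracy. select_inside=True repeats selection on each training
    split; select_inside=False selects once on all rows, which leaks test data."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    global_feats = None if select_inside else select_features(X, y, k_features)
    correct = 0
    for fold in folds:
        held = set(fold)
        Xtr = [X[i] for i in idx if i not in held]
        ytr = [y[i] for i in idx if i not in held]
        feats = select_features(Xtr, ytr, k_features) if select_inside else global_feats
        cents = fit_centroids(Xtr, ytr, feats)
        correct += sum(predict(cents, X[i], feats) == y[i] for i in fold)
    return correct / len(X)

# Illustrative noise-only data: 40 samples, 300 features, balanced labels.
data_rng = random.Random(0)
y_noise = [i % 2 for i in range(40)]
X_noise = [[data_rng.gauss(0, 1) for _ in range(300)] for _ in y_noise]

biased_acc = cross_validate(X_noise, y_noise, 10, 5, 0, select_inside=False)
nested_acc = cross_validate(X_noise, y_noise, 10, 5, 0, select_inside=True)
print("selection outside the loop:", biased_acc)
print("selection inside the loop: ", nested_acc)
```

On noise-only data like this, selecting features once on all samples typically produces an optimistic accuracy, while repeating the selection inside each fold is expected to stay near chance. The exact numbers depend on the seed and the sample size.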

Updated: 2020-04-22