Enhancing SVM for survival data using local invariances and weighting.
BMC Bioinformatics (IF 2.9). Pub Date: 2020-05-19. DOI: 10.1186/s12859-020-3481-2
Hector Sanz, Ferran Reverter, Clarissa Valim

BACKGROUND: The need to analyze medium-throughput data in epidemiological studies with small sample sizes, particularly biomedical data, may hinder the use of classical statistical methods. Support vector machine (SVM) models can be applied successfully in this setting because they are a powerful tool for analyzing data with a large number of predictors and a limited sample size, especially for binary outcomes. However, biomedical research often involves time-to-event outcomes and must account for censoring. Methods that handle censored data within the SVM framework fall into two classes: those based on support vector regression (SVR) and those based on binary classification. SVR-based methods appear to be suboptimal for sparse data and yield results comparable to the Cox proportional hazards model and kernel Cox regression. The limited work assessing SVM methods for binary classification has been based on SVM learning using privileged information and SVM with uncertain classes.

RESULTS: This paper proposes alternative methods and extensions within the binary classification framework: a conditional survival approach for weighting censored observations, and a semi-supervised SVM with local invariances. Using simulation studies and real datasets, we evaluate these two methods and compare them with a weighted SVM model, SVM extensions from the literature, kernel Cox regression, and the Cox model.

CONCLUSIONS: Our proposed methods generally perform better across a wide variety of realistic scenarios for the structure of biomedical data. Specifically, the local invariances method using the conditional survival approach is the most robust method across scenarios and is a good alternative to other time-to-event methods. It is recommended for the analysis of real data, since it outperforms other methods in proportional and non-proportional hazards scenarios and with sparse data, which is common in biomedical data and biomarker analysis.
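The abstract does not spell out how censored observations are weighted, so the following is only a minimal sketch under stated assumptions, not the authors' published formulation. It assumes survival is recast as a binary question ("is the subject still event-free at a fixed horizon `tau`?"), that the survival function is estimated with Kaplan-Meier, and that a subject censored at time c < `tau` is split into two weighted copies, one per class, with weight P(T > tau | T > c) = S(tau)/S(c) for the "event-free" class. The horizon `tau`, the duplication scheme, and all function names are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (event_time, S(event_time)) steps, sorted by time.

    events[i] == 1 means an observed event; 0 means right-censored.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    steps = []
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)  # still under observation at t
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        surv *= 1.0 - deaths / at_risk
        steps.append((t, surv))
    return steps


def survival_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Evaluate the right-continuous Kaplan-Meier step function at time t."""
    s = 1.0
    for ti, si in steps:
        if ti > t:
            break
        s = si
    return s


def conditional_survival_rows(
    times: List[float], events: List[int], tau: float
) -> List[Tuple[int, int, float]]:
    """Return (subject_index, label, weight) rows for a weighted binary SVM.

    Label +1: event-free beyond tau (T > tau); label -1: event at or before tau.
    Hypothetical scheme: a subject censored before tau is duplicated into both
    classes, weighted by the conditional survival P(T > tau | T > c) = S(tau) / S(c).
    """
    steps = kaplan_meier(times, events)
    s_tau = survival_at(steps, tau)
    rows = []
    for i, (t, e) in enumerate(zip(times, events)):
        if e == 1:
            # Observed event: the class is known from the event time.
            rows.append((i, -1 if t <= tau else 1, 1.0))
        elif t >= tau:
            # Censored at or after tau: known to be event-free past tau.
            rows.append((i, 1, 1.0))
        else:
            # Censored before tau: class unknown, split by conditional survival.
            s_c = survival_at(steps, t)
            p = s_tau / s_c if s_c > 0 else 0.0  # defensive guard; S(c) > 0 here
            rows.append((i, 1, p))
            rows.append((i, -1, 1.0 - p))
    return rows


if __name__ == "__main__":
    for row in conditional_survival_rows([2, 3, 4, 5, 6, 8], [1, 0, 1, 1, 0, 1], tau=5):
        print(row)
```

Each subject's weights sum to 1, so a censored subject contributes one unit of evidence split between the two classes. The resulting weights could be passed to any SVM implementation that accepts per-sample weights, for example scikit-learn's `SVC.fit(X, y, sample_weight=...)`, by repeating each subject's feature row once per output row.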

Updated: 2020-05-19