Benchmark of filter methods for feature selection in high-dimensional gene expression survival data
Briefings in Bioinformatics ( IF 6.8 ) Pub Date : 2021-09-08 , DOI: 10.1093/bib/bbab354
Andrea Bommert 1 , Thomas Welchowski 2 , Matthias Schmid 2 , Jörg Rahnenführer 1

Feature selection is crucial for the analysis of high-dimensional data, but benchmark studies for data with a survival outcome are rare. We compare 14 filter methods for feature selection based on 11 high-dimensional gene expression survival data sets. The aim is to provide guidance on the choice of filter methods for other researchers and practitioners. We analyze the accuracy of predictive models that employ the features selected by the filter methods. Also, we consider the run time, the number of selected features for fitting models with high predictive accuracy as well as the feature selection stability. We conclude that the simple variance filter outperforms all other considered filter methods. This filter selects the features with the largest variance and does not take into account the survival outcome. Also, we identify the correlation-adjusted regression scores filter as a more elaborate alternative that allows fitting models with similar predictive accuracy. Additionally, we investigate the filter methods based on feature rankings, finding groups of similar filters.
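The variance filter described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a samples-by-features expression matrix and uses a hypothetical helper name, `variance_filter`. The choice of the unbiased sample variance (`ddof=1`) is also an assumption. It does not change the ranking, since every column is scaled by the same factor.

```python
import numpy as np


def variance_filter(X, n_features):
    """Return the column indices of the n_features highest-variance features.

    Hypothetical helper for illustration: X is a (samples x features)
    matrix. Survival times and censoring indicators are not used at all,
    matching the abstract's description of the filter.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("X must be a 2-D samples-by-features matrix")
    if not 1 <= n_features <= X.shape[1]:
        raise ValueError("n_features must be between 1 and the number of columns")
    # Per-feature sample variance (assumption: unbiased estimator, ddof=1).
    variances = X.var(axis=0, ddof=1)
    # Sort by descending variance; a stable sort keeps ties in column order.
    order = np.argsort(-variances, kind="stable")
    return order[:n_features]


# Toy matrix with 3 samples and 3 features (made-up values).
X_toy = np.array([
    [1.0, 10.0, 0.0],
    [2.0, 20.0, 0.0],
    [3.0, 30.0, 1.0],
])
selected = variance_filter(X_toy, 2)
X_reduced = X_toy[:, selected]
```

The reduced matrix `X_reduced` would then be passed to whatever survival model is being fitted. The number of features to keep is a tuning choice that this sketch leaves to the caller.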

Updated: 2021-09-08