Nested cross-validation with ensemble feature selection and classification model for high-dimensional biological data
Communications in Statistics - Simulation and Computation (IF 0.9), Pub Date: 2020-11-29, DOI: 10.1080/03610918.2020.1850790
Yi Zhong, Prabhakar Chalise, Jianghua He

Abstract

In recent years, the application of feature selection methods to biological datasets has greatly increased. Feature selection yields a subset of relevant, informative features, which results in a more interpretable model and improves prediction accuracy. In addition, ensemble learning can provide a more robust model by combining the results of multiple statistical learning models. We propose an algorithm that uses ensemble methods both to select the features and to build the classification model from the selected features. The proposed approach is a two-step, two-layer (nested) cross-validation method: the first step performs feature selection in the inner loop of cross-validation, and the second step builds the classification model in the outer loop. The final classification model obtained with the proposed method has higher prediction accuracy than one obtained with standard cross-validation. The method is demonstrated on simulated data and on three real datasets.
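The two-layer structure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the selectors, the way their results are combined, or the classifier, so everything concrete here is an assumption made for illustration: two simple filter scores (`mean_diff_score`, `scaled_diff_score`), majority voting across inner folds, a nearest-centroid classifier, and the toy generator `make_data`. It is not the authors' implementation.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def make_data(n=60, p=50, n_informative=5, shift=2.0, seed=1):
    """Toy data (assumption): balanced labels; only the first n_informative features differ by class."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(shift * label if j < n_informative else 0.0, 1.0) for j in range(p)]
         for label in y]
    return X, y


def kfold(indices, k):
    """Return k (train, test) index pairs built from interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i]) for i in range(k)]


def by_class(X, y, rows, j):
    """Values of feature j within rows, split by class label."""
    return ([X[i][j] for i in rows if y[i] == 0], [X[i][j] for i in rows if y[i] == 1])


def mean_diff_score(X, y, rows, j):
    """Hypothetical selector 1: absolute difference of class means."""
    a, b = by_class(X, y, rows, j)
    return abs(mean(a) - mean(b))


def scaled_diff_score(X, y, rows, j):
    """Hypothetical selector 2: mean difference scaled by within-class spread."""
    a, b = by_class(X, y, rows, j)
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9)


SELECTORS = (mean_diff_score, scaled_diff_score)


def select_features(X, y, rows, n_keep, k_inner):
    """Inner loop: each selector votes for its top features on every inner training split."""
    votes = Counter()
    for inner_train, _ in kfold(rows, k_inner):
        for score in SELECTORS:
            ranked = sorted(range(len(X[0])),
                            key=lambda j: score(X, y, inner_train, j), reverse=True)
            votes.update(ranked[:n_keep])
    return [j for j, _ in votes.most_common(n_keep)]


def fit_centroids(X, y, rows, feats):
    """Nearest-centroid classifier (assumption): per-class mean of each selected feature."""
    return {c: [mean(X[i][j] for i in rows if y[i] == c) for j in feats] for c in (0, 1)}


def predict(centroids, x, feats):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, centroids[c]))
    return min(centroids, key=dist)


def nested_cv_accuracy(X, y, n_keep=5, k_outer=5, k_inner=3, seed=0):
    """Outer loop: select features on the training part only, then score on the held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for train, test in kfold(idx, k_outer):
        feats = select_features(X, y, train, n_keep, k_inner)  # never sees the test fold
        model = fit_centroids(X, y, train, feats)
        correct += sum(predict(model, X[i], feats) == y[i] for i in test)
    return correct / len(X)


X, y = make_data()
print(nested_cv_accuracy(X, y))
```

The point of the design is that feature selection runs only on each outer training split, so the held-out fold influences neither the chosen features nor the fitted model. With the toy sizes used here (30 samples per class, 5 outer and 3 inner folds), every training split necessarily contains both classes; other sizes would need stratified folds.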




Updated: 2020-11-29