Feature selection considering Uncertainty Change Ratio of the class label
Applied Soft Computing (IF 8.7), Pub Date: 2020-07-11, DOI: 10.1016/j.asoc.2020.106537
Ping Zhang, Wanfu Gao

Feature selection in high-dimensional data sets has attracted considerable attention. Feature selection can reduce the dimensionality of the feature space and improve the prediction accuracy of a classification model. Information-theoretic feature selection methods aim to capture as much classification information about the class labels as possible from the already-selected feature subset. Existing methods focus on the reduced uncertainty of the class labels while ignoring changes in their remaining uncertainty. During feature selection, a large reduction in the uncertainty of the class labels does not imply that little uncertainty remains when different candidate features are given. In this paper, we analyze the difference between the reduced uncertainty and the remaining uncertainty of the class labels and propose a new term, the Uncertainty Change Ratio, that accounts for the change in the uncertainty of the class labels. Finally, we propose a novel method named Feature Selection considering Uncertainty Change Ratio (UCRFS). To demonstrate its classification performance, UCRFS is compared with three traditional methods and four state-of-the-art methods on fourteen benchmark data sets. The experimental results show that UCRFS outperforms the seven other methods in terms of average classification accuracy, AUC, and F1 score.
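The distinction between reduced and remaining uncertainty can be illustrated with a small sketch. It is not the UCRFS criterion, whose formula the abstract does not give. It only computes, for discrete toy data, the uncertainty removed by adding one feature and the uncertainty still left afterwards. The function names (`entropy`, `conditional_entropy`, `uncertainty_terms`) and the toy columns are assumptions made for illustration, as is the reading of "reduced" as a conditional mutual information and "remaining" as a conditional entropy.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def conditional_entropy(target, *given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(target, *given) - entropy(*given)


def uncertainty_terms(labels, conditioning, added):
    """Return (reduced, remaining) uncertainty of the labels, in bits.

    reduced   = H(labels | conditioning) - H(labels | conditioning, added)
    remaining = H(labels | conditioning, added)
    This is an illustrative reading, not the paper's definition.
    """
    before = conditional_entropy(labels, conditioning)
    remaining = conditional_entropy(labels, conditioning, added)
    return before - remaining, remaining


# Hypothetical toy data: 8 samples, 4 equally likely classes (2 bits).
labels = [0, 0, 1, 1, 2, 2, 3, 3]

# Pair 1: an uninformative conditioning feature plus a binary split.
g1 = [0, 0, 0, 0, 0, 0, 0, 0]
a = [0, 0, 0, 0, 1, 1, 1, 1]

# Pair 2: an informative conditioning feature plus a partial refinement.
g2 = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 0, 0]

print(uncertainty_terms(labels, g1, a))
print(uncertainty_terms(labels, g2, b))
```

In this toy case the first pair removes 1 bit but leaves 1 bit, while the second removes only 0.5 bit and leaves only 0.5 bit, so ranking by the reduction alone and ranking by what remains can disagree.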




Updated: 2020-07-11