Feature selection using self-information and entropy-based uncertainty measure for fuzzy neighborhood rough set, Complex & Intelligent Systems

Feature selection using self-information and entropy-based uncertainty measure for fuzzy neighborhood rough set
Complex & Intelligent Systems (IF 5.8), Pub Date: 2021-04-23, DOI: 10.1007/s40747-021-00356-3
Jiucheng Xu, Meng Yuan, Yuanyuan Ma

Feature selection based on the fuzzy neighborhood rough set (FNRS) model is widely used in data mining. However, the dependency function of FNRS considers only the information in the lower approximation of the decision and ignores the information in the upper approximation, so some information may be lost. To address this problem, this paper proposes a fuzzy neighborhood joint entropy model based on a fuzzy neighborhood self-information measure (FNSIJE) and applies it to feature selection. First, from the algebra view, the concept of self-information is introduced into the upper and lower approximations of FNRS to construct four uncertainty measures of decision variables based on fuzzy neighborhood self-information. The relationships between these measures and their properties are discussed in detail. The fourth measure, named tolerance fuzzy neighborhood self-information, is found to have better classification performance. Second, from the information view, an uncertainty measure based on fuzzy neighborhood joint entropy is proposed. Combining the algebra and information views yields the FNSIJE. Third, the Kolmogorov–Smirnov (K–S) test is used to delete features with weak distinguishing performance, which reduces the dimensionality and hence the complexity of high-dimensional gene datasets, and a forward feature selection algorithm is then provided. Experimental results show that, compared with related methods, the presented model selects fewer important features and achieves higher classification accuracy.
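
The two-stage pipeline described in the abstract (a K-S prefilter followed by forward selection) can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not define the FNSIJE measure, so the `score` argument is a hypothetical placeholder supplied by the caller, and the binary labels (0 and 1), the cut on the K-S statistic, and the names `ks_statistic`, `ks_prefilter`, `forward_select`, `threshold`, and `min_gain` are all assumptions made for this example.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def ks_prefilter(X: Sequence[Sequence[float]], y: Sequence[int],
                 threshold: float) -> List[int]:
    """Keep feature indices whose class-0 vs class-1 K-S statistic >= threshold.

    Assumption: binary labels 0/1 and a cut on the statistic itself; the paper
    may use a different criterion (for example a p-value).
    """
    kept = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        if ks_statistic(col0, col1) >= threshold:
            kept.append(j)
    return kept


def forward_select(candidates: Sequence[int],
                   score: Callable[[List[int]], float],
                   min_gain: float = 1e-9) -> List[int]:
    """Greedy forward selection: add the feature with the largest score gain
    until no remaining feature improves the score by more than min_gain.

    `score` is a placeholder for an uncertainty measure such as FNSIJE.
    """
    selected: List[int] = []
    remaining = list(candidates)
    best = score(selected)
    while remaining:
        gain, feat = max((score(selected + [f]) - best, f) for f in remaining)
        if gain <= min_gain:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score(selected)
    return selected
```

A typical use under these assumptions would be `forward_select(ks_prefilter(X, y, threshold), score)`, where `X` is a list of sample rows and `score` evaluates a candidate feature subset. Filtering first keeps the greedy search cheap, since each forward step evaluates `score` once per remaining candidate.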

Updated: 2021-04-24
