Accelerating information entropy-based feature selection using rough set theory with classified nested equivalence classes
Pattern Recognition (IF 7.5), Pub Date: 2020-11-01, DOI: 10.1016/j.patcog.2020.107517
Jie Zhao, Jia-ming Liang, Zhen-ning Dong, De-yu Tang, Zhen Liu

Abstract: Feature selection effectively reduces the dimensionality of data. For feature selection, rough set theory offers a systematic theoretical framework based on consistency measures, of which information entropy is one of the most important significance measures of attributes. However, an information-entropy-based significance measure is computationally expensive and requires repeated calculations. Although many accelerating strategies have been proposed thus far, there remains a bottleneck when using an information-entropy-based feature selection algorithm to handle large-scale datasets with high dimensions. In this study, we introduce a classified nested equivalence class (CNEC)-based approach to calculate the information-entropy-based significance for feature selection using rough set theory. The proposed method extracts knowledge of the reducts of a decision table to reduce the universe and construct CNECs. By exploring the properties of different types of CNECs, we can not only accelerate both outer and inner significance calculation by discarding useless CNECs but also effectively decrease the number of inner significance calculations by using one type of CNECs. The use of CNECs is shown to significantly enhance three representative entropy-based feature selection algorithms that use rough set theory. The feature subset selected by the CNEC-based algorithms is the same as that selected by algorithms using the original definition of information entropies. Experiments conducted using 31 datasets from multiple sources, such as the UCI repository and KDD Cup competition, including large-scale and high-dimensional datasets, confirm the efficiency and effectiveness of the proposed method.
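To make the quantities being accelerated concrete, the following is a minimal baseline sketch (not the paper's CNEC method) of entropy-based feature selection in rough set theory: rows of a decision table are partitioned into equivalence classes by the candidate attribute set, the conditional Shannon entropy of the decision is computed over those classes, and attributes are added greedily by significance (entropy drop). All function and variable names are illustrative, not from the paper.

```python
from collections import Counter, defaultdict
from math import log2

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def conditional_entropy(rows, attrs, decisions):
    """H(D | attrs): conditional entropy of the decision given attrs."""
    n = len(rows)
    h = 0.0
    for block in partition(rows, attrs):
        m = len(block)
        counts = Counter(decisions[i] for i in block)
        # weight of this equivalence class times the entropy of decisions in it
        h += (m / n) * -sum((c / m) * log2(c / m) for c in counts.values())
    return h

def forward_select(rows, decisions, n_attrs, eps=1e-12):
    """Greedy forward selection: repeatedly add the attribute whose
    (outer) significance -- the entropy drop it causes -- is largest,
    stopping when no attribute lowers the entropy further."""
    selected, remaining = [], list(range(n_attrs))
    current = conditional_entropy(rows, selected, decisions)
    while remaining:
        best, best_h = None, current
        for a in remaining:
            h = conditional_entropy(rows, selected + [a], decisions)
            if h < best_h - eps:
                best, best_h = a, h
        if best is None:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_h
    return selected
```

The expensive part is clear from the sketch: each candidate attribute requires a fresh pass over the whole universe to rebuild the partition, which is exactly the repeated cost that universe-reduction schemes such as the CNEC construction aim to cut down.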

Updated: 2020-11-01