NEC: A nested equivalence class-based dependency calculation approach for fast feature selection using rough set theory
Information Sciences, Pub Date: 2020-05-26, DOI: 10.1016/j.ins.2020.03.092
Jie Zhao, Jia-ming Liang, Zhen-ning Dong, De-yu Tang, Zhen Liu

Feature selection plays an important role in data mining and machine learning tasks. As one of the most effective methods for feature selection, rough set theory provides a systematic theoretical framework for consistency-based feature selection, in which positive region-based dependency calculation is the most important step. However, this calculation is time-consuming, and the many improved algorithms proposed so far remain computationally expensive. To overcome this shortcoming, this study introduces a nested equivalence class (NEC) approach to dependency calculation. The proposed method starts from the finest partition of the universe, and then extracts and uses the known knowledge of reducts in a decision table to construct an NEC. In most cases, the proposed method not only simplifies dependency calculation but also reduces the universe correspondingly. Using the proposed NEC-based approach, a number of representative heuristic- and swarm intelligence-based feature selection algorithms that apply rough set theory were enhanced. Each modified algorithm selected the same feature subset as its original counterpart. Experiments on 33 datasets from the UCI repository and the KDD Cup competition, including large-scale and high-dimensional datasets, demonstrated the efficiency and effectiveness of the proposed method.
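
For readers unfamiliar with the term, the following is a minimal sketch of the classical positive region-based dependency calculation that the abstract refers to as the bottleneck. It is not the NEC method, whose details the abstract does not give. It assumes the standard rough set definition, in which the dependency of the decision D on a condition attribute subset B is |POS_B(D)| / |U|. The function name `dependency` and the toy table are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def dependency(rows, cond_idx, dec_idx):
    """Return |POS_B(D)| / |U| for a decision table given as a list of rows.

    rows     -- the universe U, one tuple per object (assumed non-empty)
    cond_idx -- column indices of the condition attribute subset B
    dec_idx  -- column index of the decision attribute D
    """
    # Partition U into equivalence classes: objects that agree on every
    # attribute in B fall into the same class.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        classes[key].append(row[dec_idx])

    # A class lies in the positive region when all of its objects
    # carry the same decision value (the class is consistent).
    pos_size = sum(len(dec) for dec in classes.values() if len(set(dec)) == 1)
    return pos_size / len(rows)


# Hypothetical toy table: columns 0 and 1 are conditions, column 2 is the decision.
table = [
    (0, 0, "no"),
    (0, 1, "yes"),
    (1, 0, "no"),
    (1, 1, "yes"),
    (1, 1, "no"),
]

print(dependency(table, [0], 2))     # attribute 0 alone
print(dependency(table, [1], 2))     # attribute 1 alone
print(dependency(table, [0, 1], 2))  # both attributes
```

Each call re-partitions the whole universe, which is why repeated dependency evaluation dominates the cost of heuristic and swarm-based searches over attribute subsets. In the standard theory, an object that is in the positive region for B remains there for any superset of B, a property that approaches which shrink the universe can in principle exploit.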


Updated: 2020-05-26