Causality-based Feature Selection
ACM Computing Surveys (IF 23.8), Pub Date: 2020-09-28, DOI: 10.1145/3409382
Kui Yu, Xianjie Guo, Lin Liu, Jiuyong Li, Hao Wang, Zhaolong Ling, Xindong Wu

Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on correlations between the predictive features and the class variable and do not attempt to capture the causal relationships between them. Knowledge of the causal relationships between features and the class variable has been shown to offer potential benefits for building interpretable and robust prediction models, since causal relationships imply the underlying mechanism of a system. Consequently, causality-based feature selection has gradually attracted greater attention, and many algorithms have been proposed. In this article, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in this research area and to ease comparison between new methods and existing ones, we develop the first open-source package, called CausalFS, which comprises most of the representative causality-based feature selection algorithms (available at https://github.com/kuiy/CausalFS). Using CausalFS, we conduct extensive experiments to compare the representative algorithms on both synthetic and real-world datasets. Finally, we discuss some challenging problems to be tackled in future research.
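
The contrast the abstract draws between correlation-based and causality-based selection can be illustrated with a toy sketch. The code below is not taken from CausalFS and is not one of the surveyed algorithms. It is a simplified illustration that assumes linear Gaussian data and replaces a proper conditional independence test with a thresholded partial correlation, using conditioning sets of at most one variable. All names in it (`cause`, `proxy`, `target`, `correlation_select`, `ci_select`, the 0.1 threshold) are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def correlation_select(features, target, threshold=0.1):
    """Keep every feature whose marginal correlation with the target is large."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) > threshold]


def ci_select(features, target, threshold=0.1):
    """Keep a feature only if it stays dependent on the target both
    marginally and after conditioning on each other single feature.
    (Assumed simplification: conditioning sets of size 0 or 1 only.)"""
    selected = []
    for name, col in features.items():
        if abs(pearson(col, target)) <= threshold:
            continue
        others = [c for other, c in features.items() if other != name]
        if all(abs(partial_corr(col, target, c)) > threshold for c in others):
            selected.append(name)
    return selected


# Synthetic data with the assumed structure: proxy <- cause -> target.
rng = random.Random(0)
n = 5000
cause = [rng.gauss(0, 1) for _ in range(n)]
proxy = [c + rng.gauss(0, 1) for c in cause]
target = [c + rng.gauss(0, 1) for c in cause]

features = {"cause": cause, "proxy": proxy}
corr_selected = correlation_select(features, target)
ci_selected = ci_select(features, target)
print("correlation-based:", corr_selected)
print("conditional-independence-based:", ci_selected)
```

In this construction `proxy` is correlated with `target` only through their shared parent `cause`, so the marginal filter keeps it while the conditional check discards it once `cause` is controlled for. Real causality-based methods use statistical independence tests and larger conditioning sets.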

Updated: 2020-09-28