Feature redundancy term variation for mutual information-based feature selection
Applied Intelligence (IF 3.4), Pub Date: 2020-01-10, DOI: 10.1007/s10489-019-01597-z
Wanfu Gao, Liang Hu, Ping Zhang

Feature selection plays a critical role in many applications related to machine learning, image processing and gene expression analysis. Traditional feature selection methods aim to maximize feature dependency while minimizing feature redundancy. In previous information-theoretic feature selection methods, the feature redundancy term is measured either by the mutual information between a candidate feature and each already-selected feature, or by the interaction information among a candidate feature, each already-selected feature and the class. However, a larger value of the traditional feature redundancy term does not necessarily indicate a worse candidate feature, because a candidate feature can carry a large amount of redundant information while also offering a large amount of new classification information. To address this issue, we design a new feature redundancy term that considers the relevancy between a candidate feature and the class given each already-selected feature, and we propose a novel feature selection method named min-redundancy and max-dependency (MRMD). To verify the effectiveness of our method, MRMD is compared with eight competitive methods on an artificial example and fifteen real-world data sets. The experimental results show that our method achieves the best classification performance with respect to multiple evaluation criteria.
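The abstract does not state the MRMD scoring formula, so the following Python sketch only illustrates the general idea of offsetting redundancy by conditional relevancy. It assumes discrete features and a greedy score of the form I(f; C) minus the average over selected features s of [I(f; s) - I(f; C | s)]. That score, the function names and the toy data are assumptions made here for illustration, not the criterion confirmed by the paper.

```python
from collections import Counter
from math import log2


def entropy(*cols):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def greedy_select(features, labels, k):
    """Greedily pick k feature names from a dict of name -> discrete column.

    Assumed score (illustrative only): relevance I(f; C) minus the mean of
    I(f; s) - I(f; C | s) over the already-selected features s.
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:

        def score(name):
            f = features[name]
            relevance = mutual_info(f, labels)
            if not selected:
                return relevance
            penalty = sum(
                mutual_info(f, features[s])
                - cond_mutual_info(f, labels, features[s])
                for s in selected
            ) / len(selected)
            return relevance - penalty

        best = max(remaining, key=score)  # ties go to the earliest name
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): the class is determined jointly by a and b,
# and a_copy duplicates a.
toy = {
    "a": [0, 0, 1, 1],
    "a_copy": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
}
toy_labels = [0, 1, 2, 3]
picked = greedy_select(toy, toy_labels, k=2)
print(picked)
```

In this toy case all three features are equally relevant to the class on their own. Once `a` is chosen, `a_copy` adds nothing given `a`, whereas `b` still carries class information given `a`, which is the kind of distinction the conditional term is meant to capture.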

Updated: 2020-01-11