Fast feature selection for interval-valued data through kernel density estimation entropy, International Journal of Machine Learning and Cybernetics


Fast feature selection for interval-valued data through kernel density estimation entropy
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2020-05-07, DOI: 10.1007/s13042-020-01131-5
Jianhua Dai, Ye Liu, Jiaolong Chen, Xiaofeng Liu

Kernel density estimation, a non-parametric method for estimating the probability density of random variables, has been used in feature selection. However, existing feature selection methods based on kernel density estimation seldom consider interval-valued data, even though such data are widespread. This paper proposes a feature selection method based on kernel density estimation for interval-valued data. First, the kernel function in kernel density estimation is defined for interval-valued data. Second, an interval-valued kernel density estimation probability structure is constructed from the defined kernel function, comprising the kernel density estimation conditional probability, joint probability and posterior probability. Third, kernel density estimation entropies for interval-valued data are derived from this probability structure, namely the information entropy, conditional entropy and joint entropy. Fourth, a feature selection approach based on kernel density estimation entropy is proposed. The algorithm is then improved to yield a fast feature selection algorithm based on kernel density estimation entropy. Finally, comparative experiments on computing time, intuitive identifiability and classification performance show the feasibility and effectiveness of the proposed method.
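The abstract does not give the paper's kernel definition or its fast algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each interval is a (lower, upper) pair, uses a Gaussian kernel on the Euclidean distance between interval endpoints (a stand-in, not the authors' kernel), estimates class posteriors by kernel weighting, and greedily picks the features that minimise the resulting conditional entropy. The names interval_distance, kernel, kde_conditional_entropy and greedy_select, the bandwidth h and the toy data are all hypothetical.

```python
import math

def interval_distance(a, b):
    """Euclidean distance between two intervals given as (lower, upper) pairs."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

def kernel(x, y, features, h):
    """Gaussian kernel over the chosen interval-valued features (assumed form)."""
    sq = sum(interval_distance(x[f], y[f]) ** 2 for f in features)
    return math.exp(-sq / (2.0 * h * h))

def kde_conditional_entropy(samples, labels, features, h=0.5):
    """Estimate H(label | features) from kernel-weighted class posteriors."""
    n = len(samples)
    total = 0.0
    for i in range(n):
        weights = [kernel(samples[i], samples[j], features, h) for j in range(n)]
        norm = sum(weights)  # always >= 1 because the self-weight equals 1
        posterior = {}
        for w, lab in zip(weights, labels):
            posterior[lab] = posterior.get(lab, 0.0) + w / norm
        total -= sum(p * math.log(p) for p in posterior.values() if p > 0.0)
    return total / n

def greedy_select(samples, labels, n_features, k, h=0.5):
    """Forward selection: repeatedly add the feature that lowers the entropy most."""
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best = min(
            remaining,
            key=lambda f: kde_conditional_entropy(samples, labels, selected + [f], h),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: feature 0 separates the two classes, feature 1 does not.
samples = [
    [(0.0, 1.0), (2.0, 3.0)],
    [(0.1, 1.1), (5.0, 6.0)],
    [(4.0, 5.0), (2.1, 3.1)],
    [(4.1, 5.1), (5.1, 6.1)],
]
labels = ["a", "a", "b", "b"]

print(greedy_select(samples, labels, n_features=2, k=1))  # prints [0]
```

Each entropy evaluation in this sketch costs a number of kernel evaluations quadratic in the sample count, which is the kind of cost a faster variant would aim to reduce; how the paper achieves its speed-up is not stated in the abstract.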


Last updated: 2020-05-07
