A novel method of constrained feature selection by the measurement of pairwise constraints uncertainty
Journal of Big Data (IF 8.6), Pub Date: 2020-10-02, DOI: 10.1186/s40537-020-00352-3
Mehrdad Rostami, Kamal Berahmand, Saman Forouzandeh

In recent decades, the rapid development of computer and database technologies has led to a rapid growth in large-scale datasets. At the same time, data mining applications on high-dimensional datasets that demand both high speed and high accuracy are increasing rapidly. Semi-supervised learning is a class of machine learning in which labeled and unlabeled data are used together, here to improve feature selection. Feature selection over partially labeled data (semi-supervised feature selection) has the same goal as feature selection over fully labeled data: to choose a subset of the available features with the lowest redundancy among themselves and the highest relevance to the target class. The proposed method classifies pairwise similarity values into intervals to reduce ambiguity in their range of values. First, the similarity value of each pair is collected; these values are then divided into intervals, and the average of each interval is computed. Next, the number of pairs falling in each interval is counted. Finally, using the resulting strength and similarity matrices, a new constrained feature selection ranking is proposed. The performance of the presented method was compared with that of state-of-the-art and well-known semi-supervised feature selection approaches on eight datasets. The results indicate that the proposed approach outperforms previous related approaches in terms of the accuracy of the constraint score. In particular, the numerical results showed that the presented approach improved classification accuracy by about 3% and reduced the number of selected features by 1%. Consequently, the proposed method reduces the computational complexity of the machine learning algorithm while also increasing classification accuracy.
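The abstract describes the pipeline only at a high level, without formulas. The Python/NumPy sketch below illustrates one possible reading of it, under several stated assumptions: cosine similarity as the pair similarity measure, equal-width intervals, a hypothetical `pair_strength` rule that turns interval averages and counts into a certainty weight, and a weighted variant of the classic constraint score (ratio of must-link to cannot-link feature spread, where lower is better) as the final ranking. All function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def cosine_similarity_of_pairs(X, pairs):
    """Similarity value of each constrained pair (cosine similarity is an assumption)."""
    i, j = np.array(pairs).T
    num = np.sum(X[i] * X[j], axis=1)
    den = np.linalg.norm(X[i], axis=1) * np.linalg.norm(X[j], axis=1) + 1e-12
    return num / den


def interval_statistics(sims, n_intervals=4):
    """Divide similarity values into equal-width intervals; return each value's
    interval index, the average similarity of each interval, and the pair count."""
    edges = np.linspace(sims.min(), sims.max(), n_intervals + 1)
    idx = np.digitize(sims, edges[1:-1])  # interval indices 0 .. n_intervals-1
    counts = np.bincount(idx, minlength=n_intervals)
    means = np.array([sims[idx == k].mean() if counts[k] else 0.0
                      for k in range(n_intervals)])
    return idx, means, counts


def pair_strength(sims, is_must_link, n_intervals=4):
    """HYPOTHETICAL certainty weight for each constraint.

    A must-link pair whose interval has a high average similarity, or a
    cannot-link pair whose interval has a low one, is treated as more certain;
    intervals that hold more pairs are trusted more."""
    idx, means, counts = interval_statistics(sims, n_intervals)
    agreement = (means[idx] + 1.0) / 2.0  # map cosine range [-1, 1] to [0, 1]
    certainty = np.where(is_must_link, agreement, 1.0 - agreement)
    return certainty * counts[idx] / counts.max()


def weighted_constraint_score(X, ml, cl, w_ml, w_cl):
    """Per-feature ratio of weighted must-link spread to weighted cannot-link
    spread (a weighted constraint score; lower is better)."""
    def spread(pairs, w):
        i, j = np.array(pairs).T
        return w @ (X[i] - X[j]) ** 2  # shape: (n_features,)
    return spread(ml, w_ml) / (spread(cl, w_cl) + 1e-12)


def rank_features(X, ml, cl, n_intervals=4):
    """Return feature indices ordered from best to worst."""
    sims = cosine_similarity_of_pairs(X, ml + cl)
    is_ml = np.arange(len(sims)) < len(ml)
    w = pair_strength(sims, is_ml, n_intervals)
    scores = weighted_constraint_score(X, ml, cl, w[is_ml], w[~is_ml])
    return np.argsort(scores)


# Toy usage: feature 0 separates the two groups exactly; features 1 and 2 are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
X = np.column_stack([3.0 * y, rng.normal(size=20), rng.normal(size=20)])
ml = [(k, k + 1) for k in range(0, 9)] + [(k, k + 1) for k in range(10, 19)]
cl = [(k, k + 10) for k in range(10)]
print(rank_features(X, ml, cl))  # feature 0 is ranked first
```

Because each weight scales its pair's contribution, a doubtful constraint (for example, a must-link pair that falls in a low-similarity interval) affects a feature's score less than a confident one. The paper's actual strength and similarity matrices may be defined differently.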




Updated: 2020-10-02