Feature selection based on term frequency deviation rate for text classification
Applied Intelligence (IF 5.3), Pub Date: 2020-11-11, DOI: 10.1007/s10489-020-01937-4
Hongfang Zhou, Yiming Ma, Xiang Li

Feature selection is a technique for selecting the subset of features most relevant to model training. In this paper, a new concept, the term frequency deviation rate (TDR), is first proposed to improve classification accuracy. Then, a TDR-based algorithm for text classification is presented. Finally, extensive experiments are conducted on seven datasets (K1a, K1b, WAP, R52, R8, 20NewGroups, and Cade12) with two classifiers, Naive Bayes and Support Vector Machine. The experimental results indicate that the new approach improves classification accuracy by 7.9% on average.
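
The abstract does not give the TDR formula, so the following is only a minimal sketch of filter-style feature selection in general: score every term, then keep the top k. The scoring rule (the largest gap in per-class document frequency), the function name `select_top_k_terms`, and the toy documents are all assumptions made for illustration and are not the authors' method.

```python
from collections import Counter


def select_top_k_terms(docs, labels, k):
    """Filter-style feature selection: score every term, keep the k best.

    The score is a placeholder (gap between the highest and lowest
    per-class document frequency). It is NOT the TDR measure from the paper.
    """
    classes = sorted(set(labels))
    class_sizes = Counter(labels)  # number of documents per class

    # df[c][t] = number of documents of class c that contain term t
    df = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))

    vocabulary = set().union(*(df[c].keys() for c in classes))

    scores = {}
    for term in vocabulary:
        rates = [df[c][term] / class_sizes[c] for c in classes]
        scores[term] = max(rates) - min(rates)

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(vocabulary, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical toy corpus: tokenized documents with class labels.
docs = [
    ["cheap", "pills", "buy", "now"],
    ["buy", "cheap", "watches"],
    ["meeting", "agenda", "now"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]

selected = select_top_k_terms(docs, labels, k=3)
print(selected)
```

In this toy corpus, "now" appears equally often in both classes, so it scores zero and is ranked last. A classifier such as Naive Bayes or a Support Vector Machine would then be trained on the selected terms only.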




Updated: 2020-11-12