A Two-stage Text Feature Selection Algorithm for Improving Text Classification, ACM Transactions on Asian and Low-Resource Language Information Processing



ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2) Pub Date: 2021-05-06, DOI: 10.1145/3425781
Ashokkumar P, Siva Shankar G, Gautam Srivastava, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu

As the number of digital text documents grows daily, text classification is becoming increasingly challenging. Each document contains a large number of words (or features), which reduces the efficiency of a classification algorithm. This article presents an optimized feature selection algorithm that reduces the number of features in order to improve the accuracy of text classification. The proposed algorithm uses noun-based filtering and word ranking to enhance the performance of the text classification algorithm. Experiments on three benchmark datasets show that the proposed algorithm achieves the highest accuracy among the algorithms compared: Term Frequency-Inverse Document Frequency, Balanced Accuracy Measure, GINI Index, Information Gain, and Chi-Square.
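The abstract does not spell out how the two stages work, so the following is only a rough sketch of what a filter-then-rank pipeline could look like, not the authors' method. Two things in it are assumptions: the small NOUNS set is a hypothetical stand-in for a real part-of-speech tagger, and the ranking stage uses the chi-square statistic (one of the baselines named in the abstract) in place of the paper's own word ranking, which the abstract does not describe. The function names and toy documents are likewise illustrative.

```python
# Hypothetical noun lexicon standing in for a real part-of-speech tagger.
NOUNS = {"match", "goal", "team", "market", "stock", "bank"}


def noun_filter(tokens, nouns=NOUNS):
    """Stage 1 (assumed): keep only tokens found in the noun lexicon."""
    return [t for t in tokens if t in nouns]


def chi_square(term, docs, labels, target):
    """Chi-square score of a term for one class, from a 2x2 contingency table."""
    n = len(docs)
    pairs = list(zip(docs, labels))
    a = sum(1 for doc, y in pairs if term in doc and y == target)      # term present, in class
    b = sum(1 for doc, y in pairs if term in doc and y != target)      # term present, other class
    c = sum(1 for doc, y in pairs if term not in doc and y == target)  # term absent, in class
    d = n - a - b - c                                                  # term absent, other class
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(raw_docs, labels, k):
    """Stage 2 (swapped-in technique): rank surviving nouns by chi-square, keep top k."""
    docs = [set(noun_filter(text.lower().split())) for text in raw_docs]
    vocab = set().union(*docs)
    classes = set(labels)
    scores = {t: max(chi_square(t, docs, labels, c) for c in classes) for t in vocab}
    # Sort by descending score; break ties alphabetically for a stable result.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented purely for illustration.
raw_docs = [
    "the team scored a late goal in the match",
    "the team won the match",
    "the bank raised rates and the stock market fell",
    "the stock market rallied",
]
labels = ["sport", "sport", "finance", "finance"]

top = select_features(raw_docs, labels, k=4)
print(top)  # ['market', 'match', 'stock', 'team']
```

In this toy run the filter discards function words and verbs before any scoring happens, so the ranking stage only has to score six candidate terms instead of the full vocabulary. That reduction in candidates is the general motivation the abstract gives for feature selection.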


Updated: 2021-05-06
