Feature selection methods for text classification: a systematic literature review
Artificial Intelligence Review (IF 10.7), Pub Date: 2021-02-24, DOI: 10.1007/s10462-021-09970-6
Julliano Trindade Pintas, Leandro A. F. Fernandes, Ana Cristina Bicharra Garcia

Feature Selection (FS) methods alleviate key problems in classification procedures, as they are used to improve classification accuracy, reduce data dimensionality, and remove irrelevant data. FS methods have received a great deal of attention from the text classification community. However, only a few literature surveys focus on FS for text classification, and those available either offer a superficial analysis or cover a very small set of work on the subject. For this reason, we conducted a Systematic Literature Review (SLR) that assesses 1376 unique papers from journals and conferences published in the past eight years (2013–2020). After abstract screening and full-text eligibility analysis, 175 studies were included in our SLR. Our contribution is twofold. First, we considered several aspects of each proposed method and mapped them into a new categorization schema. Second, we mapped the main characteristics of the experiments, identifying which datasets, languages, machine learning algorithms, and validation methods have been used to evaluate new and existing techniques. By following the SLR protocol, we allow the replication of our review process and minimize the chances of bias while classifying the included studies. By mapping issues and experiment settings, our SLR helps researchers to develop and position new studies with respect to the existing literature.




Updated: 2021-02-24