A novel two-stage wrapper feature selection approach based on greedy search for text sentiment classification
Neurocomputing (IF 6), Pub Date: 2024-04-25, DOI: 10.1016/j.neucom.2024.127729
Ensar Arif Sağbaş

Sentiment analysis is a crucial step in obtaining subjective data from online text sources, but text classification suffers from high dimensionality. Dimension reduction is a valuable way to improve classification performance in machine learning: removing redundant features both speeds up training and supports more accurate classification. The effectiveness of a given feature selection method can, however, depend on the characteristics of the dataset. This study introduces a novel two-stage approach built around a greedy search-based wrapper feature selection algorithm. The algorithm uses the results of filter-based feature selection techniques to set the priority order in which the wrapper examines features. This ordering combines the information from several filter-based methods and so helps build feature subsets that emphasize the most important attributes. Because greedy selection mainly favors high-ranking features, it may not adequately evaluate feature combinations that involve low-scoring features. Extensive experiments were conducted on widely used sentiment analysis datasets to assess the proposed method. When coupled with the multinomial Naive Bayes classifier, the proposed algorithm attains an average accuracy of 96.88% on nine public sentiment datasets and 94.43% on the Humir datasets. The experimental results also show that the proposed technique outperforms state-of-the-art studies on the same nine sentiment datasets and the Humir datasets.
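
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the filters, the rank-aggregation rule, or the acceptance criterion, so the two toy filters (`doc_freq_gap`, `term_freq_gap`), the summed-rank aggregation, the "accept only if validation accuracy strictly improves" rule, and the toy corpus below are all assumptions made for illustration. The subset is seeded with the top-ranked feature, also an assumption, because a multinomial model over a single feature cannot separate classes.

```python
import math
from collections import Counter

def train_mnb(docs, labels, feats):
    """Fit multinomial Naive Bayes (Laplace smoothing) on the chosen features only."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, y in zip(docs, labels):
        for w in doc:
            if w in feats:
                counts[y][w] += 1
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(feats)
        loglik[c] = {w: math.log((counts[c][w] + 1) / total) for w in feats}
    return prior, loglik

def predict(model, doc):
    """Return the class with the highest log posterior (ties go to the smallest label)."""
    prior, loglik = model
    scores = {c: prior[c] + sum(loglik[c][w] for w in doc if w in loglik[c])
              for c in prior}
    return max(sorted(scores), key=scores.get)

def accuracy(train, val, feats):
    """Train on `train` with the given feature subset and score on `val`."""
    model = train_mnb(train[0], train[1], feats)
    hits = sum(predict(model, d) == y for d, y in zip(*val))
    return hits / len(val[1])

# Stage 1 (assumed filters): two simple class-separation scores per word.
def doc_freq_gap(docs, labels, w):
    pos = sum(w in d for d, y in zip(docs, labels) if y == 1)
    neg = sum(w in d for d, y in zip(docs, labels) if y == 0)
    return abs(pos - neg)

def term_freq_gap(docs, labels, w):
    pos = sum(d.count(w) for d, y in zip(docs, labels) if y == 1)
    neg = sum(d.count(w) for d, y in zip(docs, labels) if y == 0)
    return abs(pos - neg)

def ranked_features(docs, labels, filters):
    """Combine several filter rankings by summing each word's rank positions."""
    vocab = sorted({w for d in docs for w in d})
    total_rank = Counter()
    for f in filters:
        order = sorted(vocab, key=lambda w: -f(docs, labels, w))
        for r, w in enumerate(order):
            total_rank[w] += r
    return sorted(vocab, key=lambda w: total_rank[w])

# Stage 2: greedy wrapper that visits features in the filter-derived order.
def greedy_wrapper(train, val, order):
    selected = [order[0]]  # assumed seed: the top-ranked feature
    best = accuracy(train, val, set(selected))
    for w in order[1:]:
        acc = accuracy(train, val, set(selected) | {w})
        if acc > best:  # keep a feature only if it strictly improves accuracy
            selected.append(w)
            best = acc
    return selected, best

# Hypothetical toy corpus: label 1 = positive, 0 = negative.
train_docs = [s.split() for s in [
    "great great movie", "good plot great cast", "good movie",
    "bad movie", "awful plot bad cast", "bad bad acting"]]
train_labels = [1, 1, 1, 0, 0, 0]
val_docs = [s.split() for s in ["great cast", "good acting", "bad plot", "awful movie"]]
val_labels = [1, 1, 0, 0]

order = ranked_features(train_docs, train_labels, [doc_freq_gap, term_freq_gap])
selected, best = greedy_wrapper((train_docs, train_labels), (val_docs, val_labels), order)
print(selected, best)
```

The sketch also shows the limitation the abstract acknowledges: low-ranked words are examined last and only one at a time, so a combination of low-scoring features that would help only jointly is never tried.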

Updated: 2024-04-25