Feature Selection for Classification using Principal Component Analysis and Information Gain
Expert Systems with Applications (IF 7.5), Pub Date: 2021-02-26, DOI: 10.1016/j.eswa.2021.114765
Erick Odhiambo Omuya, George Onyango Okeyo, Michael Waema Kimwele

Feature selection and classification have been widely applied in areas such as business, medicine and media. High dimensionality in datasets is one of the main challenges in data classification, data mining and sentiment analysis. Irrelevant and redundant attributes also increase the complexity of classification algorithms and degrade their operation, so the algorithms record poor results. Some existing work uses all attributes for classification, some of which are insignificant for the task, thereby leading to poor performance. This paper therefore develops a hybrid filter model for feature selection based on principal component analysis and information gain. The hybrid model is then applied to support classification using machine learning techniques, e.g. the Naïve Bayes technique. Experimental results demonstrate that the hybrid filter model reduces data dimensions, selects appropriate feature sets, and reduces training time, hence providing better classification performance as measured by accuracy, precision and recall.
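
The abstract does not say how principal component analysis and information gain are combined, so the following is only a minimal sketch of the information gain half: it ranks discrete features by how much they reduce the entropy of the class label. The function names (`entropy`, `information_gain`, `rank_features`) and the toy data are illustrative assumptions, not taken from the paper, and the PCA step is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [
        (name, information_gain([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: "outlook" determines the label, "noise" does not.
rows = [
    ("sunny", "a"),
    ("sunny", "b"),
    ("rainy", "a"),
    ("rainy", "b"),
]
labels = ["yes", "yes", "no", "no"]
ranking = rank_features(rows, labels, ["outlook", "noise"])
```

A filter of this kind would keep only the top-ranked features before training a classifier such as Naïve Bayes; where a cut-off should sit is a design choice the abstract does not specify.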




Updated: 2021-03-09