Optimizing Sentiment Classification for Arabic Opinion Texts
Cognitive Computation ( IF 4.3 ) Pub Date : 2021-01-03 , DOI: 10.1007/s12559-020-09771-z
Radwa M. K. Saeed , Sherine Rady , Tarek F. Gharib

Product and service reviews guide potential customers, giving them real knowledge of those products and services when they make decisions. Sentiment classification is the task of automatically analyzing the opinions expressed in textual reviews. The efficiency of this task depends on the set of representative features extracted from the reviews, but the value of the extracted features lies mainly in those that contribute strongly to classification. This is the role of dimensionality reduction: it eliminates noise and shrinks the high-dimensional feature space while preserving the required accuracy. The Arabic language and its datasets pose inherent challenges. Moreover, most sentiment classification studies that integrate dimensionality reduction have focused on English texts, with only a few conducted for other languages, including Arabic. The large population of the Arab world generates massive amounts of Arabic data, yet these technical gaps persist for the language. This paper proposes a supervised learning approach for sentiment classification of Arabic reviews. The approach uses optimized compact features built from a well-representative feature set coupled with feature reduction techniques, achieving high accuracy together with time and space savings. The feature set is a triple combination of N-gram features and positive/negative N-gram count features, obtained after negation handling. The approach examines two linear transformation methods: principal component analysis (PCA) as an unsupervised transformation and latent Dirichlet allocation (LDA) as a supervised transformation. A spam detection process is run before learning to increase classifier robustness.
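To make the feature set concrete, the following is a minimal sketch of negation-handled N-gram features with positive/negative N-gram counts. It is not the authors' implementation. Several things are assumptions made for illustration: the "triple combination" is read as unigrams, bigrams, and trigrams; negation is handled by prefixing the two tokens after a negator with `NOT_`; and the tiny English word lists (`NEGATORS`, `POSITIVE`, `NEGATIVE`) stand in for real Arabic lexicons purely for readability.

```python
from collections import Counter

# Hypothetical toy resources; a real system would use Arabic lexicons.
NEGATORS = {"not", "never", "no"}
POSITIVE = {"good", "great", "clean"}
NEGATIVE = {"bad", "dirty", "slow"}
NEG_SCOPE = 2  # assumption: a negator affects the next two tokens


def mark_negation(tokens):
    """Drop each negator and prefix the tokens in its scope with 'NOT_'."""
    marked, remaining = [], 0
    for tok in tokens:
        if tok in NEGATORS:
            remaining = NEG_SCOPE
            continue
        marked.append("NOT_" + tok if remaining > 0 else tok)
        remaining = max(remaining - 1, 0)
    return marked


def ngrams(tokens, n):
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def polarity(gram):
    """Net polarity of an n-gram: +1 positive, -1 negative, 0 neutral."""
    score = 0
    for tok in gram.split():
        negated = tok.startswith("NOT_")
        word = tok[4:] if negated else tok
        sign = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        score += -sign if negated else sign
    return (score > 0) - (score < 0)


def extract_features(text, max_n=3):
    """N-gram counts (n = 1..max_n) plus positive/negative n-gram totals."""
    tokens = mark_negation(text.lower().split())
    grams = [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]
    features = Counter(grams)
    pols = [polarity(g) for g in grams]
    features["__pos_ngrams__"] = pols.count(1)
    features["__neg_ngrams__"] = pols.count(-1)
    return features
```

Calling `extract_features("the room was not clean")` turns `clean` into `NOT_clean`, so every N-gram containing it counts as negative rather than positive. The resulting sparse counts are the kind of high-dimensional input that a reduction step such as PCA or LDA would then compress.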
The proposed approach was evaluated on five Arabic opinion text datasets from different domains and of varying sizes (1.6 K to 94 K reviews). Experiments covered two-class (positive/negative) and three-class (positive/negative/neutral) sentiment classification. Accuracy ranged from 95.5% to 99.8% for the two-class problem and from 92% to 97.3% for the three-class problem. LDA feature reduction outperformed PCA by an average of 4.34% in accuracy and 3.52% in F1 score. The overall approach outperformed existing related work in the literature by margins of 23% in accuracy and 34% in F1 score. These results show the efficiency of the proposed solution, which combines a feature reduction module with a well-representative feature set based on a negation-handled triple combination of N-gram features and positive/negative N-gram count features. Overall, the approach yields a 24% increase in accuracy, a 93% saving in feature space, and a 97% decrease in classification execution time.
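For orientation, the difference between an unsupervised and a supervised linear transformation can be written out as follows. This is a sketch under one stated assumption: the abstract expands LDA as latent Dirichlet allocation, but a supervised linear transformation with that abbreviation is ordinarily Fisher's linear discriminant analysis, and that is what is shown here. The symbols ($x_i$ for feature vectors, $\mu$ for the global mean, $\mu_c$ and $n_c$ for the mean and size of class $c$, $W$ for the projection matrix) are introduced for illustration and do not come from the paper.

```latex
% PCA (unsupervised): maximize projected total variance, ignoring labels
S_T = \sum_{i} (x_i - \mu)(x_i - \mu)^{\top}, \qquad
W_{\mathrm{PCA}} = \arg\max_{W^{\top} W = I} \operatorname{tr}\!\left(W^{\top} S_T W\right)

% Linear discriminant analysis (supervised): maximize between-class
% scatter relative to within-class scatter, using class labels
S_B = \sum_{c} n_c (\mu_c - \mu)(\mu_c - \mu)^{\top}, \qquad
S_W = \sum_{c} \sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^{\top}

W_{\mathrm{LDA}} = \arg\max_{W} \operatorname{tr}\!\left(\left(W^{\top} S_W W\right)^{-1} W^{\top} S_B W\right)
```

Under this reading, the discriminant projection has at most $C - 1$ dimensions for $C$ classes: one for the two-class problem and two for the three-class problem.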




Updated: 2021-01-03