Enhancement of a multi-dialectal sentiment analysis system by the detection of the implied sarcastic features
Knowledge-Based Systems ( IF 8.8 ) Pub Date : 2021-06-15 , DOI: 10.1016/j.knosys.2021.107232
Ibtissam Touahri , Azzeddine Mazroui

Sentiment analysis is an NLP task that has attracted many researchers across languages and, more recently, in Arabic. The task raises several challenges, including sarcasm detection. In this article, we exploit sarcastic characteristics to improve the accuracy of a sentiment analysis system. Sarcasm is difficult to detect because it is implicit and is characterized by positive words appearing in a negative context. We therefore extracted a variety of features that capture context incongruity and the opposition between objective and subjective sentences. Offensive language and hate speech are expressions that hurt others. Offensive language detection relies on identifying offensive terms, which are strongly negative and therefore help detect negative expressions. To this end, we built sentiment, offensive and sarcastic lexicons, both manually and automatically, and collected existing ones. Likewise, we collected many corpora, either ironic (sarcastic, offensive) or sentimental (positive, negative). Because sarcasm is a major challenge for sentiment analysis, we built a balanced system containing positive and negative (sarcastic, offensive) tweets. Since the analyzed corpus is multi-dialectal, we used a cross-dialect lexicon that retains meaning from one dialect to another. Beyond the characteristics common to Arabic dialects, classification was enhanced by detecting the specificities of some dialects that use negation clitics as well as negation words to negate a term. The experiments show that enhancing a sentiment analysis system with sarcastic features improved the results by 8%, reaching an accuracy of 84.17% with a classical machine learning approach and 80.36% with a deep learning approach. The classical machine learning approach was then further improved by expanding the BOW lexicon and reducing the feature vector, reaching an accuracy of 89.24%. The method is multilingual because the built model can be language independent: it suffices to have the corresponding resources to apply the system to other languages.
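
The idea of pairing a bag-of-words vector with lexicon-based sarcasm cues can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English lexicons, the `lexicon_features` and `augment` helpers, and the simple "positive word alongside a negative or offensive word" incongruity flag are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy lexicons (English placeholders); the paper's actual
# sentiment, offensive and sarcastic lexicons are not reproduced here.
POSITIVE = {"great", "love", "wonderful"}
NEGATIVE = {"delay", "broken", "queue"}
OFFENSIVE = {"idiot"}


def lexicon_features(tokens):
    """Count lexicon hits and flag a crude positive/negative incongruity."""
    counts = Counter()
    for tok in tokens:
        t = tok.lower()
        if t in POSITIVE:
            counts["pos"] += 1
        if t in NEGATIVE:
            counts["neg"] += 1
        if t in OFFENSIVE:
            counts["off"] += 1
    pos, neg, off = counts["pos"], counts["neg"], counts["off"]
    return {
        "pos": pos,
        "neg": neg,
        "off": off,
        # Assumed proxy for context incongruity: positive words that
        # co-occur with negative or offensive words in the same text.
        "incongruity": int(pos > 0 and (neg + off) > 0),
    }


def augment(bow_vector, tokens):
    """Append the lexicon features to an existing bag-of-words vector."""
    f = lexicon_features(tokens)
    return list(bow_vector) + [f["pos"], f["neg"], f["off"], f["incongruity"]]
```

In such a setup, the augmented vector would be passed to whatever classifier is in use; a real system would need dialect-aware tokenization and negation handling, which this sketch omits.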




Updated: 2021-06-20