Deep sentiments in Roman Urdu text using Recurrent Convolutional Neural Network model
Information Processing & Management (IF 7.4), Pub Date: 2020-03-06, DOI: 10.1016/j.ipm.2020.102233
Zainab Mahmood, Iqra Safder, Rao Muhammad Adeel Nawab, Faisal Bukhari, Raheel Nawaz, Ahmed S. Alfakeeh, Naif Radi Aljohani, Saeed-Ul Hassan

Although over 64 million people worldwide speak Urdu and are well acquainted with its Roman script, little research has been done on sentiment analysis or on building language resources for Roman Urdu. This article proposes a deep learning model to mine the emotions and attitudes that people express in Roman Urdu, using a corpus of 10,021 sentences from 566 online threads belonging to the following genres: Sports; Software; Food & Recipes; Drama; and Politics. The objectives of this research are twofold: (1) to develop a human-annotated benchmark corpus for sentiment analysis in the under-resourced Roman Urdu language; and (2) to evaluate sentiment analysis techniques using Rule-based, N-gram, and Recurrent Convolutional Neural Network (RCNN) models. Using the corpus, which three experts annotated as positive, negative, or neutral with a Cohen's Kappa score of 0.557, we run two sets of tests: binary classification (positive and negative) and tertiary, i.e. three-class, classification (positive, negative, and neutral). Finally, the results of the RCNN model are analyzed by comparing them with the outcomes of the Rule-based and N-gram models. We show that the RCNN model outperforms the baseline models, reaching an accuracy of 0.652 for binary classification and 0.572 for tertiary classification.
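
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how Cohen's Kappa and accuracy are commonly computed. It is not the authors' code. Cohen's Kappa is defined for a pair of annotators; the abstract does not say how the three experts' labels were combined into the single score of 0.557, so the sketch only shows the pairwise case. The function names and the toy labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items that received the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical toy annotations (not from the Roman Urdu corpus).
ann_1 = ["pos", "neg", "neu", "pos", "neg", "neu"]
ann_2 = ["pos", "neg", "pos", "pos", "neu", "neu"]
kappa = cohen_kappa(ann_1, ann_2)
```

With more than two annotators, one common choice is to average the pairwise kappa values, and another is to use Fleiss' kappa; which of these, if either, was used in the paper is not stated here.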




Updated: 2020-04-21