Toward Integrated CNN-based Sentiment Analysis of Tweets for Scarce-resource Language—Hindi, ACM Transactions on Asian and Low-Resource Language Information Processing

ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2021-06-30, DOI: 10.1145/3450447
Vedika Gupta ₁, Nikita Jain ₁, Shubham Shubham ₁, Agam Madan ₁, Ankit Chaudhary ₁, Qin Xin ₂

Linguistic resources for commonly used languages such as English and Mandarin Chinese are available in abundance, which is why most existing research targets these languages. For other languages, however, linguistic resources are scarce. Hindi is one of them: despite being the fourth-most popular language, it still lacks richly populated linguistic resources, owing to the challenges involved in processing Hindi text. This article first explores machine learning-based approaches (Naïve Bayes, Support Vector Machine, Decision Tree, and Logistic Regression) for analyzing the sentiment of Hindi-language text derived from Twitter. It then presents lexicon-based approaches (Hindi Senti-WordNet, NRC Emotion Lexicon) for sentiment analysis in Hindi and proposes a Domain-specific Sentiment Dictionary. Finally, it proposes an integrated convolutional neural network (CNN) approach, incorporating a Recurrent Neural Network and Long Short-term Memory, to analyze sentiment in Hindi-language tweets. The dataset comprises 23,767 tweets classified as positive, negative, or neutral. The proposed CNN approach achieves an accuracy of 85%.
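
To make the lexicon-based idea mentioned in the abstract concrete, the following is a minimal sketch of how a polarity lexicon can label a tweet as positive, negative, or neutral. It is not the authors' implementation. The names `TOY_LEXICON`, `tokenize`, and `classify` are hypothetical, and the six lexicon entries are illustrative only; they are not taken from Hindi Senti-WordNet, the NRC Emotion Lexicon, or the paper's Domain-specific Sentiment Dictionary.

```python
from typing import Dict, Iterator

# Hypothetical toy lexicon mapping a word to a polarity score.
# Illustrative entries only; NOT drawn from Hindi Senti-WordNet,
# the NRC Emotion Lexicon, or the paper's domain-specific dictionary.
TOY_LEXICON: Dict[str, float] = {
    "अच्छा": 1.0,    # good
    "शानदार": 1.0,   # wonderful
    "खुश": 1.0,      # happy
    "बुरा": -1.0,    # bad
    "खराब": -1.0,    # poor
    "दुखी": -1.0,    # sad
}


def tokenize(text: str) -> Iterator[str]:
    """Split on whitespace and strip surrounding punctuation,
    including the Devanagari danda (।)."""
    for raw in text.split():
        token = raw.strip(".,!?;:\"'()।")
        if token:
            yield token


def classify(text: str, lexicon: Dict[str, float] = TOY_LEXICON) -> str:
    """Sum the polarity of known words; the sign of the total decides the label."""
    score = sum(lexicon.get(token, 0.0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["यह फिल्म शानदार है", "सेवा बहुत खराब थी।", "आज सोमवार है"]:
        print(tweet, "->", classify(tweet))
```

A scorer this simple ignores negation, intensifiers, and out-of-vocabulary words, which is one reason lexicon coverage matters so much for a scarce-resource language.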

Updated: 2021-06-30
