Deep Learning Based Sentiment Analysis in a Code-Mixed English-Hindi and English-Bengali Social Media Corpus

International Journal on Artificial Intelligence Tools (IF 1.0), Pub Date: 2020-06-15, DOI: 10.1142/s0218213020500141
Anupam Jamatia, Steve Durairaj Swamy, Björn Gambäck, Amitava Das, Swapan Debbarma

Sentiment analysis examines text in context to identify the social sentiment it expresses, giving a better understanding of the source material. This article addresses sentiment analysis of an English-Hindi and English-Bengali code-mixed text corpus collected from social media. Code-mixing, the blending of multiple languages, was previously associated mainly with spoken language. However, social media users also use it in their writing, which tends to be fairly casual. The noisy nature of social media text poses challenges for many language processing applications. The focus here is on the lower predictive performance of traditional machine learners compared with their deep learning counterparts, including the contextual language representation model BERT (Bidirectional Encoder Representations from Transformers), on the task of extracting user sentiment from code-mixed texts. Three deep learners (a BiLSTM CNN, a Double BiLSTM and an attention-based model) attained accuracy 20–60% greater than traditional approaches on code-mixed data; for comparison, they were also tested on monolingual English data.

Updated: 2020-06-15
