Sentiment Analysis of Persian-English Code-mixed Texts, arXiv - CS - Computation and Language



arXiv - CS - Computation and Language, Pub Date: 2021-02-25, DOI: arxiv-2102.12700
Nazanin Sabri, Ali Edalat, Behnam Bahrak

The rapid production of data on the internet, together with the need to understand how users feel from both a business and a research perspective, has prompted the creation of numerous automatic monolingual sentiment detection systems. More recently, however, due to the unstructured nature of data on social media, we are observing more instances of multilingual and code-mixed text. This shift in content type has created new demand for code-mixed sentiment analysis systems. In this study, we collect and label a dataset of Persian-English code-mixed tweets. We then introduce a model that uses pretrained BERT embeddings as well as translation models to automatically learn the polarity scores of these tweets. Our model outperforms baseline models that use Naïve Bayes and Random Forest methods.


Updated: 2021-02-26
