Sentiment Classification in Bangla Textual Content: A Comparative Study

arXiv - CS - Computation and Language. Pub Date: 2020-11-19, DOI: arxiv-2011.10106
Md. Arid Hasan, Jannatul Tajrin, Shammur Absar Chowdhury, Firoj Alam

Sentiment analysis has been widely used to understand people's views on social and political agendas and user experiences with products. It is one of the core, well-researched areas of NLP. However, for low-resource languages such as Bangla, one of the prominent challenges is the lack of resources. Another important limitation of the current literature for Bangla is the absence of comparable results, caused by the lack of a well-defined train/test split. In this study, we explore several publicly available sentiment-labeled datasets and design classifiers using both classical and deep learning algorithms. The classical algorithms include SVM and Random Forest, and the deep learning algorithms include CNN, FastText, and transformer-based models. We compare these models in terms of performance and time and resource complexity. Our findings suggest that transformer-based models, which have not previously been explored for Bangla, outperform all other models. Furthermore, we created a weighted lexicon list based on the valence score per class, and then analyzed the datasets for high-significance entries in each class. For reproducibility, we make the data splits and the ranked lexicon list publicly available. The presented results can serve as a benchmark for future studies.
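The abstract's point about a well-defined train/test split can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `stratified_split`, the 20% test fraction, the seed, and the toy placeholder data are all assumptions made for illustration. It only shows one way to make a split that anyone can regenerate exactly, which is the property that makes results comparable.

```python
import random
from collections import defaultdict


def stratified_split(examples, test_fraction=0.2, seed=13):
    """Split (text, label) pairs into train and test lists, label by label.

    Hypothetical helper, not taken from the paper. The fixed seed and the
    sorting steps make the result independent of the input order.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the split can be regenerated
    train, test = [], []
    for label in sorted(by_label):       # stable label order
        items = sorted(by_label[label])  # stable item order before shuffling
        rng.shuffle(items)
        # At least one test item per label; a label with a single example
        # therefore ends up only in the test set.
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Toy placeholder data (assumed labels, not from any of the paper's datasets).
data = [
    (f"sample {i}", label)
    for label in ("positive", "negative", "neutral")
    for i in range(10)
]
train, test = stratified_split(data)
print(len(train), len(test))
```

Because the items are sorted before shuffling with a seeded generator, calling the function on the same examples in a different order yields the same split, so the split can be shared either as files or as a seed plus this procedure.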


Updated: 2020-11-23
