Sentimental analysis from imbalanced code-mixed data using machine learning approaches
Distributed and Parallel Databases ( IF 1.5 ) Pub Date : 2021-03-20 , DOI: 10.1007/s10619-021-07331-4
R Srinivasan 1 , C N Subalalitha 1

Knowledge discovery from various perspectives has become a crucial asset in almost all fields. Sentimental analysis is a classification task that labels a sentence according to the meaning conveyed by its context. This paper addresses the class imbalance problem, which is one of the important issues in sentimental analysis. Few works have focused on sentimental analysis with an imbalanced class label distribution. The paper also focuses on another aspect of the problem, which involves a concept called “code mixing”. Code-mixed data consists of text alternating between two or more languages. Class imbalance is a commonly observed phenomenon in code-mixed data. Existing works have focused more on analyzing sentiment in monolingual data than in code-mixed data. This paper addresses these issues and proposes a solution for analyzing sentiment in class-imbalanced code-mixed data using a sampling technique combined with Levenshtein distance metrics. Furthermore, the paper compares the performance of several machine learning approaches, namely Random Forest, Logistic Regression, XGBoost, Support Vector Machine, and Naïve Bayes classifiers, using the F1-score.
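
The abstract names two building blocks, a sampling technique and Levenshtein distance, without saying how they are combined. The sketch below is only an illustration of each piece on its own, not the authors' method: `levenshtein` is the standard dynamic-programming edit distance, and `random_oversample` is a hypothetical helper that uses plain random oversampling as a stand-in for whichever sampling technique the paper applies.

```python
import random
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen minority-class samples
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        for _ in range(target - count):
            out_texts.append(rng.choice(pool))
            out_labels.append(label)
    return out_texts, out_labels
```

One plausible use of edit distance in code-mixed text, again an assumption rather than something the abstract states, is to treat romanized spelling variants of the same word as near-duplicates when their distance is small.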




Updated: 2021-03-21