Classification of multi-lingual tweets, into multi-class model using Naïve Bayes and semi-supervised learning
Multimedia Tools and Applications ( IF 3.0 ) Pub Date : 2020-08-29 , DOI: 10.1007/s11042-020-09512-2
Ayaz H. Khan , Muhammad Zubair

Twitter is a social media platform that has proven to be a valuable source of insight into emotions about products, policies, and similar topics. Its unit of expression is the tweet, a message of up to 280 characters that carries the direct, unfiltered emotions of a large user population. Twitter has attracted the attention of many researchers because every tweet is public by default, which is not the case with Facebook. This paper proposes a model for multi-lingual (English and Roman Urdu) classification of tweets over a diverse range of classes (non-hierarchical architecture). Previous work in tweet classification focuses narrowly either on a single language or, at most, on a uniform set of classes (Positive, Extremely Positive, Negative and Extremely Negative). The proposed model is based on semi-supervised learning, and the proposed feature selection approach makes it less dependent and highly adaptive for picking up trending terms. This makes it a strong candidate for streaming data. Using the Naïve Bayes learning algorithm in each phase of the methodology, the model achieved an accuracy of up to 87.16%, ahead of both KNN and SVM models, which are popular in the NLP and text classification domains.
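The abstract does not give implementation details, so the following is only a minimal sketch of one common way to combine Naïve Bayes with semi-supervised learning: self-training, where confident predictions on unlabeled tweets are added back to the training set. It is not the authors' method and does not reproduce their feature selection approach. The class and function names, the whitespace tokenizer, the 0.8 confidence threshold, the two class labels, and the toy English and Roman Urdu tweets are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real tweets need more care.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            log_scores[label] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(exp.values())
        return {label: e / norm for label, e in exp.items()}

    def predict(self, doc):
        probs = self.predict_proba(doc)
        return max(probs, key=probs.get)


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the training set with confidently pseudo-labeled tweets."""
    docs = [d for d, _ in labeled]
    labels = [l for _, l in labeled]
    pool = list(unlabeled)
    model = NaiveBayes().fit(docs, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for doc in pool:
            probs = model.predict_proba(doc)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                docs.append(doc)
                labels.append(best)
                added += 1
            else:
                remaining.append(doc)
        if added == 0:
            break
        pool = remaining
        model = NaiveBayes().fit(docs, labels)  # retrain on the enlarged set
    return model


# Toy data (invented for illustration): English and Roman Urdu, two classes.
labeled = [
    ("cricket match today great win", "sports"),
    ("aaj ka cricket match bohat acha tha", "sports"),
    ("election vote parliament government", "politics"),
    ("hukumat ne election ka elaan kiya", "politics"),
]
unlabeled = [
    "cricket team win match",
    "government election vote result",
    "match jeet gaye cricket team",
]

model = self_train(labeled, unlabeled)
print(model.predict("jeet gaye team"))
```

In this sketch the words "jeet", "gaye" and "team" never appear in the labeled tweets; the classifier only learns them from pseudo-labeled tweets, which loosely illustrates how a semi-supervised loop can pick up new terms from a stream.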




Updated: 2020-10-17