Sentiment analysis in Turkish: Supervised, semi-supervised, and unsupervised techniques
Natural Language Engineering (IF 2.5), Pub Date: 2020-04-17, DOI: 10.1017/s1351324920000200
Cem Rıfkı Aydın, Tunga Güngör

Although many studies on sentiment analysis have been carried out for widely spoken languages, the field is still immature for Turkish. Most work on Turkish focuses on supervised models, which require comprehensive annotated corpora. The few unsupervised methods rely on sentiment lexicons that are either translated from English lexicons or built from corpora. This yields improper word polarities, because the characteristics of the language and the domain are ignored. In this paper, we develop unsupervised (domain-independent) and semi-supervised (domain-specific) methods for Turkish, both based on a set of antonym word pairs used as seeds. We make a comprehensive analysis of supervised methods under several feature weighting schemes. We then form an ensemble of supervised classifiers and also combine the unsupervised and supervised methods. Since Turkish is an agglutinative language, we perform morphological analysis and use different word forms. The methods were tested on two Turkish datasets of different styles and also on English datasets, to show the portability of the approaches across languages. We observed that the combination of the unsupervised and supervised approaches outperforms the other methods, and we obtained a significant improvement over the state-of-the-art results for both Turkish and English.
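The abstract does not say how the antonym seed pairs are turned into word polarities, so the following is only a minimal sketch of one common seed-based idea, not the authors' method. It scores a word by how often it co-occurs in a document with positive seeds versus negative seeds. The seed pairs, the function name `polarity_scores`, the smoothing constant, and the whitespace tokenization are all assumptions made for illustration. The sketch expects already-lowercased text and does no morphological analysis, which the paper does perform.

```python
import math
from collections import Counter

# Hypothetical antonym seed pairs (positive, negative); not taken from the paper.
SEED_PAIRS = [("iyi", "kötü"), ("güzel", "çirkin")]


def polarity_scores(documents, seed_pairs=SEED_PAIRS, smoothing=1.0):
    """Return a dict mapping each non-seed word to a polarity score.

    The score is the log-ratio of smoothed document-level co-occurrence
    counts with positive seeds versus negative seeds. Positive values
    suggest positive polarity, negative values negative polarity.
    Assumes documents are already lowercased and whitespace-tokenizable.
    """
    pos_seeds = {pos for pos, _ in seed_pairs}
    neg_seeds = {neg for _, neg in seed_pairs}
    with_pos, with_neg = Counter(), Counter()

    for doc in documents:
        tokens = set(doc.split())
        has_pos = bool(tokens & pos_seeds)
        has_neg = bool(tokens & neg_seeds)
        # Count each non-seed word once per document it shares with a seed.
        for word in tokens - pos_seeds - neg_seeds:
            if has_pos:
                with_pos[word] += 1
            if has_neg:
                with_neg[word] += 1

    vocab = set(with_pos) | set(with_neg)
    return {
        word: math.log((with_pos[word] + smoothing) / (with_neg[word] + smoothing))
        for word in vocab
    }


if __name__ == "__main__":
    docs = ["film çok iyi", "yemek çok kötü", "film güzel"]
    for word, score in sorted(polarity_scores(docs).items()):
        print(word, round(score, 3))
```

In this toy corpus, "film" appears only alongside positive seeds and gets a positive score, "yemek" appears only alongside a negative seed and gets a negative score, and "çok" appears equally with both and scores zero. A lexicon built this way from an in-domain corpus would be domain-specific, while the seed pairs themselves carry no domain information.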

Updated: 2020-04-17