Building and evaluating resources for sentiment analysis in the Greek language


Language Resources and Evaluation (IF 2.7), Pub Date: 2018-07-14, DOI: 10.1007/s10579-018-9420-4
Adam Tsakalidis (1, 2), Symeon Papadopoulos (3), Rania Voskaki (4), Kyriaki Ioannidou (5), Christina Boididou (3), Alexandra I Cristea (1, 6), Maria Liakata (1, 2), Yiannis Kompatsiaris (3)


Sentiment lexicons and word embeddings constitute well-established sources of information for sentiment analysis in online social media. Although their effectiveness has been demonstrated in state-of-the-art sentiment analysis and related tasks in the English language, such publicly available resources are much less developed and evaluated for the Greek language. In this paper, we tackle the problems arising when analyzing text in such an under-resourced language. We present and make publicly available a rich set of such resources, ranging from a manually annotated lexicon, to semi-supervised word embedding vectors and annotated datasets for different tasks. Our experiments using different algorithms and parameters on our resources show promising results over standard baselines; on average, we achieve a 24.9% relative improvement in F-score on the cross-domain sentiment analysis task when training the same algorithms with our resources, compared to training them on more traditional feature sources, such as n-grams. Importantly, while our resources were built with the primary focus on the cross-domain sentiment analysis task, they also show promising results in related tasks, such as emotion analysis and sarcasm detection.
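
The 24.9% figure in the abstract is a relative improvement in F-score, not an absolute gain in percentage points. As a minimal sketch of how such a figure is typically computed (the function name and the F-score values below are hypothetical and chosen only for illustration; they are not taken from the paper):

```python
def relative_improvement(baseline_f: float, new_f: float) -> float:
    """Return the change of new_f relative to baseline_f, in percent."""
    if baseline_f <= 0:
        raise ValueError("baseline F-score must be positive")
    return (new_f - baseline_f) / baseline_f * 100.0


# Hypothetical F-scores for illustration only (not results from the paper):
# a baseline trained on n-gram features vs. the same algorithm trained on
# lexicon and embedding features.
baseline_f = 0.50
resources_f = 0.60

gain = relative_improvement(baseline_f, resources_f)
print(f"{gain:.1f}% relative improvement in F-score")
```

With these made-up values the absolute gain is 10 points while the relative gain is twice that, which is why the two measures should not be read interchangeably.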


Updated: 2018-07-14
