An integrated semi-automated framework for domain-based polarity words extraction from an unannotated non-English corpus
The Journal of Supercomputing (IF 2.5), Pub Date: 2020-03-03, DOI: 10.1007/s11227-020-03222-0
Mohammed Kaity, Vimala Balakrishnan

Building sentiment analysis resources is a fundamental step before developing any sentiment analysis model, and sentiment lexicons are among the most critical of these resources. However, many non-English languages suffer from a severe shortage of such resources and lexicons. This study proposes an integrated framework for extracting domain-based polarity words from a massive unannotated non-English corpus. The framework consists of three layers: lexicon-based, corpus-based and human-based. The first two layers automatically recognize and extract new polarity words from the corpus using initial seed lexicons. A key advantage of the proposed framework is that it needs only an initial seed lexicon and an unannotated corpus to start the extraction process. The framework is therefore semi-automated, since it depends on seed lexicons. Experiments on three languages indicate that the proposed framework outperformed existing lexicons, achieving F-scores of 77.8%, 83.8% and 68.6% for the Arabic, French and Malay lexicons, respectively.
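
The abstract does not spell out how the automatic layers work, so the following is only a minimal sketch of one common way to grow a seed lexicon from an unannotated corpus, not the authors' algorithm. It assumes a simple sentence-level co-occurrence heuristic: a non-seed word that mostly appears alongside positive (or negative) seed words becomes a candidate polarity word, which would presumably then go to a human reviewer. The function names, thresholds and English toy data are all hypothetical and chosen for readability.

```python
from collections import Counter


def extract_candidates(sentences, seeds, min_count=2, min_margin=0.5):
    """Propose new polarity words from tokenized, unannotated sentences.

    sentences: list of token lists.
    seeds: dict mapping a seed word to +1 (positive) or -1 (negative).
    Assumption: co-occurrence with seeds in the same sentence signals polarity.
    """
    pos, neg = Counter(), Counter()
    for tokens in sentences:
        polarities = [seeds[t] for t in tokens if t in seeds]
        if not polarities:
            continue  # no seed word here, so the sentence carries no signal
        n_pos = sum(1 for p in polarities if p > 0)
        n_neg = sum(1 for p in polarities if p < 0)
        for word in set(tokens):
            if word not in seeds:
                pos[word] += n_pos
                neg[word] += n_neg

    candidates = {}
    for word in set(pos) | set(neg):
        total = pos[word] + neg[word]
        if total < min_count:
            continue  # too little evidence
        margin = (pos[word] - neg[word]) / total
        if abs(margin) >= min_margin:
            candidates[word] = 1 if margin > 0 else -1
    return candidates


def f_score(predicted, gold):
    """F1 over (word, polarity) pairs against a hand-labelled reference."""
    tp = len(set(predicted.items()) & set(gold.items()))
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (English words used only for readability).
seeds = {"good": 1, "bad": -1}
corpus = [
    ["good", "tasty", "food"],
    ["tasty", "and", "good"],
    ["bad", "stale", "food"],
    ["stale", "bad", "service"],
]
candidates = extract_candidates(corpus, seeds)
score = f_score(candidates, {"tasty": 1, "stale": -1, "fresh": 1})
```

On this toy corpus, "tasty" and "stale" are proposed as positive and negative candidates, while "food" is rejected because it co-occurs equally with both seed polarities. The F-score here is the harmonic mean of precision and recall against a reference lexicon; the paper's exact evaluation protocol is not described in the abstract.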

Updated: 2020-03-03