What’s the Tone? Easy Doesn’t Do It: Analyzing Performance and Agreement Between Off-the-Shelf Sentiment Analysis Tools
Communication Methods and Measures (IF 11.4), Pub Date: 2019-10-17, DOI: 10.1080/19312458.2019.1671966
Mark Boukes, Bob van de Velde, Theo Araujo, Rens Vliegenthart

ABSTRACT
This article scrutinizes automated content analysis as a method for measuring the tone of news coverage. We compare a range of off-the-shelf sentiment analysis tools with manually coded economic news and also examine the agreement among these dictionary approaches themselves. We assess the performance of five off-the-shelf sentiment analysis tools and two tailor-made dictionary-based approaches. The analyses yield five conclusions. First, there is little overlap between the off-the-shelf tools, which causes wide divergence in tone measurement. Second, overlap with manual coding is no stronger for short texts (i.e., headlines) than for long texts (i.e., full articles). Third, an approach that combines individual dictionaries achieves comparably good performance. Fourth, precision may increase to acceptable levels at higher levels of granularity. Fifth, the performance of dictionary approaches depends more on the number of relevant keywords in the dictionary than on the number of valenced words as such; a small tailor-made lexicon was not inferior to large established dictionaries. Altogether, we conclude that off-the-shelf sentiment analysis tools are mostly unreliable and unsuitable for research purposes (at least in the context of Dutch economic news), and that manual validation for the specific language, domain, and genre of the research project at hand is always warranted.
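The abstract contrasts individual dictionaries, a combined dictionary approach, agreement between tools, and precision against manual coding. The following is a minimal Python sketch of how such a comparison might be set up, assuming simple word-count scoring. The toy Dutch lexicons, headlines, and manual codes are hypothetical illustrations, not the dictionaries or data evaluated in the article; combining dictionaries by majority vote and using simple percent agreement are likewise assumptions standing in for whatever combination rule and agreement statistic a given study uses.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical toy lexicons (word -> valence); NOT the dictionaries tested in the article.
LEXICON_A: Dict[str, int] = {"groei": 1, "winst": 1, "crisis": -1, "verlies": -1}
LEXICON_B: Dict[str, int] = {"herstel": 1, "winst": 1, "ontslagen": -1, "crisis": -1}


def tone(text: str, lexicon: Dict[str, int]) -> int:
    """Sum the valence of matched words and return its sign: 1, 0, or -1."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    return (score > 0) - (score < 0)


def combined_tone(text: str, lexicons: Sequence[Dict[str, int]]) -> int:
    """Assumed combination rule: majority vote (sign of the summed labels)."""
    total = sum(tone(text, lex) for lex in lexicons)
    return (total > 0) - (total < 0)


def percent_agreement(a: List[int], b: List[int]) -> float:
    """Share of texts on which two tools assign the same tone label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def precision(predicted: List[int], gold: List[int], label: int) -> float:
    """Of the texts a tool labels `label`, the share manual coders also labelled `label`."""
    hits = [g == label for p, g in zip(predicted, gold) if p == label]
    return sum(hits) / len(hits) if hits else float("nan")


if __name__ == "__main__":
    # Hypothetical headlines and manual codes, for illustration only.
    headlines = [
        "Winst bij banken stijgt",
        "Crisis leidt tot ontslagen",
        "Herstel van de economie",
        "Groei ondanks verlies",
    ]
    manual = [1, -1, 1, 0]

    a = [tone(h, LEXICON_A) for h in headlines]                            # [1, -1, 0, 0]
    b = [tone(h, LEXICON_B) for h in headlines]                            # [1, -1, 1, 0]
    combined = [combined_tone(h, [LEXICON_A, LEXICON_B]) for h in headlines]  # [1, -1, 1, 0]

    print("agreement A vs B:", percent_agreement(a, b))                    # 0.75
    print("precision A (neutral):", precision(a, manual, 0))               # 0.5
    print("precision combined (neutral):", precision(combined, manual, 0))  # 1.0
```

In this toy setup the two lexicons disagree on one of four headlines because each covers different keywords, which mirrors (without reproducing) the abstract's point that coverage of relevant keywords, rather than lexicon size, drives performance.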

Updated: 2019-10-17