Political Advertising Dataset: the use case of the Polish 2020 Presidential Elections
arXiv - CS - Computational Complexity. Pub Date: 2020-06-17, DOI: arxiv-2006.10207
Łukasz Augustyniak, Krzysztof Rajda, Tomasz Kajdanowicz, Michał Bernaczyk

Political campaigns are full of political ads posted by candidates on social media. Political advertisements are a basic form of campaigning and are subject to various social requirements. We present the first publicly available dataset for detecting specific text chunks and categories of political advertising in Polish. It contains 1,705 human-annotated tweets tagged with nine categories, which constitute campaigning under Polish electoral law. We achieved an inter-annotator agreement of 0.65 (Cohen's kappa). An additional annotator resolved the mismatches between the first two annotators, improving the consistency and complexity of the annotation process. We used the newly created dataset to train a well-established neural tagger, which achieved an F1 score of 70%. We also outline possible use cases for such datasets and models, with an initial analysis of the Polish 2020 Presidential Elections on Twitter.
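The 0.65 agreement figure is Cohen's kappa, which corrects the raw agreement rate p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below illustrates the formula in Python. It assumes each item receives exactly one label per annotator. The paper tags text chunks, so its exact computation (for example, at the token or span level) may differ. The labels `"ad"` and `"none"` are hypothetical and are not the dataset's nine categories.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items (one label per item)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: for each label, the product of the two annotators' marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item, so kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels for six tweets; not taken from the dataset.
    annotator_1 = ["ad", "ad", "none", "none", "ad", "none"]
    annotator_2 = ["ad", "none", "none", "none", "ad", "ad"]
    print(round(cohen_kappa(annotator_1, annotator_2), 3))  # prints 0.333
```

In this toy example the annotators agree on 4 of 6 tweets (p_o ≈ 0.667). Each annotator uses both labels equally often, which gives p_e = 0.5 and κ = 1/3. This shows how kappa discounts agreement that could arise by chance. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same quantity and can serve as a cross-check.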

Updated: 2020-06-19