Disinformation: analysis and identification
Computational and Mathematical Organization Theory (IF 1.8), Pub Date: 2021-06-18, DOI: 10.1007/s10588-021-09336-x
Archita Pathak 1, 2 , Rohini K Srihari 1 , Nihit Natu 1, 3

We present an extensive study on disinformation, which is defined as information that is false and misleading and intentionally shared to cause harm. Through this work, we aim to answer the following questions:

  • Can we automatically and accurately classify a news article as containing disinformation?

  • What characteristics of disinformation differentiate it from other types of benign information?

We conduct this study in the context of two significant events: the 2016 US elections and the 2020 COVID-19 pandemic. We build a series of classifiers to (i) examine linguistic clues exhibited by different types of fake news articles, (ii) analyze the “clickbaityness” of disinformation headlines, and (iii) perform fine-grained, veracity-based article classification through a natural language inference (NLI) module for automated disinformation verification; this module draws on a manually curated set of evidence sources. For this last task, we built a new dataset annotated with generic, veracity-based labels and ground-truth evidence supporting each label. The veracity labels were formulated by examining the standards used by reputable fact-checking organizations. We show that disinformation derives features from both propaganda and mainstream news, making it more challenging to detect. However, there is significant potential for automating the fact-checking process so that it incorporates the degree of veracity. We provide an error analysis that illustrates the challenges involved in the automated fact-checking task and identifies factors that may improve this process in future work. Finally, we describe the implementation of a web app that extracts important entities and actions from a given article and searches the web to gather evidence from credible sources. The evidence articles are then used to generate a veracity label that can assist manual fact-checkers engaged in combating disinformation.



Updated: 2021-06-18