A multi-layer approach to disinformation detection in US and Italian news spreading on Twitter, EPJ Data Science


A multi-layer approach to disinformation detection in US and Italian news spreading on Twitter
EPJ Data Science (IF 3.0), Pub Date: 2020-11-23, DOI: 10.1140/epjds/s13688-020-00253-8
Francesco Pierri, Carlo Piccardi, Stefano Ceri

We tackle the problem of classifying news articles as disinformation or mainstream news solely by inspecting how they diffuse on Twitter. This approach is inherently simpler than existing text-based approaches, because it bypasses the multiple levels of complexity found in news content (e.g. grammar, syntax, style). We employ a multi-layer representation of Twitter diffusion networks in which each layer describes a single type of interaction (tweet, retweet, mention, etc.); we quantify the advantage of separating the layers over an aggregated approach and assess the impact of each layer on the classification. Experimental results on two large-scale datasets, corresponding to diffusion cascades of news shared in the United States and in Italy respectively, show that a simple Logistic Regression model can classify disinformation vs mainstream networks with high accuracy (AUROC up to 94%). We also highlight differences between the sharing patterns of the two news domains that appear to be common to both countries. We believe that our network-based approach provides useful insights that pave the way for the future development of a system to detect misleading and harmful information spreading on social media.
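As a rough illustration of the multi-layer idea in the abstract, the sketch below splits one article's Twitter interactions into layers, computes a few global network features per layer, and concatenates them, alongside an aggregated single-network variant for comparison. Everything here is an assumption made for illustration and is not taken from the paper: the toy `INTERACTIONS` data, the layer names in `LAYERS`, the three features chosen (node count, number of weakly connected components, size of the largest component), and all function names.

```python
from collections import defaultdict

# Hypothetical toy interactions for ONE news article:
# (source_user, target_user, interaction_type). Not real data.
INTERACTIONS = [
    ("a", "b", "retweet"),
    ("c", "b", "retweet"),
    ("d", "a", "mention"),
    ("e", "f", "reply"),
    ("a", "b", "mention"),
]
# Assumed layer names; the abstract lists tweet, retweet, mention, etc.
LAYERS = ("retweet", "mention", "reply")


def weak_components(edges):
    """Return the weakly connected components of a directed edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)  # ignore direction for weak connectivity
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def layer_features(edges):
    """Illustrative features: [nodes, components, largest component size]."""
    comps = weak_components(edges)
    n_nodes = sum(len(c) for c in comps)
    largest = max((len(c) for c in comps), default=0)
    return [n_nodes, len(comps), largest]


def multilayer_features(interactions, layers=LAYERS):
    """Concatenate per-layer feature vectors in a fixed layer order."""
    by_layer = {name: [] for name in layers}
    for u, v, layer in interactions:
        by_layer[layer].append((u, v))
    vec = []
    for name in layers:
        vec.extend(layer_features(by_layer[name]))
    return vec


def aggregated_features(interactions):
    """Same features on the single network that merges all layers."""
    return layer_features([(u, v) for u, v, _ in interactions])


if __name__ == "__main__":
    print(multilayer_features(INTERACTIONS))
    print(aggregated_features(INTERACTIONS))
```

With one such vector per article and a disinformation/mainstream label, a standard classifier such as scikit-learn's `LogisticRegression` could be trained on either representation and the two compared by AUROC; that step is omitted here to keep the sketch dependency-free.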


Updated: 2020-11-25
