IFND: a benchmark dataset for fake news detection
Complex & Intelligent Systems (IF 5.8), Pub Date: 2021-10-16, DOI: 10.1007/s40747-021-00552-1
Dilip Kumar Sharma, Sonal Garg

Spotting fake news is a critical problem today. Social media platforms play a major role in propagating it. Fake news spread over digital platforms generates confusion and induces biased perspectives in people, so detecting misinformation on these platforms is essential to mitigating its adverse impact. Many approaches have been implemented in recent years. Despite this productive work, fake news identification still poses many challenges because no comprehensive, publicly available benchmark dataset exists, and there is no large-scale dataset that consists of Indian news only. This paper therefore presents IFND (Indian Fake News Dataset). The dataset consists of both text and images, and most of its content concerns events from 2013 to 2021. The content was scraped using the Parsehub tool. To increase the number of fake news items in the dataset, an intelligent augmentation algorithm is used to generate meaningful fake news statements. Latent Dirichlet allocation (LDA) is employed for topic modelling to assign categories to news statements. Various machine learning and deep learning classifiers are applied to the text and image modalities to evaluate performance on the proposed IFND dataset. A multi-modal approach that considers both textual and visual features for fake news detection is also proposed. The classifiers achieved satisfactory results on the IFND dataset. The study affirms that the availability of such a large dataset can stimulate research on this difficult problem and lead to better prediction models.
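As an illustration of the LDA step mentioned in the abstract, the following is a minimal sketch of topic assignment using collapsed Gibbs sampling in plain Python. The abstract does not say which LDA implementation, inference method, or hyperparameters the authors used, so the sampler, the function names (`lda_gibbs`, `dominant_topic`), the values of `alpha` and `beta`, and the toy headlines are all assumptions made for this example and are not taken from IFND.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] counts the
    tokens of document d assigned to topic k, and topic_word[k][v] counts the
    occurrences of vocabulary word v assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab


def dominant_topic(doc_topic_row):
    """Index of the topic with the most tokens in one document."""
    return max(range(len(doc_topic_row)), key=doc_topic_row.__getitem__)


# Made-up toy headlines, already lower-cased and tokenised.
toy_docs = [
    "election minister vote party election".split(),
    "party vote minister parliament".split(),
    "vaccine hospital virus doctor vaccine".split(),
    "virus doctor hospital patient".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(toy_docs, n_topics=2)
for tokens, row in zip(toy_docs, doc_topic):
    print(dominant_topic(row), " ".join(tokens))
```

The topic indices carry no meaning by themselves. Turning them into category names (for example politics or health) would still require inspecting the highest-count words in each row of `topic_word` and labelling the topics by hand.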




Updated: 2021-10-17