An ensemble machine learning approach through effective feature extraction to classify fake news, Future Generation Computer Systems



An ensemble machine learning approach through effective feature extraction to classify fake news
Future Generation Computer Systems (IF 6.2), Pub Date: 2020-11-24, DOI: 10.1016/j.future.2020.11.022
Saqib Hakak, Mamoun Alazab, Suleman Khan, Thippa Reddy Gadekallu, Praveen Kumar Reddy Maddikunta, Wazir Zada Khan

News is now easily accessible through numerous channels such as social media, blogs, and websites. The availability of these platforms has also made it easier to disseminate fake news: anyone using them can create and share fake news content for personal or professional motives. To detect fake news, numerous studies based on supervised and unsupervised learning methods have been proposed. However, those studies suffer from poor accuracy, which can be attributed to several factors such as poor feature selection, inefficient parameter tuning, and imbalanced datasets. In this article, we propose an ensemble classification model for fake news detection that achieves better accuracy than the state of the art. The proposed model extracts important features from fake news datasets, and the extracted features are then classified using an ensemble of three popular machine learning models: Decision Tree, Random Forest, and Extra Tree Classifier. We achieved training and testing accuracies of 99.8% and 44.15%, respectively, on the ISOT dataset. For the Liar dataset, we achieved training and testing accuracies of 100%.
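As a rough illustration of the kind of ensemble the abstract describes, the following sketch combines a Decision Tree, a Random Forest, and an Extra Trees classifier using scikit-learn. It is not the authors' implementation: the toy corpus, the 0/1 label encoding, the use of TF-IDF as the feature-extraction step, the choice of `ExtraTreesClassifier` for the "Extra Tree Classifier", and hard (majority) voting as the combination rule are all assumptions made for the example.

```python
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus; the paper's experiments use the ISOT and Liar datasets.
texts = [
    "Government confirms new budget after parliamentary vote",
    "Scientists publish peer-reviewed study on vaccine safety",
    "Central bank holds interest rates steady this quarter",
    "City council approves funding for public transport",
    "Shocking secret cure doctors do not want you to know",
    "Celebrity clone spotted, insiders reveal shocking truth",
    "Miracle pill melts fat overnight, shocking results",
    "Aliens secretly control the stock market, insiders say",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # assumed encoding: 0 = real, 1 = fake

# Assumption: the three tree-based models are combined by majority vote.
ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)

# Assumption: TF-IDF stands in for the paper's feature-extraction step.
model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(texts, labels)

# Accuracy on the training texts, then a prediction for one unseen headline.
train_accuracy = model.score(texts, labels)
predictions = model.predict(["Insiders reveal shocking secret about miracle cure"])
```

Training accuracy alone says little about how such a model generalizes, which is why the abstract reports training and testing accuracy separately. In practice the data would be split (for example with `sklearn.model_selection.train_test_split`) and the model scored on the held-out portion.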


Updated: 2020-11-27
