ClickbaitTR: Dataset for clickbait detection from Turkish news sites and social media with a comparative analysis via machine learning algorithms, Journal of Information Science


Journal of Information Science (IF 2.4), Pub Date: 2021-04-12, DOI: 10.1177/01655515211007746
Şura Genç, Elif Surer

Clickbait is a strategy that aims to attract people's attention and direct them to specific content. Clickbait titles, which either draw on information that is not included in the main content or use intriguing expressions with various text-related features, have become very popular, especially on social media. This study expands the Turkish clickbait dataset that we constructed for clickbait detection in our proof-of-concept study, written in Turkish. Adding 8,859 tweets brings the dataset to 48,060 samples, and we release it publicly as ClickbaitTR together with its open-source data analysis library. We apply machine learning algorithms, namely Artificial Neural Network (ANN), Logistic Regression, Random Forest, Long Short-Term Memory Network (LSTM), Bidirectional Long Short-Term Memory (BiLSTM) and an Ensemble Classifier, to the 48,060 news headlines extracted from Twitter. Logistic Regression reaches 85% accuracy and Random Forest 86%; the LSTM, the ANN and the Ensemble Classifier each reach 93%; and the BiLSTM reaches 97%. A thorough discussion of the psychological aspects of the clickbait strategy is provided, focusing on curiosity and interest arousal. In addition to successful clickbait detection performance and a detailed analysis of clickbait sentences in terms of language and psychology, this study contributes the largest clickbait dataset in Turkish to clickbait detection research.
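As a rough illustration of the simplest baseline named in the abstract, the following sketch trains a bag-of-words logistic regression on headlines and reports accuracy. It is not the authors' pipeline or their library: the function names (`train_logreg`, `predict`, `accuracy`), the whitespace tokenizer, the hyperparameters, and the six English toy headlines are all assumptions made up for this example, and the ClickbaitTR data itself is Turkish.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real Turkish
    # text would need locale-aware casing and proper tokenization).
    return text.lower().split()

def train_logreg(texts, labels, epochs=200, lr=0.5):
    # Bag-of-words logistic regression fitted with plain SGD.
    vocab = sorted({tok for t in texts for tok in tokenize(t)})
    index = {tok: i for i, tok in enumerate(vocab)}
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            counts = Counter(tokenize(text))
            z = bias + sum(weights[index[t]] * c for t, c in counts.items())
            p = 1.0 / (1.0 + math.exp(-z))   # predicted P(clickbait)
            err = p - y                      # gradient of log loss w.r.t. z
            bias -= lr * err
            for t, c in counts.items():
                weights[index[t]] -= lr * err * c
    return index, weights, bias

def predict(model, text):
    # Returns 1 (clickbait) or 0 (not clickbait); unseen tokens are ignored.
    index, weights, bias = model
    counts = Counter(tokenize(text))
    z = bias + sum(weights[index[t]] * c for t, c in counts.items() if t in index)
    return 1 if z > 0 else 0

def accuracy(model, texts, labels):
    # Fraction of headlines whose predicted label matches the true label.
    hits = sum(predict(model, t) == y for t, y in zip(texts, labels))
    return hits / len(texts)

# Invented toy headlines, NOT taken from ClickbaitTR.
TOY_HEADLINES = [
    "you will not believe what happened next",
    "central bank raises interest rates",
    "this trick will shock you",
    "parliament approves new budget",
    "what happened next will shock everyone",
    "city council opens new bridge",
]
TOY_LABELS = [1, 0, 1, 0, 1, 0]

model = train_logreg(TOY_HEADLINES, TOY_LABELS)
print("training accuracy:", accuracy(model, TOY_HEADLINES, TOY_LABELS))
```

Accuracy measured on the training headlines, as here, only checks that the sketch runs; figures like those in the abstract would be measured on held-out data.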


Updated: 2021-04-13
