Detecting clickbaits using two-phase hybrid CNN-LSTM biterm model
Expert Systems with Applications (IF 7.5) Pub Date: 2020-02-28, DOI: 10.1016/j.eswa.2020.113350
Sawinder Kaur, Parteek Kumar, Ponnurangam Kumaraguru

Clickbait is content designed primarily to attract readers' attention, and it has become a nuisance for social media users. Its purpose is to place an appealing link in front of users. Clickbait headlines make people curious enough to read the linked content. The text in clickbait posts is typically very short, which makes it hard to extract features that identify a post as clickbait. This paper proposes a novel approach, a two-phase hybrid CNN-LSTM Biterm model, for modeling the topics of short text content. The hybrid CNN-LSTM model, when implemented with pre-trained GloVe embeddings, yields the best results on the accuracy, recall, precision, and F1-score metrics. The proposed model achieves precision values of 91.24%, 95.64%, and 95.87% on Dataset 1, Dataset 2, and Dataset 3, respectively. Eight types of clickbait (Reasoning, Number, Reaction, Revealing, Shocking/Unbelievable, Hypothesis/Guess, Questionable, and Forward referencing) are classified using the Biterm Topic Model (BTM). The results show that Shocking/Unbelievable, Hypothesis/Guess, and Reaction are the most numerous types among the clickbait headlines published online. In addition, a ground-truth dataset of non-textual (image-based) data has been created from multiple social media platforms, with the textual information retrieved from the images using an OCR tool. A comparative study shows the effectiveness of the proposed model, which helps to identify the various categories of clickbait headlines that are spread on social media platforms.
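
For readers unfamiliar with the Biterm Topic Model mentioned in the abstract: a biterm is an unordered pair of words that co-occur in the same short text, and BTM learns topics from the corpus-wide collection of such pairs rather than from per-document word counts. The following is a minimal sketch of the biterm extraction step only, not the authors' implementation and not the topic inference itself. The function names `extract_biterms` and `count_biterms`, the whitespace tokenization, and the two example headlines are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    # Sort each pair so ("you", "trick") and ("trick", "you") are the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(headlines):
    """Count biterms over a whole corpus of short texts."""
    counts = Counter()
    for headline in headlines:
        # Assumption: naive lowercase + whitespace tokenization.
        tokens = headline.lower().split()
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical headlines, invented for this illustration.
headlines = [
    "you will not believe this trick",
    "this trick will shock you",
]
biterm_counts = count_biterms(headlines)
```

Because each headline is treated as a single co-occurrence window, even very short texts contribute several biterms, which is why this representation is commonly used for short-text topic modeling.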




Updated: 2020-02-28