Clickbait Detection using Multiple Categorization Techniques
arXiv - CS - Social and Information Networks. Pub Date: 2020-03-29, DOI: arxiv-2003.12961
Abinash Pujahari and Dilip Singh Sisodia

Clickbaits are online articles with deliberately misleading titles, designed to lure as many readers as possible into opening the intended web page. They are used to tempt visitors to click on a particular link, either to monetize the landing page or to spread false news for the sake of sensationalism. The presence of clickbaits on a news aggregator portal may give readers an unpleasant experience. Automatically detecting clickbait headlines among news headlines has been a challenging problem for the machine learning community. Many methods for countering clickbait articles have been proposed in the recent past. However, the techniques currently available for detecting clickbaits are not very robust. This paper proposes a hybrid categorization technique for separating clickbait and non-clickbait articles by integrating different features, sentence structure, and clustering. During preliminary categorization, the headlines are separated using eleven features. After that, the headlines are recategorized using sentence formality and syntactic similarity measures. In the last phase, the headlines are recategorized once more by clustering them on word vector similarity, based on the t-distributed Stochastic Neighbor Embedding (t-SNE) approach. After the headlines have been categorized, machine learning models are applied to the data set to evaluate machine learning algorithms. The experimental results indicate that, on the real-world dataset we used, the proposed hybrid model is more robust, reliable and efficient than any individual categorization technique.
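
As a rough illustration of the preliminary, feature-based categorization phase described in the abstract, the following is a minimal sketch. The abstract does not list the eleven features, so the four binary features, the phrase list, the function names `headline_features` and `preliminary_label`, and the voting threshold are all assumptions made for illustration, not the authors' method.

```python
import re

# Hypothetical cue lists: the abstract does not say which features are used.
BAIT_PHRASES = ("you won't believe", "what happened next", "this is why")
SECOND_PERSON = {"you", "your", "you're"}


def headline_features(headline: str) -> dict:
    """Extract a few surface features from one headline (illustrative only)."""
    text = headline.lower()
    words = re.findall(r"[a-z0-9']+", text)
    return {
        # Listicle-style headlines often open with a number.
        "starts_with_number": bool(words) and words[0].isdigit(),
        # Direct address to the reader.
        "has_second_person": any(w in SECOND_PERSON for w in words),
        # Stock curiosity-gap phrases.
        "has_bait_phrase": any(p in text for p in BAIT_PHRASES),
        # Headline ends in an exclamation or question mark.
        "ends_with_punct": headline.rstrip().endswith(("!", "?")),
    }


def preliminary_label(headline: str, threshold: int = 2) -> str:
    """Label a headline by counting how many features fire (assumed rule)."""
    score = sum(headline_features(headline).values())
    return "clickbait" if score >= threshold else "non-clickbait"


if __name__ == "__main__":
    for h in ("10 Things You Won't Believe About Cats!",
              "Parliament passes annual budget"):
        print(h, "->", preliminary_label(h))
```

In a pipeline like the one the abstract outlines, labels from such a first pass would then be revisited by the later phases (sentence formality, syntactic similarity, and clustering on word vectors); those phases are not sketched here.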

Updated: 2020-03-31