Detecting Abusive Instagram Comments in Turkish Using Convolutional Neural Network and Machine Learning Methods,Expert Systems with Applications


Expert Systems with Applications (IF 7.5), Pub Date: 2021-03-03, DOI: 10.1016/j.eswa.2021.114802
Habibe Karayiğit, Çiğdem İnan Acı, Ali Akdağlı

Instagram is a free photo-sharing platform where each user has a profile and can upload photos for followers to view, like, and comment on. Abusive comments on images can be humiliating and harmful to those who share photos. Developing a comment filter for languages other than English is difficult and time-consuming. This paper proposes a dataset called Abusive Turkish Comments (ATC) to detect abusive Instagram comments in Turkish. It is composed of a large number of Instagram comments posted to tabloid and sports accounts (i.e., 10,528 abusive and 19,826 not-abusive). To the authors' knowledge, it is the first public dataset dedicated to detecting abusive Turkish messages. Sentiment annotation was performed at the sentence level by assigning a polarity to each comment. Several abusive message detection models were evaluated: a Convolutional Neural Network (CNN), five well-known classifiers (i.e., Naive Bayes, Support Vector Machine, Decision Tree, Random Forest, and Logistic Regression), and two reweighted classifiers (i.e., Adaptive Boosting (AdaBoost) and eXtreme Gradient Boosting (XGBoost)). The models were compared in terms of F1-score, precision, and recall. The best performance (i.e., micro-averaged F1-score: 0.974, macro-averaged F1-score: 0.973, Kappa value: 0.946) was achieved by the CNN model on the oversampled ATC dataset. The abusive message detection model proposed in this study can contribute to the development of Turkish comment filters on Instagram. Different model combinations were considered in order to select the model with the best recognition accuracy.
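
The abstract mentions oversampling the imbalanced dataset and reporting micro-averaged F1, macro-averaged F1, and a Kappa value. The sketch below illustrates what those terms mean on a toy example. It is not the authors' pipeline: the function names, the choice of plain random oversampling, and the toy labels are all assumptions made for illustration, and the abstract does not say which oversampling method or Kappa variant was used (Cohen's kappa is assumed here).

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes until every
    class is as large as the biggest one (assumed oversampling scheme)."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions,
    corrected for the agreement expected by chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Degenerate case (only one class present): return 0.0 by convention here.
    return (observed - expected) / (1 - expected) if expected < 1 else 0.0


# Toy data (hypothetical; not taken from the ATC dataset).
labels = ["abusive"] * 3 + ["not-abusive"] * 5
texts = [f"comment {i}" for i in range(8)]
bal_texts, bal_labels = random_oversample(texts, labels)

y_true = ["abusive", "abusive", "not-abusive", "not-abusive"]
y_pred = ["abusive", "not-abusive", "not-abusive", "not-abusive"]
micro, macro = f1_scores(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(Counter(bal_labels), round(micro, 3), round(macro, 3), round(kappa, 3))
```

In a real evaluation, oversampling would normally be applied to the training split only, so that duplicated comments do not leak into the test set. Whether the paper did this cannot be determined from the abstract.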


Updated: 2021-03-03
