Abusive language detection from social media comments using conventional machine learning and deep learning approaches
Multimedia Systems (IF 3.9), Pub Date: 2021-04-01, DOI: 10.1007/s00530-021-00784-8
Muhammad Pervez Akhter, Zheng Jiangbin, Irfan Raza Naqvi, Mohammed AbdelMajeed, Tehseen Zia

As social media use grows, millions of comments are posted on uploaded posts every day, and the use of abusive language in user comments has increased rapidly. Abusive language in online comments initiates cyber-bullying that targets individuals (celebrities, politicians, and products) and groups of people (by country, age, or religion). It is therefore important to detect and analyze abusive language in online comments automatically. The literature contains several attempts to detect abusive language in English. In this study, we perform abusive language detection on Urdu and Roman Urdu comments using five diverse machine learning (ML) models (NB, SVM, IBK, Logistic, and JRip) and four deep learning (DL) models (CNN, LSTM, BLSTM, and CLSTM). We apply these models to a large dataset of ten thousand Roman Urdu comments and a small dataset of more than two thousand Urdu comments. Natural language constructs, the English-like nature of the Roman Urdu script, and the Nastaleeq style of Urdu make it challenging to process and classify comments in both scripts using deep learning and machine learning approaches. In our experiments, the convolutional neural network outperforms the other models, achieving 96.2% accuracy on Urdu and 91.4% on Roman Urdu. Our results also show that one-layer architectures of the deep learning models give better results than two-layer architectures. Further, we compare the deep learning models with the five conventional machine learning models and conclude that the deep learning models perform significantly better.
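To make the classification task concrete, here is a minimal sketch of one of the conventional baselines named in the abstract, Naive Bayes (NB), applied to comment classification. It is not the authors' implementation: the function names `train_nb` and `predict_nb`, the whitespace tokenizer, the add-one smoothing, and the four English toy comments are all assumptions made for illustration, and the toy data stands in for the Urdu and Roman Urdu datasets, which are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    class_counts = Counter()            # number of comments per label
    word_counts = defaultdict(Counter)  # per-label token frequencies
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()   # assumption: whitespace tokenization
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior for a comment."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # skip tokens never seen during training
            # Add-one (Laplace) smoothed log likelihood of the token.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
TOY_DATA = [
    ("you are a stupid idiot", "abusive"),
    ("shut up you fool", "abusive"),
    ("what a lovely song", "neutral"),
    ("thanks for sharing this video", "neutral"),
]

model = train_nb(TOY_DATA)
print(predict_nb(model, "you fool"))
print(predict_nb(model, "lovely video"))
```

A bag-of-words baseline like this ignores word order, which is one plausible reason the abstract reports that sequence-aware deep learning models such as CNN and LSTM perform better.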




Updated: 2021-04-02