USAD: An Intelligent System for Slang and Abusive Text Detection in PERSO-Arabic-Scripted Urdu
Complexity (IF 1.7), Pub Date: 2020-11-30, DOI: 10.1155/2020/6684995
Nauman Ul Haq¹, Mohib Ullah¹, Rafiullah Khan¹, Arshad Ahmad², Ahmad Almogren³, Bashir Hayat⁴, Bushra Shafi⁵

The use of slang, abusive, and offensive language has become common practice on social media. Although social media companies have censorship policies for slang, abusive, vulgar, and offensive language, this condemnable practice continues, because resources and research on automatic abusive-language detection for languages other than English remain limited. This study proposes USAD (Urdu Slang and Abusive words Detection), a lexicon-based intelligent framework that detects abusive and slang words in Perso-Arabic-scripted Urdu Tweets. Furthermore, because no standard dataset is available, our second significant contribution is a newly designed and annotated dataset of abusive, offensive, and slang words in Perso-Arabic-scripted Urdu for future research. The results show that the proposed USAD model correctly classifies 72.6% of Tweets as abusive or nonabusive. We also identify several key factors that can help researchers improve their abusive language detection models.
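The abstract says only that USAD is lexicon-based, so the following is a minimal Python sketch of how lexicon-based Tweet flagging of this general kind can work, not the authors' implementation. Everything in it is an assumption: the letter-normalization table (mapping Arabic yeh/kaf to the Urdu forms), the diacritic stripping, the `\w+` tokenization, the "any lexicon hit means abusive" decision rule, and the placeholder lexicon entries `slang_word_1` and `abusive_word_1` (hypothetical stand-ins for the paper's annotated word list). The `accuracy` helper shows what a figure such as 72.6% measures: the share of Tweets whose predicted label matches the annotated label.

```python
import re
import unicodedata

# Assumed preprocessing (not described in the abstract): map Arabic code
# points to the Urdu forms usually typed for the same letters, so both
# spellings of a word match one lexicon entry.
ARABIC_TO_URDU = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})
# Optional vowel marks (harakat and related marks) plus superscript alef.
DIACRITICS = re.compile("[\u064B-\u065F\u0670]")
TOKEN = re.compile(r"\w+")  # Unicode-aware word characters in Python 3


def normalize(text: str) -> str:
    """NFC-normalize, drop diacritics, unify letter forms, lowercase Latin."""
    text = unicodedata.normalize("NFC", text)
    text = DIACRITICS.sub("", text)
    return text.translate(ARABIC_TO_URDU).lower()


def build_lexicon(words) -> frozenset:
    """Normalize lexicon entries exactly as Tweets are normalized."""
    return frozenset(normalize(w) for w in words)


# Hypothetical placeholder entries; the paper's word list is not reproduced.
LEXICON = build_lexicon(["slang_word_1", "abusive_word_1"])


def is_abusive(tweet: str, lexicon: frozenset = LEXICON) -> bool:
    """Label a Tweet abusive if any normalized token is in the lexicon."""
    return any(tok in lexicon for tok in TOKEN.findall(normalize(tweet)))


def accuracy(predicted, gold) -> float:
    """Share of Tweets whose predicted label matches the gold annotation."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    preds = [is_abusive(t) for t in ["hello slang_word_1", "a clean tweet"]]
    print(preds)  # [True, False]
```

Running the same `normalize` over both the lexicon and the Tweets is the key design choice in this sketch: Perso-Arabic Urdu text often mixes Arabic and Urdu code points for visually identical letters, and an exact-match lexicon lookup silently misses those variants unless both sides are normalized the same way.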

Updated: 2020-12-01