Cross-lingual Inductive Transfer to Detect Offensive Language
arXiv - CS - Information Retrieval. Pub Date: 2020-07-07, DOI: arxiv-2007.03771
Kartikey Pant and Tanvi Dadu

With the growing use and availability of social media, many instances of offensive language have been observed across multiple languages and domains. This phenomenon has given rise to a growing need to detect offensive language used in social media cross-lingually. In OffensEval 2020, the organizers released the \textit{multilingual Offensive Language Identification Dataset} (mOLID), which contains tweets in five different languages, for detecting offensive language. In this work, we introduce a cross-lingual inductive approach to identifying offensive language in tweets using the contextual word embedding \textit{XLM-RoBERTa} (XLM-R). We show that our model performs competitively on all five languages, obtaining fourth position in the English task with an F1-score of $0.919$ and eighth position in the Turkish task with an F1-score of $0.781$. Further experimentation shows that our model works competitively in a zero-shot learning environment and is extensible to other languages.
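The abstract does not include code. The following is a minimal sketch, assuming the HuggingFace transformers and torch packages and the public xlm-roberta-base checkpoint, of how an XLM-R sequence classifier could be fine-tuned for binary offensive-language detection and then applied zero-shot to a tweet in another language. The model name, hyperparameters, and toy data are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch (not the authors' code): fine-tune XLM-RoBERTa for binary
# offensive-language detection on English tweets, then score a tweet in a
# different language without any labels in that language (zero-shot style).
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRobertaForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2  # 0 = not offensive, 1 = offensive
)

# Toy English training examples standing in for the mOLID English split.
train_texts = ["have a great day", "you are a complete idiot"]
train_labels = torch.tensor([0, 1])

enc = tokenizer(train_texts, padding=True, truncation=True,
                max_length=128, return_tensors="pt")
loader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"], train_labels),
    batch_size=2,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(1):  # a single epoch, just to illustrate the training loop
    for input_ids, attention_mask, labels in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask,
                    labels=labels)
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Zero-shot style inference: the same fine-tuned model scores a Turkish tweet,
# relying on XLM-R's shared multilingual representation space.
model.eval()
with torch.no_grad():
    sample = tokenizer(["sen tam bir aptalsın"], return_tensors="pt",
                       truncation=True, max_length=128)
    pred = model(**sample).logits.argmax(dim=-1).item()
print("offensive" if pred == 1 else "not offensive")
```

In practice the model would be trained on the full mOLID English data and evaluated per language with macro F1, as in the shared task; the cross-lingual transfer comes entirely from the pretrained multilingual encoder, since no target-language labels are used at inference time in the zero-shot setting.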

Updated: 2020-07-09