A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts
arXiv - CS - Information Retrieval. Pub Date: 2020-01-15, DOI: arxiv-2001.05493
Anant Khandelwal, Niraj Kumar

Wide usage of social media platforms has increased the risk of aggression, which causes mental stress and harms people's lives through psychological agony, fighting behavior, and disrespect toward others. The majority of such conversations contain code-mixed language [28]. In addition, the way people express thoughts, i.e., their communication style, changes from one social media platform to another (e.g., communication styles differ between Twitter and Facebook). All of these factors increase the complexity of the problem. To address them, we introduce a unified and robust multi-modal deep learning architecture that works on both an English code-mixed dataset and a uni-lingual English dataset. The devised system uses psycho-linguistic features and very basic linguistic features. Our multi-modal deep learning architecture contains a Deep Pyramid CNN, a Pooled BiLSTM, and a Disconnected RNN (with both GloVe and FastText embeddings). Finally, the system makes its decision by model averaging. We evaluated our system on the English code-mixed TRAC 2018 dataset and on a uni-lingual English dataset obtained from Kaggle. Experimental results show that our proposed system outperforms all previous approaches on both the English code-mixed and the uni-lingual English datasets.
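The abstract states only that the final decision is based on model averaging. A minimal sketch of one common reading, equal-weight averaging of per-class probabilities followed by an argmax, is shown here. It assumes the TRAC 2018 label set (OAG, CAG, NAG) and that each of the three networks is run once with GloVe and once with FastText embeddings, giving six ensemble members; the model names, helper functions, and probability values are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Assumed TRAC 2018 labels: overtly, covertly, and non-aggressive.
LABELS = ["OAG", "CAG", "NAG"]


def average_predictions(model_probs: List[List[float]]) -> List[float]:
    """Equal-weight average of per-class probabilities across models."""
    if not model_probs:
        raise ValueError("need at least one model's predictions")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same classes")
    return [sum(p[c] for p in model_probs) / len(model_probs)
            for c in range(n_classes)]


def decide(model_probs: Dict[str, List[float]],
           labels: List[str] = LABELS) -> str:
    """Pick the label with the highest averaged probability."""
    avg = average_predictions(list(model_probs.values()))
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


# Hypothetical softmax outputs for one comment (illustrative numbers only).
probs = {
    "dpcnn_glove":     [0.6, 0.3, 0.1],
    "dpcnn_fasttext":  [0.5, 0.3, 0.2],
    "bilstm_glove":    [0.2, 0.5, 0.3],
    "bilstm_fasttext": [0.4, 0.4, 0.2],
    "drnn_glove":      [0.3, 0.3, 0.4],
    "drnn_fasttext":   [0.5, 0.2, 0.3],
}

print(decide(probs))  # → OAG
```

Averaging probabilities rather than taking a majority vote over hard labels lets a confident model outweigh several uncertain ones; here the averaged scores are roughly 0.42 (OAG), 0.33 (CAG), and 0.25 (NAG), even though two of the six members individually prefer another class.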

Updated: 2020-01-22