Introduction

Nowadays, people increasingly use social networking sites, not only as their main source of information, but also as media for posting content and sharing their feelings and opinions. Social media is convenient, as sites allow users to reach people worldwide, which could potentially facilitate positive and constructive conversations between users. However, this phenomenon has a downside, as there are more and more episodes of hate speech (HS hereafter) and harassment in online communication [10]. This is due especially to the freedom and anonymity given to users and to the lack of effective regulation by the social network platforms. There has been a growing interest in using artificial intelligence and Natural Language Processing (NLP) to address social and ethical issues. Let us mention the latest trends in AI for social good [40, 41], where the emphasis is on developing applications that maximize “good” social impacts while minimizing the likelihood of harm and disparagement to those belonging to vulnerable categories. See, for example, the literature on suicidal ideation detection, devoted to early intervention [48]. There are also recent works on the prevention of sexual harassment [68], sexual discrimination [67], cyberbullying and trolling [81], devoted to countering different kinds of abusive behavior targeting different groups and preventing unfair discrimination.

In spite of there being no universally accepted definition of HS, this study employs the most common one. HS is defined here as any type of communication that is abusive, insulting, intimidating, and/or that incites violence or discrimination, and that disparages a person or a vulnerable group based on characteristics such as ethnicity, gender, sexual orientation and religion [33]. Accordingly, HS may have different topical focuses: misogyny, sexism, racism, xenophobia, homophobia or Islamophobia, which we refer to as topics. For each topic, hateful content is directed towards specific targets that represent the community (individuals or groups) receiving the hatred. For example, black people and white people are possible targets when the topical focus is racism [117], while women are the targets when the topical focus is misogyny or sexism [78]. HS is thus, by definition, target-oriented, as shown in the following tweets taken from [5, 25, 133], where the targets are underlined. These examples also show that different targets involve different ways of linguistically expressing hateful content, such as references to racial or sexist stereotypes, the use of negative and positive emotions, swear words, and the presence of other phenomena such as envy and ugliness.Footnote 1

  (1) Women who are feminist are the ugly bitches who cant find a man for themselves

  (2) Islam is 1000 years of contributing nothing to mankind but murder and hatred.

  (3) Illegals are dumping their kids heres o they can get welfare, aid and U.S School Ripping off U.S Taxpayers #SendThemBack ! Stop Allowing illegals to Abuse the Taxpayer #Immigration

  (4) Seattle Mayoral Election this year. A choice between a bunch of women, non-whites, and faggots/fag lovers.

Given the vast amount of social media data produced every minuteFootnote 2, manually monitoring social media content is impossible. It is, instead, necessary to detect HS automatically. To this end, many studies in the field exploit supervised approaches generally casting HS detection as a binary classification problem (i.e., abusive/hateful vs. not abusive/not hateful) [43, 64, 115] relying on several manually annotated datasets that can be grouped into one of these categories:

  • Topic-generic datasets, which cover a broad range of HS without limiting it to specific targets [21, 44, 52]. For example, [21] consider aggression and bullying in their annotation scheme, while [44] additionally look for other expressions of online abuse such as offensive, abusive and hateful speech.

  • Topic-specific datasets, where the HS category (racism, sexism, etc.) is known in advance (i.e., it drives the data gathering process) and is often labeled. The HS targets, either person-directed or group-directedFootnote 3, are thus oriented: the data contain hateful content towards specific targets or groups of targets. For example, [132] sampled data for multiple targets, namely racism and sexism, corresponding, respectively, to HS against religious/ethnic minorities and sexual/gender (male and female) HS. Others focus on single targets, for instance sampling for the misogyny topic, targeting women [23, 38, 39]. Similarly, for the xenophobia and racism topics the targets are groups discriminated against on the grounds of ethnicity (e.g., immigrants [5], ethnic minorities [125, 133], religious communities [128], Jewish communities [145], etc.).

Independently of the datasets used, all existing systems share two common characteristics. First, they are trained to predict the presence of general, target-independent HS, without addressing the variety of aspects related to both the topical focus and the target-oriented nature of HS. Second, systems are built, optimized, and evaluated on a single dataset, either topic-generic or topic-specific. To address this issue and to improve model performance, recent studies propose cross-domain classification, where domain is used synonymously with dataset [65, 99, 134, 137]. The idea consists in a one-to-one configuration: training a system on a given dataset and testing it on another one, using domain adaptation techniques. Most existing works map between fine-grained schemes (specific to each dataset) and a unified set of tags, usually composed of a positive and a negative label, to account for the heterogeneity of labels across datasets. Again, this binarization fails to discriminate among the multiple HS targets. Thus, it is difficult to measure the generalization power of such systems and, more specifically, their ability to adapt their predictions in the presence of novel or different topics and targets [126].

An immediate but rather expensive solution for handling a new specific target is building new target-oriented datasets from scratch, as has been done in previous studies [61]. In this paper, we propose instead a novel multi-target HS detection approach that leverages existing manually annotated datasets, enabling the model to transfer knowledge across datasets with different topics and targets. In the context of offensive content moderation, identifying the topical focus and the targeted community of hateful content is of great interest for two important reasons. First, it allows us to detect HS for specific topics/targets when dedicated data are missing. Second, it helps prevent the spread of stereotypes and supports the development of social policies for protecting victims, especially in response to trigger events [69]. For example, with the recent outbreak of COVID-19, a spike in racist and xenophobic messages targeting Asians in Western countries was observed. A system specifically designed to detect HS targeting migrants in a pre-COVID-19 context would most likely have failed to pick out this post-COVID-19 HS: most of the messages would not have been moderated, as the type of language learned during training concerned other groups, the most frequent targets of HS in pre-COVID times.

In this paper, we consider different manifestations of HS with different topical focuses, including sexism, misogyny, racism, and xenophobia. Each specific instance targets different vulnerable groups based on characteristics such as gender (sexism and misogyny), ethnicity, religion and race (xenophobia and racism). The focus on gendered and ethnicity-based HS is due, in part, to the wide availability of English corpora developed by the computational linguistics community for those targets. It also reflects the fact that most monitoring exercises by institutions countering online HS in different countries and territories (e.g., European Commission [34]) report ethnic-based hatred (including anti-migrant hatred) and gender-based hatred as the most common types of online HS [22]. We propose to undertake the following challenges:

  1. Explore the ability of HS detection models to capture common properties from generic HS datasets and to transfer this knowledge to recognize specific manifestations of hate. We propose several deep learning models and experiment with binary classification using two generic corpora. We evaluate their ability to detect HS in four topically focused datasets: sexism, misogyny, racism, and xenophobia. Our results show that training on topic-generic datasets generally fails to account for topic-specific linguistic properties.

  2. Experiment with the development of models for detecting both the topics (racism, xenophobia, sexism, misogyny) and the targets (gender, ethnicity) of HS, going beyond standard binary classification. We aim to investigate (a) how to detect HS at a finer level of granularity and (b) how to transfer knowledge across different types of HS. We rely on multiple topic-specific datasets and develop, in addition to the deep learning models designed to address the first challenge, a multi-task architecture that has been shown to be quite effective in cross-domain sentiment analysis [12, 146]. We consider two experimental scenarios: first, one where the topics/targets to be classified in a multi-label fashion are present in the training data; and second, cross-topic/target scenarios, where we try to predict a specific topic/target while training on data where that particular topic/target is unseen. Our results demonstrate that jointly learning HS classification (main task) and the topic/target of HS (auxiliary task) achieves very good results. This is an encouraging first step, showing that multi-target HS detection from existing datasets is feasible even in the absence of data annotated for a given target, which can be of crucial importance when such data are missing.

  3. Study the impact of affective semantic resources in determining specific manifestations of HS. Affect and emotions have proven useful in many NLP tasks such as irony and sarcasm detection [57, 98, 120], stance classification [71, 72], information credibility assessment [49, 50], and sentiment analysis in general [20, 76]. In this work, we also explore the affective characteristics of the language used in HS, continuing the very recent work by [109], which suggests a strong relationship between abusive behavior and the emotional state of the speaker. We experiment with three affect resources as extra features on top of several deep learning architectures: sentic computing [14] resources (SenticNet [18], EmoSenticNet [106]) and a semantically structured hate lexicon (HurtLex [6]). SenticNet has not, to the best of our knowledge, been used in HS detection before. For each resource, we propose a systematic evaluation of the emotional categories that are the most productive for our tasks. Our results show that injecting domain-independent affective knowledge into our models helps finer-grained HS detection.

The remainder of this paper is organized as follows. In the next section, we present an overview of the main works on HS detection. The Datasets section describes the datasets used in this study. The sections Generalizing Hate Speech Phenomena Across Multiple Datasets, Multi-target Hate Speech Detection, and Emotion-aware Multi-target Hate Speech Detection detail, respectively: the experiments carried out and the results obtained when generalizing HS phenomena across multiple datasets; predicting multi-target HS; and building emotionally informed models. We end this paper by discussing our main findings and by providing directions for future work.

Related Work

We present the related work in four parts. First, we briefly introduce the affective computing and sentiment analysis research field, in order to provide readers with a broader context for the NLP literature related to the analysis and recognition of affective states and emotions in texts. Second, relevant prior works specifically related to HS detection are presented. Third, we review domain adaptation studies in sentiment analysis and abusive language detection, which is particularly important for bringing out the novelty of our contribution. Finally, we provide an overview of the few attempts to exploit affective information to improve abusive language detection.

Affective Computing and Sentiment Analysis

Affective computing, a development of the last decades, is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects: i.e., the experience of feelings or emotions. Today, identifying affective states from text is regarded as fundamental for several domains, from human-computer interaction to artificial intelligence, from the social sciences to software engineering [13]. The wide popularity of social media, which makes it easy for users to publish and share content, providing accessible ways of expressing feelings and opinions about anything, anytime, has also given a major boost to this research area. This was especially true within the NLP field. Here, the abundance of data allowed the research community to tackle more in-depth, long-standing questions such as understanding, measuring and monitoring the sentiment of users towards certain topics or events, expressed in text or through visual and vocal modalities [107]. Indeed, robust and effective approaches are made possible by the rapid progress in supervised learning technologies and the huge amount of user-generated content available online. Such techniques are typically motivated by the need to extract user opinions on a given product or, say, to survey political views, and they often exploit knowledge encoded in affective resources, such as sentiment and emotion lexicons and ontologies.

The interest in the multi-faceted, fine-grained lexical knowledge about affect encoded in such resources is by no means limited to sentiment analysis. The use of such affective resources has also recently been explored in other related tasks, such as personality [80, 86] and irony detection [35, 120] or author profiling [100]. Concerning abusive language detection, which is the specific task of interest here, there are attempts at exploiting emotion signals to improve the detection of these phenomena (cf. Affective Information in Abusive Language Detection Tasks). However, no one has investigated the impact of emotion features on multi-target HS detection, which is one of the challenges tackled in our paper.

Supervised and Semi-Supervised Learning for Social Data Analysis

The field has recently been surveyed in [7, 142]. The vast majority of the analyzed papers describe approaches to sentiment analysis based on supervised learning, cast as a text classification task at the sentence or message level and focused mostly on detecting valence or sentiment from text, either as a binary value or with a strength/intensity component coupled with the sentiment [123]. In particular, deep learning-based methods are becoming very popular due to their high performance, and they have been increasingly applied to sentiment analysis [82, 142]. Furthermore, there is an ever-increasing awareness of the need to take a holistic approach to sentiment analysis [17] by handling the many finer-grained tasks involved in extracting meaning, polarity and specific emotions from texts, including the detection of irony and sarcasm [57, 66, 120].

Due to the large amount of available (but unlabeled) data, many studies have recently highlighted the importance of exploring unsupervised and semi-supervised machine learning techniques for sentiment analysis tasks. For example, the authors of [60] exploited both labeled and unlabeled commonsense data: their affective reasoning architecture, based on Support Vector Machines (SVM) combined with random projection scaling in a vector space model, was exploited for emotion recognition tasks.

Emotion Categorization Models and Affective Resources

Still, despite the maturity of the field, choosing the right model for operationalizing affective states is not a trivial task. Research in sensing sentiment from texts has put the major emphasis on recognizing polarities (positive, negative, neutral orientation). However, comments and opinions are usually directed toward a specific target or aspect of interest, and as such, finer-grained tasks can be envisioned. For instance, aspect-based sentiment analysis identifies the aspects of given target entities and the sentiment expressed for each aspect [105]. At the same time, the emerging task of stance detection focuses on detecting which stance a user takes toward a specific target, something that is particularly interesting in political debates [89].

Moreover, given the wide variety of affective states, recent studies advocate a finer-grained investigation of the role of emotions, as well as the importance of other affect dimensions such as emotional intensity or activation. Depending on the specific research goals addressed, one might be interested in issuing a discrete label describing the affective state expressed (frustration, anger, joy, etc.) in accordance with different contexts of interaction and tasks. Emotions are transient and typically episodic, in the sense that, over time, they can come and go. This depends on all sorts of factors, which researchers might be interested in understanding and modeling according to domain- or task-specific research objectives.

Both basic emotion theories, in the Plutchik-Ekman tradition [32, 104], and dimensional models of emotions [112] provide a precious theoretical grounding for the development of lexical resources and computational models for affect extraction. Sentiment-related information is, indeed, often encoded in lexical resources, such as affective lists and corpora, where different nuances of affect are captured, such as sentiment polarity, emotional categories, and emotional dimensions [18, 90, 106]. These kinds of lexicons are usually lists of words with which a positive or negative and/or an emotion-related label (or score) is associated. Besides flat lists of affective words, lexical taxonomies have also been proposed, enriched with sentiment and/or emotion information [3, 106]. However, there is a general tendency to move towards richer, finer-grained models, which will very possibly include complex emotions. This is especially the case in the context of data-driven and task-driven approaches, where restricting automatic detection to only a small set of basic emotions is too limited, not least in terms of actionable affective knowledge. This general tendency is also reflected in the development of semantically richer resources, which include and model the semantic, conceptual, and affective information associated with multi-word natural language expressions, enabling the concept-level analysis of the sentiment and emotions conveyed in texts, like the resources belonging to the SenticNet family [15, 18]. Moreover, when the task addressed is related to a specific portion of the affective space, domain-specific affective resources and lexicons can be envisioned. This is the case with abusive language detection, where the use of lexicons of hateful words [6] can lead to interesting results.

Word Intensity and Polarity Disambiguation

All such resources represent rich and varied lexical knowledge about affect, from different perspectives, and virtually all sentiment analysis systems may incorporate lexical information derived from themFootnote 4. However, many opinion keywords carry varying polarities in different contexts, posing huge challenges for sentiment analysis research. Contextual polarity ambiguity is an important but still little-studied problem in sentiment analysis. It has recently been addressed in [140], where a Bayesian model is proposed that uses opinion-level features to resolve the polarity of sentiment-ambiguous words: intra-opinion features (i.e., the information that helps in thoroughly conveying the opinion) and inter-opinion features (i.e., the information connecting two or more opinions). The intra-opinion features resolve the polarity of most sentiment words. The inter-opinion features usually play a secondary role, either by improving the confidence of a good prediction or by assisting in calculations when some of the features are missing.

Another interesting challenge for the field is related to the possibility of measuring sentiment and emotion intensity, which is of paramount importance in analyzing the finer-level details of emotions and sentiments [85] in real-world applications. A novel solution to this problem is proposed in [2]: in order to leverage the various advantages of different supervised systems, the authors build a Multi-Layer Perceptron (MLP)-based ensemble framework for predicting the intensity of sentiments (in financial microblog messages and news headlines) and emotions (in tweets). The ensemble model combines the output of three deep learning models (Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU)) and a feature-based Support Vector Regression (SVR) model. The SVR model utilizes word and character TF-IDF, TF-IDF weighted word vectors, and a diverse set of lexicon features, such as the positive and negative word count (extracted from MPQA [135] and Bing Liu [29]), and the positive, negative, and aggregate scores of each word extracted from NRC Hashtag Sentiment and NRC Sentiment140 [88], as well as the sum of the positive, negative and aggregate scores of each word computed from SentiWordNet [3]. For emotion intensity prediction, the authors also include: the word count for each of the emotions from the NRC Word-Emotion Association lexicon [87]; the sum of association scores for the words with the emotions extracted from NRC Hashtag Emotion [84]; the aggregate of positive and negative word scores computed from AFINN [94]; and the sentiment score of each sentence returned by VADER [51]. The proposed framework shows good results, with comparatively better performance than state-of-the-art systems.
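As a toy illustration of this kind of ensembling (a simplification under our own assumptions, not the implementation of [2]), the predictions of several base models can be stacked and combined by a small meta-learner:

    # Toy sketch: stack the predictions of several base intensity predictors and let an
    # MLP meta-learner combine them. The base predictions are simulated with noise here;
    # in [2] they would come from CNN, LSTM, GRU and feature-based SVR models.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    n = 200
    gold = rng.random(n)  # gold intensity scores in [0, 1]
    base_predictions = np.column_stack([
        gold + rng.normal(0, 0.10, n),   # stand-in for the CNN predictions
        gold + rng.normal(0, 0.15, n),   # stand-in for the LSTM/GRU predictions
        gold + rng.normal(0, 0.20, n),   # stand-in for the feature-based SVR predictions
    ])

    meta = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    meta.fit(base_predictions[:150], gold[:150])           # learn how to weight the base models
    print(meta.score(base_predictions[150:], gold[150:]))  # held-out R^2 of the ensemble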

Hate Speech Detection in Online Communication

The automatic detection of online HS is not a simple task, especially because of the thin line between abusive language and freedom of speech. For example, the use of swear words could become an issue in HS detection [96, 122], where their presence might lead to false positives: for instance, when they are used in a non-abusive way for humor, emphasis, catharsis, or to convey informality. But they could also become a strong signal for spotting HS, when they are used in an abusive context.

Most studies that deal with automatic HS detection exploit supervised approaches to classify HS and non-HS content. The first studies in the field relied on traditional machine learning approaches with hard-coded features. Several classifiers were used, such as Logistic Regression (LR) [4, 26, 30, 36, 83, 133], SVM [4, 9,10,11, 55, 124, 131], Naive Bayes (NB) [1, 70], Decision Tree (DT) [1, 9,10,11], and Random Forest (RF) [1, 4, 9,10,11]. A wide range of features have been employed, including lexical features (e.g., n-grams, Bag of Words, TF-IDF, lexicon-based features); syntactic features (e.g., part-of-speech tags and typed dependencies); stylistic features (e.g., number of characters, punctuation, text length); as well as some Twitter-specific features (e.g., the number of user mentions, hashtags, URLs, social network information [83], and other user features [36, 108, 133]). More recently, work on automatic HS detection has focused on exploiting neural models such as LSTM [83, 129], Bidirectional Long Short-Term Memory (Bi-LSTM) [108], GRU [91], and CNN [4], coupled with word embedding models such as FastTextFootnote 5, word2vecFootnote 6, and ELMo [103].

A fair amount of work dealing with HS detection has come from teams that participated in recent shared tasks such as HatEval [5], Automatic Misogyny Identification (AMI) [38, 39], and Hate Speech and Offensive Content Identification (HASOC) [77]. HatEval was introduced at SemEval 2019 and focused on the detection of hateful messages on Twitter directed towards two specific targets, immigrants and women, from a multilingualFootnote 7 perspective (English and Spanish). The best-performing system on English HatEval [62] exploited a straightforward SVM with a Radial Basis Function (RBF) kernel that uses Google’s Universal Sentence Encoder [19] feature representation. AMI, a shared task run in two different evaluation campaigns in 2018 (IberEval and EvalitaFootnote 8), focuses on detecting HS targeting women. For English, the best results were achieved by traditional models for both AMI-IberEval (an SVM with several handcrafted features [97]) and AMI-Evalita (LR coupled with a vector representation that concatenates sentence embeddings, TF-IDF and average word embeddings [113]). Finally, HASOC, an HS and offensive language identification shared task at FIRE 2019, covers three languages: English, German, and Hindi. For English, the best performance was achieved by an LSTM network with ordered neurons and an attention mechanism [130]. All the aforementioned shared tasks provided datasets in languages other than English: i.e., Italian, Spanish, Hindi, and German. Other shared tasks in languages other than English include HaSpeeDe [8] in Italian, which focuses on detecting HS towards immigrants, and GermEval [138] in German, which focuses on offensive language identification.

Most of the works listed here model their task as binary classification, with the aim of predicting the abusiveness of a given utterance per se (i.e., without specifying either a topic or a target). In this work, we classify a message as hateful or not hateful. But we go further: we also want to detect the HS topic and the target to whom the message is addressed. To the best of our knowledge, we are the first to address target-based computational HS detection, continuing recent corpus-based linguistic studies on categorizing HS and its associated targets [117].

Domain Adaptation in Abusive Language Detection

The study of HS detection is multifaceted, and available datasets feature different focuses and targets. Despite these differences, some works have tried to bridge this range by proposing domain adaptation approaches to transfer knowledge from one dataset to other datasets with different topical focuses.

The first attempt to deal with this issue was reported in [134]. The authors used a multi-task learning (MTL) approach, arguing that it would be possible to share knowledge between two or more objective functions to leverage information encoded in one abusive language dataset to better fit others. [65] proposed using a traditional machine learning approach for classifying abusive language in a cross-domain setting, in order to obtain better system interpretability. This work also explored the use of the frustratingly easy domain adaptation (FEDA) framework [24] to facilitate domain sharing between different datasets (a minimal sketch of the idea is given below). The main finding of this work is that the model did not generalize well when applied to various domains, even when trained on a much bigger out-of-domain dataset. [111] adopted transfer learning as a domain adaptation approach by exploiting an LSTM network coupled with ELMo embeddings. LSTM has also been used by [99], who employed it with a list of abusive keywords from the HurtLex lexicon [6] as a proxy for transferring knowledge across different datasets. Their main findings are: (i) that a model trained on more than one general abusive language dataset produces more robust predictions; and (ii) that HurtLex is able to boost system performance in the cross-domain setting.
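To make the FEDA idea concrete, the following is a minimal sketch (our own simplified illustration, not the implementation used in [65]): every feature is duplicated into a shared copy and a dataset-specific copy, so that a linear classifier can learn which cues generalize across datasets and which are dataset-specific.

    # Minimal FEDA-style feature augmentation: each feature gets a copy in the shared
    # space and a copy in the space of its source dataset (domain).
    def feda_augment(features: dict, domain: str) -> dict:
        """features: bag-of-words-style mapping; domain: e.g. 'waseem' or 'hateval'."""
        augmented = {}
        for name, value in features.items():
            augmented[f"shared::{name}"] = value      # active for every dataset
            augmented[f"{domain}::{name}"] = value    # active only for this dataset
        return augmented

    print(feda_augment({"bitch": 1, "go": 1, "home": 1}, domain="hateval"))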

Bidirectional Encoder Representations from Transformers (BERT) [28] has also been applied to cross-domain abusive language detection [122]. This work found that BERT can share knowledge between one domain dataset and other domains, in the context of transfer learning. The authors argue that the main difficulty in the cross-domain classification of abusive language is caused by dataset issues and their biases, which make it impossible for a dataset to capture the phenomenon of abusive language in its entirety. [92] also investigated BERT by using new fine-tuning methods based on transfer learning, relying on the Waseem [133] and Davidson [26] datasets in their experiments. Finally, HatEval, a recent shared task [5], also provided an HS dataset that covers two different targets, women and immigrants. Participants were therefore required to build a target-agnostic model able to detect HS with more than one target (cf. Hate Speech Detection in Online Communication).

Cross-domain classification approaches in abusive language detection share three common characteristics: (1) Dataset labels are aligned to deal with the varieties of annotation schemes. Hence, all datasets (be they topic-generic or topic-specific) share the same coarse-grained characterization of HS (i.e., hateful vs. non-hateful). (2) Systems follow a one-to-one configuration (i.e., they are trained on one dataset and tested on another) in order to analyze their robustness in generalizing the different phenomena contained in each dataset. (3) Predictions are binary, ignoring the target/topic nature of HS. In this work, we intend to focus on the different topics/targets in several datasets by proposing a multi-target HS classification task.

To this end, instead of using the typical one-to-one configuration, we propose to solve the problem using a many-to-many configuration capable of identifying a given topic/target when trained in topic-generic or topic-specific datasets. The many-to-many configuration has already been shown to be quite effective in cross-domain aspect-based sentiment analysis [12, 46, 53, 74, 102, 146] and is used here for the first time in an HS detection task.

Affective Information in Abusive Language Detection Tasks

Recently, some works have exploited emotion signals to improve abusive language detection. The study by [114] proposed an architecture that uses an Emotion-Aware Attention (EA) mechanism to quantify the importance of each word based on the emotion conveyed by the text. The authors used the DeepMoji model [37] and the NRC Emotion Lexicon [87] to extract emotion information from the given texts. Their analysis of the results shows the importance of affective information in augmenting system performance. Similar conclusions were drawn in [96], which exploited the NRC Emotion Lexicon [87] and EmoSenticNet [106]. Finally, the most recent work by [109] proposed a joint model of emotion and abusive language detection in an MTL setting, which led to significant improvements in abuse detection performance when evaluated on both the OffensEval 2019 [144] and the Waseem and Hovy [133] datasets.

As far as we know, no previous work has explored the impact of emotion features in predicting HS targets in a multi-target setting. We propose to employ EmoSenticNet, HurtLex, and for the first time, SenticNet. For each resource, we identify the emotion categories that are the most suitable for predicting a given topic/target of HS detection.

Datasets

We experiment with six available HS corpora from previous studies, among which two are topic-generic (Davidson [26] and Founta [44]) and four are topic-specific, covering four different topics: misogyny (the AMI dataset collection from both IberEval [39] and Evalita [38]), misogyny and xenophobia (the HatEval dataset [5]), and racism and sexism (the Waseem dataset [133]). Each of these topics targets either gender (sexism and misogyny) and/or ethnicity, religion or race (xenophobia and racism).

In this section, we first detail the characteristics of each of the six datasets, then provide general statistics.

Datasets Description

  • Davidson. The dataset was built by [26] and contains 24,783 tweetsFootnote 9 manually annotated with three labels: hate speech, offensive, and neither. These tweets were sampled from a collection of 85.4 million tweets gathered using the Twitter search API, focusing on tweets containing keywords from HateBaseFootnote 10. The dataset was manually labeled using the CrowdFlower platformFootnote 11, where at least three annotators annotated each tweet. With an inter-annotator agreement of 92%, the final label for each instance was assigned according to a majority vote. Only 5.8% of the total tweets were labeled as hate speech (cf. (5)) and 77.4% as offensive (cf. (6)), while the remaining 16.8% were labeled as not offensive.

    (5) #DTLA is trash because of non-Europeans are allowed to live there

    (6) What would y’all lil ugly bald headed bitches do if they stop making make-up & weave?

  • Founta. The dataset consists of 80,000 tweetsFootnote 12 annotated with four mutually exclusive labels: abusive, hateful, spam and normal [44]. The original corpus of 30 million tweets was collected from 30 March 2017 to 9 April 2017 using the Twitter Stream API. For each tweet, the authors also extracted meta-information and linguistic features in order to facilitate the filtering and sampling process. Each tweet was annotated by five crowdworkers, and the final dataset is composed of 11% of tweets labeled as abusive (cf. (7)), 7.5% as hateful (cf. (8)), 59% as normal, and 22.5% as spam (cf. (9)).

    (7) Benedict Cumberbatch is a damn stupid name. I hope history doesn’t remember him fondly. I hope his legacy becomes trash.

    (8) Niggas worst than your side bitch always questioning they position

    (9) Beats by Dr. Dre urBeats Wired In-Ear Headphones - White https://t.co/9tREpqfyW4 https://t.co/FCaWyWRbpE

  • Waseem. This dataset consists of tweets collected over a period of two months using representative keywords (common slurs) that target religious, sexual, gender and ethnic minorities [133]. The authors manually annotated the dataset, with a third expert annotator reviewing their annotations. The final dataset consists of 16,914 tweets, with 3,383 instances from SexismWaseem targeting gender minorities (cf. (10)), 1,972 from RacismWaseem with racist instances (cf. (11)), and 11,559 tweets that were judged to be neither sexist nor racistFootnote 13.

    (10) Sounds like we’ve got a well good ref’ today, bloody women should just stay in the kitchen!

    (11) It’s not about any specific individuals, but about an ideology that will always produce terrorists.

  • AMI corpora. The main goal of the AMI tasks consists in identifying tweets that convey hate or prejudice against women, while categorizing forms of misogynous behavior (stereotype & objectification, dominance, derailing, sexual harassment & threats of violence, discredit) and classifying the target of a given instance (a specific individual or a generic group). The datasets used in these tasks were collected by employing three different approaches: representative keywords and hashtags; monitoring potential victims; and downloading the history of users that exhibit explicitly misogynistic behavior on their Twitter profiles. We use in this study the two AMI datasets: IberEval [39], containing 3,977 tweets collected over a period of four months (from 20th of July until 30th of November 2017), and Evalita [38], which comprises 5,000 tweets. Below are two examples of tweets annotated as misogynous, taken, respectively, from IberEval and Evalita. Their associated misogynistic behaviors are “sexual harassment” in the first example and “derailing” in the second.

    (12) I kinda want to see you again just so I can punch you in the kidney. #WomenSuck

    (13) Yes yes Ann, lets continue to perpetuate the hysterical woman stereotype. Such a shame. You dont deserve your position of power. A disgrace to fellow women.

  • HatEval. The dataset consists of 13,000 tweets distributed across two different targets: immigrants (cf. (14)) and women (cf. (15)) [5]. Most of the tweets that target women were derived from the AMI corpora, while the remainder of the dataset was collected over a period of three months (from July to September 2018) by employing the same approaches as AMI. The dataset was annotated using the Figure Eight crowdsourcing platform. For each instance, the annotators were asked to specify whether the tweet conveys HS towards one of the given targets or not. The annotators were also asked to indicate whether the author of the tweet was aggressive and to identify the target of the tweet (i.e., a specific individual or a group of people). Although the inter-annotator agreement obtained for each category (0.83, 0.73, and 0.70, respectively) was quite high, the final label was assigned based on a majority vote after adding two expert annotations to the crowd-annotated data. The final distribution of the dataset includes 13,000 tweets (6,500 for each target).

    (14) Your boats shall drown in the Mediterranean Sea and the rest of you, which had not assimilated into our society will leave immediately. #RefugeesNotWelcome #IllegalAliens

    (15) Its a good thing I always wear a glove on my left hand because if I EVER had to touch hands with a woman my IQ would totally drop to 0 Lol

Datasets Statistics

Table 1 provides a general overview of the datasets, along with the labels used in their annotation schemes. We can observe that the classes are imbalanced in most datasets, where the majority class is the negative class (non-HS), except for the AMI collection (AMI-IberEval and AMI-Evalita) and Davidson.

Table 1 General overview of the datasets along with their topics and targets

For our experiments, the corpora have been divided into train and test sets keeping the same tweet distribution as the original papers. This was done in order to make better comparisons with the state-of-the-art resultsFootnote 14. Table 2 and Table 3 provide the distribution of instances in these two sets. As one of the research questions that we want to address involves the possibility of transferring knowledge from several topic-specific datasets into another topic-specific dataset where the topic is unseen, we decided to merge under the same topic (i.e., misogyny) both the AMI corpora and HatEval datasetFootnote 15.

Table 2 Distribution of instances in topic-generic datasets (used as training)
Table 3 Distribution of instances in the train/test sets in topic-specific datasets

In the next three sections, we show how these datasets have been used to develop models that are able to generalize HS across multiple datasets (cf. Generalizing Hate Speech Phenomena Across Multiple Datasets); transfer knowledge across topics and targets (cf. Multi-target Hate Speech Detection); and leverage emotions to improve multi-target HS detection (cf. Emotion-aware Multi-target Hate Speech Detection). The various forms of bias introduced when building these datasets are discussed in Discussions and Error Analysis, as they may have a strong impact on the multi-target experiments proposed in this paper.

Generalizing Hate Speech Phenomena Across Multiple Datasets

Methodology

We aim to answer two main research questions:

  • Are models able to capture common properties of HS and transfer this knowledge from topic-generic datasets to topic-specific datasets?

  • How do these models compare with ones that are trained on topic-specific datasets?

To this end, we propose the following two configurations:

  • \(Top^G \longrightarrow Top^S\): Train on the topic-generic HS datasets (i.e., Davidson and Founta)Footnote 16 and test on all topic-specific datasets (i.e., RacismWaseem, SexismWaseem, MisogynyEvalita, MisogynyIberEval, MisogynyHatEval, and XenophobiaHatEval) without splitting them into train/test.

  • \(Top^S \longrightarrow Top^S\): Train on the combined training sets of all topic-specific datasets (i.e., Waseem, HatEval, Evalita, and IberEval) and test on the test set of each topic-specific dataset.

These two configurations are cast as a binary classification task, where the system needs to predict whether a given tweet is hateful (1) or not (0). To this end, we experiment with several well-performing state-of-the-art models for HS detection. This is a necessary first step in measuring to what extent existing models are capable of transferring knowledge across different HS datasets, be they topic-generic or topic-specific.
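Schematically, the protocol can be summarized as follows (a hypothetical sketch with made-up function and variable names, assuming scikit-learn-style models; the actual experimental code may differ): fit a binary classifier on one training configuration and report accuracy and macro-averaged precision, recall and F-score on each topic-specific test set.

    # Hypothetical sketch of the evaluation protocol: train once, then score the model
    # on every topic-specific test set with the metrics reported in the Results tables.
    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

    def evaluate(model, train_texts, train_labels, specific_test_sets):
        """specific_test_sets: dict mapping a dataset name to (texts, gold_labels)."""
        model.fit(train_texts, train_labels)
        results = {}
        for name, (texts, gold) in specific_test_sets.items():
            pred = model.predict(texts)
            results[name] = {
                "A": accuracy_score(gold, pred),
                "P": precision_score(gold, pred, average="macro"),
                "R": recall_score(gold, pred, average="macro"),
                "F1": f1_score(gold, pred, average="macro"),
            }
        return results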

Models

Our models are as followsFootnote 17:

– Baseline. This model is a straightforward linear support vector classifier (LSVC). The choice of a linear kernel follows [63], who argue that the linear kernel has an advantage for text classification, observing that text representation features are frequently linearly separable. The baseline is thus an LSVC over unigram, bigram, and trigram TF-IDF features.
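A minimal sketch of such a baseline (our own simplified illustration with placeholder data; the preprocessing details may differ from our actual setup) is:

    # Sketch of the LSVC baseline: word 1- to 3-gram TF-IDF features fed to a linear SVM.
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    baseline = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),  # unigrams, bigrams, trigrams
        ("clf", LinearSVC()),
    ])

    # Placeholder data: 1 = hateful, 0 = not hateful.
    train_texts = ["go back to your country", "what a lovely day",
                   "women belong in the kitchen", "great match yesterday"]
    train_labels = [1, 0, 1, 0]
    baseline.fit(train_texts, train_labels)
    print(baseline.predict(["they should all be deported"]))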

– LSTM. This model uses an LSTM network [59] with an architecture consisting of several layers, starting with an embedding layer representing the input to the LSTM network (128 units), followed by a dense layer (64 units) with a ReLU activation function. The final layer is a dense layer with a sigmoid activation producing the final prediction. In order to get the best possible results, we optimized the batch size (16, 32, 64, 128) and the number of epochs (1-5). We used as input either randomly initialized embeddings (LSTM) or FastTextFootnote 18 English word vectors with an embedding dimension of 300 [54] pre-trained on Wikipedia and Common Crawl (LSTMFastText). LSTM, a type of Recurrent Neural Network, has already proven to be a robust architecture for HS detection [4].
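In Keras-like pseudocode, the architecture just described can be sketched as follows (a simplified illustration; the vocabulary size and the loading of FastText weights are assumptions):

    # Sketch of the LSTM model: embedding -> LSTM(128) -> Dense(64, ReLU) -> Dense(1, sigmoid).
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    VOCAB_SIZE, EMB_DIM = 20000, 300  # assumed values

    model = Sequential([
        Embedding(VOCAB_SIZE, EMB_DIM),  # random init, or FastText vectors for LSTM(FastText)
        LSTM(128),
        Dense(64, activation="relu"),
        Dense(1, activation="sigmoid"),  # hateful vs. not hateful
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(X_train, y_train, batch_size=32, epochs=3)  # batch size and epochs are tuned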

– CNNFastText. This model was inspired by [4, 45]. It uses FastText English word vectors (with a dimension of 300) and three 1D convolutional layers, each one using 100 filters, a stride of 1 and a ReLU activation function, but with different window sizes (respectively, 2, 3, and 4) in order to capture different scales of correlation between words. We further downsample the output of these layers with a 1D max-pooling layer and feed its output into the final dense layer. All the experiments run for a maximum of 100 epochs, with a patience of 10 and a batch size of 32Footnote 19.
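A simplified sketch of this multi-window CNN (our own illustration; the pooling strategy and any hyperparameter not stated above are assumptions) is:

    # Sketch of CNN(FastText): three parallel 1D convolutions with window sizes 2, 3 and 4
    # (100 filters each), max-pooled, concatenated and fed to the output layer.
    from tensorflow.keras.layers import (Input, Embedding, Conv1D, GlobalMaxPooling1D,
                                         Concatenate, Dense)
    from tensorflow.keras.models import Model

    VOCAB_SIZE, EMB_DIM, MAX_LEN = 20000, 300, 50  # assumed values; embeddings would be FastText

    inputs = Input(shape=(MAX_LEN,))
    emb = Embedding(VOCAB_SIZE, EMB_DIM)(inputs)
    branches = []
    for window in (2, 3, 4):
        conv = Conv1D(filters=100, kernel_size=window, strides=1, activation="relu")(emb)
        branches.append(GlobalMaxPooling1D()(conv))
    merged = Concatenate()(branches)
    outputs = Dense(1, activation="sigmoid")(merged)

    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])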

– ELMo. This model employs ELMo [103], a deep contextualized word representation, which has shown significant improvements in HS detection studies [111]. Since we implement ELMo as a Keras layerFootnote 20, we were able to add more layers after the word embedding layer. The latter is followed by a dense layer (256 units) and a dropout rate of 0.1, before being passed to another dense layer (2 units) with a sigmoid activation function, which produces the final prediction. This architecture is fine-tuned based on the number of epochs (1-15) and batch size (16, 32, 64, and 128), and optimized using the Adam optimizer.Footnote 21

– BERT. This model uses the pre-trained BERT model (BERT-Base, Cased) [28], on top of which we added an untrained layer of neurons. We used HuggingFace’s PyTorch implementation of BERT [139], which we trained for three epochs with a learning rate of 2e-5 and the AdamW optimizer. This setup is based on [122], where it achieved the best results for the task of abusive language detection.
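The fine-tuning loop can be sketched roughly as follows (a minimal illustration with placeholder data; batching, evaluation and the rest of the actual training code are omitted):

    # Sketch of fine-tuning BERT-Base (cased) for binary HS detection with HuggingFace
    # Transformers: three epochs, learning rate 2e-5, AdamW.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    model = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    texts = ["placeholder hateful tweet", "placeholder harmless tweet"]  # toy data
    labels = torch.tensor([1, 0])  # 1 = hateful, 0 = not hateful
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    model.train()
    for epoch in range(3):
        optimizer.zero_grad()
        out = model(**batch, labels=labels)  # returns loss and logits
        out.loss.backward()
        optimizer.step()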

Results

Results for the \(Top^G \longrightarrow Top^S\) Configuration

Table 4 and Table 5 present our results when training, respectively, on Founta and Davidson. We provide our results in terms of accuracy (A), macro-averaged F-score (\(F_1\)), precision (P) and recall (R) with the best results in terms of \(F_1\) presented in bold.

Table 4 Results for \(Top^G \longrightarrow Top^S\) configuration when training on Founta
Table 5 Results for \(Top^G \longrightarrow Top^S\) configuration when training on Davidson

We recall here that we focus on learning topic-generic HS properties and test how neural models are able to extrapolate this information in order to detect topic-specific HS. The results show that ELMo outperformed other models in the Waseem dataset (RacismWaseem, SexismWaseem) when trained on Davidson. When trained on Founta, CNNFastText obtained the best results for SexismWaseem and BERT for RacismWaseem. For most of the topic-specific testing datasets (AMI corpora in particular), the results are comparable across the two general HS training datasets (Davidson and Founta), with higher disparities being observed in the Waseem results.

Results for the \(Top^S \longrightarrow Top^S\) Configuration

Table 6 presents the results obtained when focusing on learning topic-specific HS properties by combining all the training sets of all datasets. The overall picture shows that our baseline (i.e., LSVC) performed quite well when compared to the other models: it presents a decrease of between 1% and 11% in F1 score when compared to the best-performing model for a given topic. For most topics, the best results were obtained by BERT, the only exception being the MisogynyHatEval dataset, where ELMo obtained the best results (with a difference of almost 2% in F1 score). We note that MisogynyHatEval is the only dataset for which ELMo achieved good results; for all the other datasets, its results are low, even lower than the baselineFootnote 22. We also note that state-of-the-art models achieved good results for both topics in the Waseem dataset, whereas they attain lower results when tested on the xenophobia topic from the HatEval dataset. However, our results are similar to the ones obtained by state-of-the-art baselines for Waseem (F1=0.739 [133]) and HatEval (F1=0.451 [5])Footnote 23.

Table 6 Results for \(Top^S\longrightarrow Top^S\) when training on Waseem, HatEval and AMI train sets

In order to assess whether training on topic-specific data improves the results beyond those achieved by training on topic-generic data, we compare our results with both the baselines and the best submitted systems of the shared task competitions in which these data have been used (only available for the AMI corpora). The comparison was made by training either on a topic-generic dataset (i.e., \(Top^G \longrightarrow Top^S\)) or on all topic-specific datasets (i.e., \(Top^S \longrightarrow Top^S\)), and testing on the test data provided by the organizers of AMI-IberEval and AMI-Evalita. Table 7 shows our results.

Table 7 Comparison with related work in terms of accuracy

When compared to the AMI MisogynyEvalita and MisogynyIberEval baselinesFootnote 24, provided in terms of accuracy (respectively, 0.605 and 0.783), we observe that with the topic-specific training approach BERT achieved more than a 10% increase for both datasets, while for the topic-generic training approach the only improvement (of 0.5%) is brought by BERT trained on the Davidson dataset (for MisogynyEvalita). When comparing the results with the best submitted systems (0.704 and 0.913Footnote 25), we still observe a small improvement achieved by BERT trained on topic-specific data for the MisogynyEvalita task, though all the other system results were lower. These results confirm that a model trained on a combination of several datasets with different topical focuses is more robust than a model trained on a topic-generic dataset.

Multi-target Hate Speech Detection

Methodology

Now that we have established that topic-generic datasets are not adequate for capturing specific instances of HS using state-of-the-art HS detection models, the next step is to evaluate how topically focused datasets can be used to detect multi-target HS. This implies answering two main research questions:

  • Is combining topic-specific datasets better for predicting HS towards a given seen topic/target?

  • What happens when the models are tested on a topic-specific dataset where the topic and/or the target are unseen?

Let T be either a topic (Top) or a target (Tag). We propose the following configurations:

  • \(T^S \longrightarrow T_{seen}^S\): We model the task as a multi-label classification problem with two sub-configurations:

    1. (a)

      \(Top^S \longrightarrow Top_{seen}^S\): Detect the hatefulness of a given tweet and the topic to which the HS belongs. Each tweet is thus classified into eight different classes, representing the combination of the four topics (racism, sexism, misogyny, xenophobia) and two HS classes (hate speech vs. non hate speech). As in the previous experiments (cf. Methodology), we combine all the training sets of the topic-specific datasets for training. Then, all the models are tested on the test set of each topic-specific datasets.

    2. (b)

      \(Tag^S \longrightarrow Tag_{seen}^S\): It is similar to (a), except that it concerns the multi-label classification of targets. Therefore, we merge the topic-specific train and test sets that share the same target (i.e., women: SexismWaseem and Misogynyall; ethnicity: RacismWaseem and XenophobiaHatEval).

  • \(T^S \longrightarrow T_{unseen}^S\): We model the task as a binary classification task to predict the topic/target not previously seen during training time. We also design two experiments here:

    1. (c)

      \(Top^S \longrightarrow Top_{unseen}^S\): It uses three out of the four topic datasets for training and the remaining topic dataset for testing (i.e., the dataset left out at training time). For example, to detect the hatefulness of misogynistic messages, we train on the following topics: racism (RacismWaseem), sexism (SexismWaseem) and xenophobia (XenophobiaHatEval), then we test on the misogyny topic (i.e., comprising AMI corpora and MisogynyHatEval).

    2. (d)

      \(Tag^S \longrightarrow Tag_{unseen}^S\): It is similar to (c), except that it concerns targets. For example, to detect the hateful messages that target women, we train by using the datasets related to the target race (i.e., RacismWaseem and XenophobiaHatEval) and test on the four datasets related to the target women (i.e., SexismWaseem, the two AMI corpora and MisogynyHatEval).

Both \(T^S \longrightarrow T_{seen}^S\) (multi-label classification) and \(T^S \longrightarrow T_{unseen}^S\) (binary classification) rely on the six models presented in Methodology (i.e., LSVC, LSTM, LSTMFastText, CNNFastText, ELMo, and BERT). In addition, for \(T^S \longrightarrow T_{seen}^S\) we propose a multi-task setting that consists of two classifiers trained jointly with multi-task objectives. The first classifier predicts whether the tweet is hateful or not (0 or 1), while the second one predicts the topic of HS (racism (0), sexism (1), misogyny (2), and xenophobia (3)). The final label prediction is broken down into eight classes (cf. Table 8). The multi-task systems are compared to the previous six models, used here as strong baselines.

Table 8 Label combination in multi-task setting
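As an illustration, the combination can be expressed as a simple mapping between (topic, hatefulness) pairs and eight class ids (a hypothetical encoding for illustration only; the actual label scheme is the one given in Table 8):

    # Illustrative mapping between the two predictions and the eight combined classes.
    TOPICS = ["racism", "sexism", "misogyny", "xenophobia"]

    def combine(topic_id: int, hateful: int) -> int:
        """Fold a (topic, hateful) pair into a single class id in 0..7."""
        return topic_id * 2 + hateful

    def split(combined: int):
        """Recover the (topic, hateful) pair from a combined class id."""
        return TOPICS[combined // 2], combined % 2

    print(split(combine(2, 1)))  # ('misogyny', 1): a hateful tweet with a misogynous focus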

MTL has already been successfully applied in cross-domain aspect-based sentiment analysis (cf. Affective Computing and Sentiment Analysis and Domain Adaptation in Abusive Language Detection for related work in the field) and is used here for the first time in an HS detection task, drawing a parallel between the sentiment domain (e.g., restaurant, book, hotel, etc.) and the topic/target of HS. Indeed, the main problem in sentiment analysis is the big performance decline in the out-of-domain setting (when a system is trained and tested on different dataset domains) compared to the in-domain setting (when a system is trained and tested on datasets within the same domain). Similar challenges also arise in abusive language detection, where systems struggle to obtain robust performance when trained and tested on different datasets, which usually focus on different phenomena.

Models

We experiment with the state-of-the-art models described in Models (i.e., LSVC, LSTM, LSTMFastText, CNNFastText, ELMo, and BERT) and extend them with a multi-task architecture, as described below:

LSTMmulti-task. First, we investigate successful approaches in multi-domain sentiment analysis, a research area that is more mature in dealing with multi-domain classification. For example, [74] used Bi-LSTM networks with adversarial training [46, 53] for learning a general representation from the data of all domains. [102] proposed a co-training approach for jointly learning from both domain-invariant and domain-specific representations, while [12, 146] adopted an MTL approach. Among existing models, we decided to re-implement the system proposed in [12], as it has been shown to outperform existing models on one of the most widely used multi-domain sentiment classification benchmark datasets [73]. This system consists of two Bi-LSTM classifiers, one classifying the domain (domain classifier) and the other the sentiment (sentiment classifier) of the input at the same time, with the losses of both tasks being added up. The output of the Bi-LSTM domain classifier is concatenated to the word embedding layer of the sentiment classifier to acquire a domain-aware representation. Then, the output of the average pooling (after the Bi-LSTMs) of the domain classifier is also concatenated to the sentiment classifier to obtain domain-aware attention.

We extend the architecture proposed in [12]. The first Bi-LSTM predicts whether a given tweet is hateful or not, while the second one predicts the topic/target of HS. In this way, we obtain both a topic/target-aware representation and topic/target-aware attention when predicting whether the tweet is hateful or not. For the experiments, we fine-tune this model by varying the number of epochs (1-15) and batch sizes (16, 32, 64, and 128), while keeping the same configurations as in [12]. The model input is either randomly initialized embeddings (LSTMmulti-task) or FastText pre-trained embeddings (LSTMmulti-task (FastText))Footnote 26.
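A greatly simplified sketch of this multi-task idea (our own illustration: it only concatenates the topic branch's representation into the hate branch, omits the attention mechanism of [12], and uses assumed layer sizes) is:

    # Simplified multi-task Bi-LSTM: one branch predicts the HS topic/target, the other the
    # hatefulness; the topic representation is concatenated into the hate branch so that the
    # hate classifier becomes topic/target-aware, and the two losses are summed.
    from tensorflow.keras.layers import (Input, Embedding, Bidirectional, LSTM,
                                         Concatenate, Dense)
    from tensorflow.keras.models import Model

    VOCAB_SIZE, EMB_DIM, MAX_LEN = 20000, 300, 50  # assumed values

    inputs = Input(shape=(MAX_LEN,))
    emb = Embedding(VOCAB_SIZE, EMB_DIM)(inputs)

    topic_repr = Bidirectional(LSTM(64))(emb)                  # topic/target branch
    topic_out = Dense(4, activation="softmax", name="topic")(topic_repr)

    hate_repr = Bidirectional(LSTM(64))(emb)                   # hatefulness branch
    hate_repr = Concatenate()([hate_repr, topic_repr])         # topic/target-aware representation
    hate_out = Dense(1, activation="sigmoid", name="hate")(hate_repr)

    model = Model(inputs, [hate_out, topic_out])
    model.compile(optimizer="adam",
                  loss={"hate": "binary_crossentropy", "topic": "sparse_categorical_crossentropy"},
                  loss_weights={"hate": 1.0, "topic": 1.0})    # the losses are added up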

ELMomulti-task. We also modify our ELMo system (cf. Methodology) in order to be able to use it in a multi-task setting. We therefore built two ELMo-based architectures to predict the hatefulness and the topic/target of tweets. Each architecture starts with the ELMo embedding layer, followed by a dense layer with a ReLU activation function, before being passed to another dense layer with a sigmoid activation function to produce the final prediction. Since the ELMo embeddings are not trainable, we could not obtain the topic/target-aware representation as in the previous Bi-LSTM model; we can only transfer knowledge by concatenating the output of the first dense layer of the topic/target classifier to the dense layer of the hateful classifier. In this way, we expect to obtain meaningful information about the topic/target when classifying the hatefulness of tweets. Again, we only tune the systems by optimizing the number of epochs and the batch size.

BERTmulti-task. This model is similar to [75], where all tasks share and update the same low layers (i.e., BERT layers), except for the task-specific classification layer. In this architecture, after transferring the text to contextual embeddings in the shared layers and retrieving the first token hidden state of the shared BERT model, we apply a dropout of 0.1 and connect it to two different layers (corresponding to the two classification tasks: topic/target and hatefulness). To preserve individual task-specific loss functions and to perform training at the same time, we defined the losses for the two tasks separately and optimized them jointly (by backpropagating their sum through the model). This model was trained for three epochs with a learning rate of 2e-5 and AdamW optimizer.
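The shared-encoder design can be sketched as follows (a minimal PyTorch illustration under our own assumptions, with hypothetical class and variable names):

    # Sketch of the multi-task BERT: shared BERT layers, one classification head per task,
    # and the two cross-entropy losses summed into a single joint loss.
    import torch
    import torch.nn as nn
    from transformers import BertModel

    class MultiTaskBert(nn.Module):
        def __init__(self, n_topics: int = 4):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-cased")   # shared layers
            self.dropout = nn.Dropout(0.1)
            hidden = self.bert.config.hidden_size
            self.hate_head = nn.Linear(hidden, 2)          # hateful vs. not hateful
            self.topic_head = nn.Linear(hidden, n_topics)  # racism / sexism / misogyny / xenophobia

        def forward(self, input_ids, attention_mask, hate_labels=None, topic_labels=None):
            out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
            pooled = self.dropout(out.last_hidden_state[:, 0])  # first-token ([CLS]) hidden state
            hate_logits = self.hate_head(pooled)
            topic_logits = self.topic_head(pooled)
            loss = None
            if hate_labels is not None and topic_labels is not None:
                ce = nn.CrossEntropyLoss()
                loss = ce(hate_logits, hate_labels) + ce(topic_logits, topic_labels)  # joint loss
            return loss, hate_logits, topic_logits

    # Training would follow the single-task BERT setup: AdamW, learning rate 2e-5, three epochs.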

Results

Results for the \(T^S \longrightarrow T_{seen}^S\) Configurations

Table 9 and Table 10 present the results obtained in the \(Top^S \longrightarrow Top_{seen}^S\) configuration, in which the testing topic was previously seen during training. Table 9 presents the baseline results, while Table 10 presents the multi-task results. We can observe that the multi-task models are the best, outperforming all the baselines, the best systems being LSTMmulti-task (FastText) and BERTmulti-task. The results obtained on the Waseem dataset surpass all the others, which could be a consequence of the higher number of instances in this particular dataset compared to the others. Overall, the best performance for the multi-topic HS detection task is achieved by BERTmulti-task, which attains the best result in eight out of nine test datasets.

Table 9 Baseline results for \(Top^S \longrightarrow Top_{seen}^S\)
Table 10 Multi-task results for \(Top^S \longrightarrow Top_{seen}^S\)

Table 11 presents the results obtained for the \(Tag^S \longrightarrow Tag_{seen}^S\) experiments, in which the testing target was previously seen during training. The best result for the target women was obtained by CNNFastText, while for the target race, LSTMmulti-task (FastText) outperformed all the other models. Our results confirm our assumption that the multi-task approach is capable of robust performance in a multi-topic setting, demonstrating its ability to transfer knowledge between different topics, as reported in previous cross-domain sentiment analysis studies.

Table 11 Baselines and multi-task results for \(Tag^S \longrightarrow Tag_{seen}^S\)

Results for the \(T^S \longrightarrow T_{unseen}^S\) Configuration

We begin by presenting the results in the \(Top^S \longrightarrow Top_{unseen}^S\) experiments in which the testing topic was unseen during training. As shown in Table 12, we observe that in the absence of data annotated for a specific type of HS, one can use (already existing) annotated data for different kinds of HS.

As this experiment is cast as a binary classification task, we compare the results with the ones presented in Table 6, which concern \(Top^S \longrightarrow Top^S\) when training on the Waseem, HatEval and AMI train sets, where topics are seen in the test sets. We noticed that CNNFastText was able to achieve a similar performance for the topic misogyny (0.655 in both \(Top^S \longrightarrow Top_{unseen}^S\) and \(Top^S \longrightarrow Top^S\)), improving by almost 2% for the topic xenophobia (moving from 0.578 in \(Top^S \longrightarrow Top^S\) with BERT to 0.595 in terms of \(F_1\)). However, lower results were obtained for the Waseem dataset, where the drop in terms of \(F_1\) is between 15% and 20%. The overall results also show that CNNFastText was the best at predicting unseen topics for the four topics we experimented on. By capturing different scales of correlation between words (i.e., unigrams, bigrams, and trigrams), the CNN model can detect different patterns in a sentence, regardless of their position [116].
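As an illustration of this intuition, the following is a minimal sketch of a text CNN with several kernel widths; the sizes are illustrative assumptions and do not correspond to the exact CNNFastText configuration.

```python
# Minimal sketch of a text CNN with several kernel widths, illustrating how
# convolutions over 1-, 2-, and 3-grams capture position-independent patterns.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, n_filters=100, kernel_sizes=(1, 2, 3)):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes])
        self.out = nn.Linear(n_filters * len(kernel_sizes), 1)

    def forward(self, tokens):                        # (B, T)
        emb = self.embedding(tokens).transpose(1, 2)  # (B, E, T)
        # max-over-time pooling discards position, keeping only whether a given
        # n-gram pattern occurred anywhere in the tweet
        pooled = [conv(emb).relu().max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=-1))
```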

Finally, Table 13 presents the results obtained when the models are trained on all the available data belonging to one target and tested on all the available data belonging to a different target (i.e., \(Tag^S \longrightarrow Tag_{unseen}^S\)). In line with the previous experiment, the best results were achieved by CNNFastText. In order to better interpret these results, we conducted another experiment in which a model is trained only on data belonging to one target and tested on data whose topical focus is associated with a different target (e.g., training on the target women and testing on the topic xenophobia, which belongs to the target race). When comparing these results (cf. Table 14) with the ones presented in Table 12, one can observe how important it is for the system to have learned some information regarding the target, even if the data belong to a different topical focus. In the absence of such information, a drop of between 1% and 12% can be observed for the best-performing models.

Table 12 Results for \(Top^S \longrightarrow Top_{unseen}^S\)
Table 13 Results for \(Tag^S \longrightarrow Tag_{unseen}^S\)
Table 14 Results for \(Tag^S \longrightarrow Top_{unseen}^S\)

To conclude, the results confirm that the multi-task approach is able to achieve a robust performance, especially for the multi-topic HS detection task. These results are encouraging as they can constitute the first step towards targeted HS detection. This would be especially true for languages that lack annotated data for a particular target or in the aftermath of a triggering event.

Emotion-aware Multi-target Hate Speech Detection

Methodology

In this section, we focus on investigating the following questions:

  • To what extent does injecting domain-independent affective knowledge encoded in sentic computing resources and in semantically structured hate lexicons improve the performance for the two finer-grained tasks (i.e., detecting the hatefulness of a tweet and its topical focus)?

  • Which emotional categories are the most productive?

We experiment with several affective resources that have proven useful for tasks related to sentiment analysis, including abusive language detection (cf. Affective Information in Abusive Language Detection Tasks). Psychological studies suggest that abusive language is often deeply linked to the emotional state of the speaker, and that this is reflected in the affective characteristics of the haters’ language. Our intuition, then, was that it would be reasonable to inject knowledge about emotions into our models as a domain-independent signal that might help to detect HS at a finer level of granularity across different topical focuses and targets. In particular, we rely on:

  • two concept-level resources from the sentic computing framework that encode affective knowledge about basic and complex emotions according to different psychological models: SenticNetFootnote 27 [18] and EmoSenticNetFootnote 28 [106], whose emotional labels are related to Plutchik’s [104] and Ekman’s [31] models of emotions, respectively.

  • a hate lexicon (Hurtlex), where lexical information is structured in different categories depending on the nature of the hate expressed, to see whether this multifaceted affective information, specifically related to the hate domain, helps multi-topic and multi-target detection.

As discussed in Related Work, emotion features have already been used in several NLP tasks (e.g., sentiment analysis [95] and figurative language detection [35, 120]). However, to the best of our knowledge, no one has investigated the impact of emotion features on HS detection. In particular, we make use of several affective resources (HurtLex and, for the first time, Sentic resources) and identify the emotion categories that are the most productive in detecting HS towards a given topic/target. To this end, we designed the following two experiments (we recall that T refers either to a topic (Top) or a target (Tag)):

  • (\(T^S \longrightarrow T_{seen}^S )^{Hurt}\) and (\(T^S \longrightarrow T_{seen}^S )^{Sentic}\), where we add, respectively, features extracted from HurtLex and from the sentic resources (both SenticNet and EmoSenticNet) on top of the models presented in the Methodology sections above.

  • (\(Top^S \longrightarrow Top_{unseen}^S )^{Sentic}\), where we explore the impact of general affective lexica on topically focused datasets.

The models developed for each experiment are detailed below.

Models

Sentic-based Models

SenticNet is a collection of commonly used concepts with polarity (i.e., commonsense concepts with relatively strong positive or negative polarity), where each concept is associated with emotion categorization values expressed in terms of the Hourglass of Emotions model [16], which organizes and blends 24 emotional categories from Plutchik’s model into four affective dimensions (pleasantness, attention, sensitivity, and aptitude). Each of these four dimensions is characterized by six sentic levels that measure the strength of an emotion. In this paper, we use SenticNet 5, which includes over 100,000 natural language concepts.

EmoSenticNet is another concept-based lexical resource, built automatically by merging WordNet-Affect [119] and SenticNet, with the main aim of providing a complete resource containing not only the quantitative polarity scores associated with each SenticNet concept but also qualitative affective labels [106]. In particular, it assigns WordNet-Affect emotion labels related to Ekman’s six basic emotions (disgust, sadness, anger, joy, fear, and surprise) to SenticNet concepts. The whole list currently includes 13,189 annotated entries.

Several approaches for representing the affective information included in these two resources were tested by creating feature vectors composed of:

  • 24 basic emotions extracted from SenticNet (six basic emotions for each of the four dimensions);

  • 16 second-level emotions extracted from SenticNet (these emotions are the result of combining the sentic levels pairwise);

  • all the affective information extracted from SenticNet (i.e., basic emotions and second-level emotions);

  • six emotions extracted from EmoSenticNet;

  • emotions extracted from both SenticNet and EmoSenticNet;

  • 24 basic emotions extracted from SenticNet, restricted to the concepts present in Hurtlex.

All these additional features are injected into the previously described systems (cf. the Methodology sections above) by concatenation. The concatenation point depends on the architecture of the model, as follows (a minimal sketch is given after the list):

  • For the LSTM-based and CNN models, we concatenate the feature representation to the dense layer after the LSTM/CNN network.

  • For the ELMo model, the feature representation is injected into the dense layer that follows the ELMo embedding layer.

  • For the BERT model, after padding the feature vector to a size equal to the BERT model input size, the additional features are passed through a linear layer; its output is then concatenated with the output of the BERT model, and the result is treated as input to the final linear layer.
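The following is a minimal sketch of how such an affective feature vector can be built from lexicon lookups and concatenated with a sentence representation before the final layer; the toy lexicon, dimensions, and names are illustrative assumptions.

```python
# Minimal sketch: build a tweet-level affective feature vector from hypothetical
# SenticNet/EmoSenticNet lookups and concatenate it with a sentence representation.
import torch
import torch.nn as nn

EMOTIONS = ["pleasantness", "attention", "sensitivity", "aptitude"]   # toy subset
SENTIC_LEXICON = {"hate": {"sensitivity": 0.8}, "love": {"pleasantness": 0.9}}

def affective_features(tokens):
    """Sum the emotion scores of all tokens found in the (toy) lexicon."""
    vec = torch.zeros(len(EMOTIONS))
    for tok in tokens:
        for i, emo in enumerate(EMOTIONS):
            vec[i] += SENTIC_LEXICON.get(tok, {}).get(emo, 0.0)
    return vec

class AffectiveHead(nn.Module):
    """Final layer fed with [sentence representation ; projected affective features]."""
    def __init__(self, repr_dim=768, n_feats=len(EMOTIONS)):
        super().__init__()
        self.feat_proj = nn.Linear(n_feats, 32)    # features linear layer
        self.out = nn.Linear(repr_dim + 32, 1)

    def forward(self, sent_repr, feat_vec):
        return self.out(torch.cat([sent_repr, self.feat_proj(feat_vec)], dim=-1))

head = AffectiveHead()
sent_repr = torch.randn(1, 768)                                  # e.g., a BERT [CLS] output
feats = affective_features("i hate everything".split()).unsqueeze(0)
logit = head(sent_repr, feats)
```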

Hurtlex-based Models

HurtLex is a multilingual hate word lexicon, which includes a wide inventory of about 1,000 hate words (originally compiled manually for Italian by the linguist Tullio De Mauro [27]Footnote 29), organized into 17 categories grouped into different macro-levels [6]:

  1. (a)

    Negative stereotypes: ethnic slurs (PS); locations and demonyms (RCI); professions and occupations (PA); physical disabilities and diversity (DDF); cognitive disabilities and diversity (DDP); moral and behavioral defects (DMC); and words related to social and economic disadvantage (IS).

  2. (b)

    Hate words and slurs beyond stereotypes: plants (OR); animals (AN); male genitalia (ASM); female genitalia (ASF); words related to prostitution (PR); and words related to homosexuality (OM).

  3. (c)

    Other words and insults: descriptive words with potential negative connotations (QAS); derogatory words (CDS); felonies and words related to crime and immoral behavior (RE); and words related to the seven deadly sins of Christian tradition (SVP).

The lexicon has been translated into over 50 languages (English included) semi-automatically, by extracting all the senses of all the words from BabelNet [93]. We rely on the English version of HurtLexFootnote 30. Out of the 17 categories, the following were selected for the two vulnerable categories targeted in the four specific manifestations of hate that we address in this paper.

  • misogyny and sexism: male genitalia, female genitalia, words related to prostitution, physical disabilities and diversity, cognitive disabilities and diversity

  • xenophobia and racism: animals, felonies and words related to crime and immoral behavior, ethnic slurs, moral and behavioral defects

We included this specific selection of HurtLex categories as features since a preliminary manual inspection of hateful content targeting the two vulnerable groups suggests that different subsets of the HurtLex categories can be relevant in detecting hateful speech against those targets. Moreover, concerning misogyny, we already have some positive experimental evidence for this selection from previous work exploiting Hurtlex to detect HS targeting women [97, 99].

We experimented with a number of representations of the selected features to train several classifiers (a minimal sketch follows the list):

  • each of the selected Hurtlex categories is used as an independent feature (binary or frequency);

  • all the selected Hurtlex categories (keeping in mind the choices made for the different targets) are combined into a single feature indicating whether at least one word from any of the categories is present (binary) or how many such words occur (frequency).
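The following is a minimal sketch of the two representations in their binary and count variants; the category word lists are hypothetical placeholders, not the actual HurtLex entries.

```python
# Minimal sketch of the HurtLex-based representations: per-category features and
# a single combined feature, each in a binary and a count variant.
from collections import Counter

HURTLEX = {                     # toy stand-in for the selected categories
    "PS": {"slur_a", "slur_b"},  # ethnic slurs
    "AN": {"pig", "rat"},        # animals used as insults
    "DMC": {"coward"},           # moral and behavioral defects
}

def category_features(tokens, binary=True):
    """One feature per selected HurtLex category."""
    counts = Counter()
    for cat, words in HURTLEX.items():
        counts[cat] = sum(tok in words for tok in tokens)
    return {cat: int(c > 0) if binary else c for cat, c in counts.items()}

def combined_feature(tokens, binary=True):
    """A single feature over all selected categories."""
    total = sum(category_features(tokens, binary=False).values())
    return int(total > 0) if binary else total

tweet = "what a coward and a rat".split()
print(category_features(tweet))                # {'PS': 0, 'AN': 1, 'DMC': 1}
print(combined_feature(tweet, binary=False))   # 2
```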

Results

In the following, we present our results on injecting affective features into our models for all the configurations considered in Multi-target Hate Speech Detection (i.e., \(Top^S \longrightarrow Top_{seen}^S\), \(Tag^S \longrightarrow Tag_{seen}^S\) and \(Top^S \longrightarrow Top_{unseen}^S\)). In all the tables below, the models whose results in terms of \(F_1\) score outperform the corresponding models without affective features are presented in bold. Moreover, all the tables include an additional column \(\Delta\) to highlight the improvements due to the inclusion of the affective features based on sentic computing resources and Hurtlex, i.e., \(\Delta = F_1(\text{Model + AffectiveFeatures}) - F_1(\text{Model})\).

Results for Sentic computing emotion features

Table 15 presents the results obtained for the multi-label classification task by incorporating the sentic features (as described in the previous section and summarized below)Footnote 31:

  1. (1)

    Basic emotions extracted from SenticNet

  2. (2)

    Basic emotions extracted from SenticNet only for the concepts present in Hurtlex

  3. (3)

    Second level emotions extracted from SenticNet

  4. (4)

    All SenticNet affective information (basic emotions + second level emotions)

  5. (5)

    Emotions extracted from EmoSenticNet

  6. (6)

    Merging the affective information extracted from both SenticNet and EmoSenticNet

Table 15 Results for \((Top^S \longrightarrow Top_{seen}^S )^{Sentic}\) and \((Tag^S \longrightarrow Tag_{seen}^S )^{Sentic}\)

As to the different representation strategies and combinations of sentic resources, we observed that the best results were obtained when integrating the EmoSenticNet emotions, the first-level emotions of SenticNet, or the merged SenticNet and EmoSenticNet emotions. In most cases, when including only the second-level emotions of SenticNet, we see a drop in the performance of the model. The last results, presented in Table 16, concern the (\(Top^S \longrightarrow Top_{unseen}^S )^{Sentic}\) setting, in which we added sentic features to measure the impact of general affective knowledge when predicting unseen topics. Three groups of features improve over the previous models for all the tested topics:

  1. (1)

    Basic emotions extracted from SenticNet.

  2. (2)

    Emotions extracted from EmoSenticNet.

  3. (3)

    Merging the affective information extracted from both SenticNet and EmoSenticNet.

Table 16 Results for \((Top^S \longrightarrow Top_{unseen}^S )^{Sentic}\)

Results for Hurtlex emotion features

Table 17 reports the results achieved by the best performing models for the \(Top^S \longrightarrow Top_{seen}^S\) experiment (cf. Table 9) (i.e., BERTmulti-task and CNNFastText) when incorporating the following most productive Hurtlex features:

  1. (1)

    Hurtlex categories used as binary independent features.

  2. (2)

    Hurtlex categories used as independent features (count).

  3. (3)

    Single binary feature incorporating the selected Hurtlex categories.

  4. (4)

    Single feature incorporating the selected Hurtlex categories (count).

Table 17 Results for \((Top^S \longrightarrow Top_{seen}^S )^{Hurtlex}\) and \((Tag^S \longrightarrow Tag_{seen}^S )^{Hurtlex}\)

In Table 17, the models whose results in terms of \(F_1\) surpassed the previous models are presented in boldFootnote 32. We observe that almost all the additional features were productive and outperformed the previous models. The improvements brought by CNNFastText + HurtLex were higher than those brought by BERTmulti-task + HurtLex: ranging from 1% to 17% (respectively, for Misogynyall and Racism + Xenophobia) vs. 1% to 5% (respectively, for MisogynyHatEval and RacismWaseem). The results of this experiment confirm our original assumption that including affective information and making use of specific lexicons leads to significant improvements in the \(Top^S \longrightarrow Top_{seen}^S\) experiments.

Discussions and Error Analysis

Main Conclusions

The main findings of this paper are:

Conclusion 1: Training on topic-generic datasets generally fails to account for the linguistic properties specific to a given topic. First, we experimented with several HS datasets with different topical focuses in a binary classification setting. This was done in order to capture general HS properties regardless of the dataset type (i.e., topic-generic or topic-specific). We investigated two experimental scenarios: a first one in which a system was trained on a topic-generic dataset and tested on topic-specific data, and a second one in which a system was trained on a combination of (training sets from) several topic-specific datasets and tested on topic-specific data. The results show that a system trained on a combination of several topic-specific datasets outperforms a system trained on a single topic-generic dataset. This finding partially confirms the assumption made by [122], according to which merging several abusive language datasets could assist in the detection of abusive language in non-generalizable (unseen) problems.

Conclusion 2: Combining topically focused datasets enables the detection of multi-target HS even if the topic and/or target are unseen. Second, we proposed a classification setting which allows a given system to detect not only the hatefulness of a tweet, but also its topical focus, in the context of a multi-label classification approach. Our findings show that a multi-task approach, in which the model learns two or more tasks simultaneously, performs better than a single-task system, with BERTmulti-task being the best model. In the same way, we also proposed a cross-topic and cross-target experimental setting for the task of HS detection, where a system is trained on several sets of data with different topical focuses and targets and then tested on another dataset whose topical focus and target are unseen during training. Results show that CNNFastText outperformed all the other systems in all the experimental scenarios. We believe that this is an important finding, which paves the way for detecting targeted HS manifestations stimulated by a triggering event and helps to address the lack of annotated data for a particular topic/target.

Conclusion 3: Affective knowledge encoded in sentic computing resources and semantically structured hate lexicons improves finer-grained HS detection. Finally, when injecting domain-independent affective knowledge on top of the deep learning architectures, multi-target HS detection improves in both settings, i.e., where the topic/target is seen and where it is unseen at training time. The most useful groups of features differ greatly across topics/targets and model architectures. In most cases, the models incorporating EmoSenticNet emotions, the first-level emotions of SenticNet, a blend of SenticNet and EmoSenticNet emotions, or affective features based on Hurtlex obtained the best results. However, when merging the affective features based on Hurtlex with those based on the sentic computing resources, we observed a decline in the quality of the results.

Impact of Bias in Multi-target Hate Speech Detection

As observed in [127], HS datasets might contain systematic biases towards certain topics and targets. In the context of automatic content moderation, the danger posed by bias is considerable, as bias can unfairly penalize the groups that the automatic moderation systems were designed to protect.

In line with previous works, we observed that bias has a strong impact on target-based HS detection. Based on the results obtained in the cross-topic setting (i.e., the \(Top^S \longrightarrow Top_{unseen}^S\) configuration, cf. Table 12), we noted a big performance drop for both RacismWaseem and SexismWaseem when compared to the \(Top^S \longrightarrow Top_{seen}^S\) classification setting, as presented in Table 6. One possible explanation for this drop is the bias problems characterizing the Waseem dataset. As shown in [136], the Waseem dataset contains both author and topic bias, mostly because of its data sampling approach. The methodology adopted in [136] for studying this issue was also based on conducting cross-domain experiments (i.e., training on a dataset different from the one used for testing), in order to make the existing bias in abusive language datasets evident. Their results show that datasets that apply biased sampling for corpus collection (instances matching query words that are likely to occur in abusive language) contain a high degree of implicit abuse. This might lead to a performance decrease due to the difficulty of learning lexical cues that convey implicit abuse. [136] illustrated how datasets with a high degree of implicit abuse can be more affected by data bias. They observed that when query words and biased words (i.e., the words having the highest Pointwise Mutual Information towards abusive messages) are removed, the performance is much poorer than originally reported.
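To illustrate the diagnostic used in [136], the following is a minimal sketch (on toy data) of ranking words by their Pointwise Mutual Information towards the abusive class; the corpus, counts, and variable names are entirely hypothetical.

```python
# Minimal sketch: rank words by their PMI towards the abusive class, the
# diagnostic used to identify "biased words" in abusive language datasets.
import math
from collections import Counter

corpus = [                                    # (tokens, label) toy examples; 1 = abusive
    (["go", "back", "home"], 1),
    (["nice", "day", "today"], 0),
    (["go", "away"], 1),
    (["have", "a", "nice", "trip"], 0),
]

word_class, word_count, class_count = Counter(), Counter(), Counter()
for tokens, label in corpus:
    class_count[label] += 1
    for tok in set(tokens):
        word_count[tok] += 1
        word_class[(tok, label)] += 1

n_docs = len(corpus)

def pmi(word, label=1):
    """PMI(word, abusive) = log [ P(word, abusive) / (P(word) * P(abusive)) ]."""
    p_joint = word_class[(word, label)] / n_docs
    p_word = word_count[word] / n_docs
    p_label = class_count[label] / n_docs
    return math.log(p_joint / (p_word * p_label)) if p_joint > 0 else float("-inf")

ranked = sorted(word_count, key=pmi, reverse=True)
print(ranked[:3])                             # words most associated with the abusive class
```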

We made the same observations in the \(Top^G \longrightarrow Top^S\) experiments (cf. Results for the \(Top^G \longrightarrow Top^S\) Configuration), where each model is trained on one of the two topic-generic datasets (i.e., Founta and Davidson) and tested on the topic-specific datasets. As previously mentioned, when comparing the results reported in Table 4 and Table 5 with the ones presented in Table 6, the biggest performance drop is observed for the Waseem dataset. Again, the sampling biases characterizing that dataset may be a contributing factor.

Finally, let us mention the peculiarity of the results that we obtained for the HatEval dataset, especially its xenophobia portion; this is the only dataset where we observed a definite increase when training on topic-generic datasets compared with the performance obtained when training on topic-specific data. This counter-trend outcome needs to be further investigated, if possible in relation to the data sampling strategies adopted for HatEval, where training and test data were collected in different time frames [42].

Error Analysis

In this section, we provide an error analysis focusing on the instances for which the predictions of our best-performing model (BERTmulti-task) and the manual annotation differ. We observe that misclassification is affected by several factors, including the absence of context within the utterance and the use of irony, stereotypes, and metaphors. Another relevant factor is the contextual similarity between topical focuses in datasets where the targeted vulnerable category is basically the same, e.g., misogyny and sexism (see (16) and (17) belowFootnote 33) and xenophobia and racism (see example (18)). In the examples provided below, we underline some portions of the text in order to highlight what we regard as the main source of misclassification.

  1. (16)

     I don’t see why drinking and driving is such a big deal. Letting women drive is just as hazardous! (gold label: misogynistic, predicted: sexist)

  2. (17)

     HYSTERICAL woman. Not just woman. And, she didnt say he won. (gold label: misogynistic, predicted: sexist)

  3. (18)

    A piece at a time. Start by  outlawing new Mosques and stoping Muslim immigration. (gold label: racist, predicted: xenophobia)

Misogyny and sexism are closely related notions, and the way in which they are related has been the object of investigation in the philosophical literature in recent years [78, 110]. In order to take into account the relatedness of these and other HS categories, we will consider, in future work, a strategy that penalizes errors between closely related topics less heavily.

The use of irony is another important source of error. For example, in (19) the underlying stereotype, implying that there is no place for women as TV sportscasters, leads to the message being classified as non-sexist.

  1. (19)

     They have to concentrate in the 2nd half of this half”. Wise words from our female commentator.” (gold label: sexist, predicted: non-sexist)

In both (20) and (21) the users express their religious views on Islam. The model is not able to correctly predict that these utterances are racist: complex inference or logical reasoning is needed to understand their point of view.

  1. (20)

     The fact that I have a brain prevents me from accepting Islam. (gold label: racist, predicted: non-racist)

  2. (21)

     If you don’t want to read a pedo, you have to stop reading the Quran. (gold label: racist, predicted: non-racist)

Finally, although in (22) the user reports on a series of events, the model predicts the message as conveying hate towards immigrants, most probably because of the use of the word ‘rapefugee’. This is a self-explanatory and derogatory term used for Muslim refugees entering Europe.

  1. (22)

     Westminster terror attack suspect named as ’Sudanese Rapefugee who drove around London looking for targets’ before driving car into cyclists (gold label: not-hateful against immigrants, predicted: hateful against immigrants)

Conclusion and Future Work

This paper investigates, for the first time, HS detection from a multi-target perspective, leveraging existing manually annotated datasets with different topical focuses (including sexism, misogyny, racism, and xenophobia) and different targets (gender, ethnicity, religion, and race). Several neural models have been proposed for transferring specific manifestations of hate across topics and targets, while also exploring multi-task approaches and additional affective knowledge. Our results demonstrate that multi-task architectures are the best-performing models and that the emotions encoded in sentic computing resources and hate lexicons are important features for multi-target HS detection. This paper thereby shows that multi-target HS detection from existing datasets is feasible. This is the first step towards HS detection for specific topics/targets when dedicated annotated data are missing.

However, there is still room for improvement in building a robust system able to generalize HS detection across different topical focuses and targets. In future work, we want to explore other domain adaptation strategies, such as adversarial training, which has been shown to be an effective method for learning cross-domain representations in several tasks, including sentiment analysis and image classification [47, 56, 141].

Another path to explore is the impact of bias in multi-target HS detection. Bias in abusive language datasets is an open problem already observed in several previous studies [25, 92, 101, 136], which explored different variants of bias, such as topic, author, gender, and racial bias. As approaches for debiasing abusive language datasets remain largely unexplored, we also plan to examine this direction in the future, in the interest of keeping HS detection fair and compliant.

Concerning the role of affective knowledge in detecting hateful content, we observed that feeding our multi-label classification models with the structured knowledge included in a hate lexicon like Hurtlex, where hate words are categorized according to different semantic areas, boosts the performance of the classifiers. This also suggests possible lines of future work. According to the psychological literature, hate words and, in particular, gendered and racial slurs have evolved to the point that they are used, and perceived, to express negative emotions towards targets, therefore providing important information about the speaker’s emotional state or his or her attitude toward the targeted entity [58], even when they refer to descriptive qualities. We therefore think that it could be interesting to investigate the link between hateful language and the negative portions of the multifaceted emotion spectrum covered in sentic computing resources. In particular, we plan to test the effectiveness of the new version of the Hourglass model [121], which provides a better understanding of neutral emotions and their association with other polar emotions, and which includes some polar emotions that were previously missing (including self-conscious and moral emotions). The revisited Hourglass model calculates the polarity of a concept with higher accuracy. It also provides a new mechanism for classifying unknown concepts, by finding the antithetic emotion of a missing concept and flipping its polarity. SenticNet 6 [15] now contains 200,000 words and multiword expressions. We believe it may prove a valuable resource for improving multi-topic and multi-target HS detection.

Finally, although most of the available HS corpora are in English, the problem of hateful speech is not limited to one language. Given language diversity and the enormous amount of social media data produced in different regions of the world, detecting HS from a multilingual perspective is also a significant challenge. We therefore plan, in future work, to explore the possibility of developing language-agnostic models capable of identifying HS in online communication.