Introduction

According to a Eurobarometer survey [1], 75% of people who follow or participate in online debates have witnessed or experienced abuse, threats, or hate speech. A more recent study by Eurostat [2] points to concerns about the online behaviors of children and young people and the harmful consequences—greater dependency, anxiety, and aggression. Both governmental and civil society organizations have put substantial effort into tackling hate speech (e.g., [3, 4, 20]). Nonetheless, numerous scientific studies have indicated that the incidence of hate content is continually increasing [5,6,7].

Successfully combating hate speech requires a proper multidisciplinary understanding of the phenomenon. Intuitively, people generally have some inner understanding of what hate speech is and when they cross the line. But the extent to which they assess the content and form can vary greatly [8]. Hate speech should be tackled and prosecuted based on its operationalization. As stated by Brown [9, p. 45], “How can we ban it if nobody agrees about what it is?” A number of ad hoc definitions have been introduced in recent years by legal entities, scholars, and social networks. Producing an accurate definition of online hate speech is, however, a challenging task given its multidimensional and complex nature. For example, legal definitions, which excessively concentrate on explicit hate, could make hate speech detection ineffective and leave targeted groups feeling that there is no one to protect them [9].

Providing a precise, universal, operable definition of online hate speech that is generally accepted by multiple stakeholders—scholars and practitioners, who may come from different, unrelated fields—is even more challenging. It is nonetheless extremely important—such a definition would make it easier for stakeholders to share know-how, annotate data consistently, compare automatic detection methods, study and compare the prevalence of hate speech across different platforms/countries, and would generally make communication easier, in the knowledge that others are likely to have some understanding of what we mean when we talk about online hate speech. Nevertheless, there is as yet no such theoretical definition [10, 11] and none is likely to appear and become widely accepted in the near future.

To address this problem and overcome the existing ambiguity and lack of a unified and operational theoretical definition, we adopt a slightly different approach—we define hate speech empirically by providing a list of hate speech indicators (i.e., specific, observable, measurable characteristics that offer a practical means of defining the concept) and the rationale behind them. To the best of our knowledge, such an approach has not thus far been used to operationalize hate speech. It has, however, been used for other types of undesired user behavior and content (e.g., cyberbullying or fake news). In addition, some of the literature (e.g., [12, 13]) already relies on some indicator precursors to overcome the ambiguity in definitions of hate speech and other relevant concepts.

The goal of this paper is therefore to operationalize hate speech using indicators and explore their use and the implications thereof, using a social science and computer science approach.

The main contributions of our approach are:

  1. Instead of grounding hate speech in a psychological narrative, we take a pragmatic approach (inspired by diagnostic manuals like the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [14] or the International Classification of Diseases (ICD-11) [15]) and (1) provide a list of hate speech indicators and the rationale behind them using a multidisciplinary approach, and (2) depict the structure of the indicators to reveal the core/peripheral indicators and the strength of the relationships between them.

  2. Online hate speech is a research field that is open to diverse knowledge-based fields, and cross-disciplinary approaches are both key and often recommended [11, 16,17,18]. Social science researchers often lack knowledge of automatic hate speech detection, while most data analysts have no social science background. This paper takes into account the multiplicity of both computer science and social science approaches.

  3. Although automatic methods of hate speech detection using advanced NLP and ML exist, much could be done to aid social scientists interested in undertaking large-scale studies and in-depth research on hate speech. Having a set of quantifiable indicators could prove to be a beneficial pragmatic approach: (a) for measuring changes/assessments; (b) enabling international comparisons; (c) improving communication between scholars; (d) recognizing the extent of hate speech on the local/national level; (e) as a basis for legal proceedings; and, from the computer science perspective, (f) for the annotation of extensive new hate speech datasets and for training novel explainable detection methods that provide explanations, or justifications, of which aspects of the text (e.g., a comment) led to the automated decision.

Background and related work

Definitions of hate speech

Definitions of hate speech can largely be found in four different contexts: (1) legal, (2) lexical, (3) scientific, and (4) practical; and these differ in scope and content. Below is a brief overview of each.

  (1) The purpose of legal definitions is straightforward: to identify messages that violate existing legal norms and require government regulation, i.e., messages that are publicly shared and incite, promote, or justify hatred, discrimination, or hostility toward a specific group and/or individual, based on certain attributes such as race or ethnic origin, religion, disability, gender, age, or sexual orientation/gender identity [4, 19,20,21]. There is no universally accepted legal definition of hate speech, and different states, the primary duty bearers under international human rights law, fall under different jurisdictions ([22]; Walker 1994, as cited in Calvert [23]). What is considered “hateful” can be controversial and disputed. State actors, including governments, legislatures, state authorities, and courts, therefore need a common understanding of the phenomenon before they can take relevant action to address it [24].

  (2) Five of the most well-known online dictionaries define hate speech as follows. Hate speech is: (1) “public speech that expresses hate or encourages violence towards a person or group based on something such as race, religion, sex, or sexual orientation” (the Cambridge Dictionary); (2) “speech expressing hatred of a particular group of people” (Merriam-Webster); (3) “speech that attacks a person or group on the basis of race, religion, gender, or sexual orientation” (the Collins Dictionary); (4) “a statement expressing hatred for a particular group of people” (the Macmillan Dictionary); (5) “speech or writing that attacks or threatens a particular group of people, especially on the basis of race, religion or sexual orientation” (the Oxford Learner's Dictionary). All these definitions share the same characteristics. They define hate speech as a message (written or oral) that expresses hatred and encourages violence toward a specific group of people. These simple definitions concentrate on the meaning of the words that make up the collocation—hate and speech.

  (3) Scientific definitions go much further, beyond the apparent meaning of the constituent words. The term hate speech is used in different disciplines such as economics, philosophy, sociology, psychology, or computer science, and, despite the lack of a definitional consensus, we can trace some of the common constituent characteristics. As Parekh [25] states, when defining hate speech, it is necessary to distinguish hate speech from related terms that do not fit the definition, such as expressing dislike, lack of respect, a demeaning view of others, disapproval, the use of abusive or insulting speech, and speech that does “not call for action” (p. 40). These subtle distinctions could be resolved by having a proper definition of hate speech.

According to Parekh [25, pp. 40–41], hate speech “expresses, encourages, stirs up, or incites hatred against a group of individuals distinguished by a particular feature or set of features such as race, ethnicity, gender, religion, nationality, and sexual orientation. […] and is often (but not necessarily) expressed in offensive, angry, abusive, and insulting language”. Moreover, Cohen-Almagor [25, 26] states that it is intended to dehumanize, harass, intimidate, debase, degrade, victimize, or incite brutality against the targeted groups. For an extensive philosophical analysis of the term hate speech, see Brown [9, 22].

Parekh [25] further recognizes three main aspects of the construct of hate speech: (1) it is directed at specific individuals or groups of individuals; (2) undesirable attributes are ascribed to the target group, thereby creating stigmatization; and (3) it leads to discrimination because the ascription of undesirable attributes encourages the target to be seen as undesirable.

The third aspect of Parekh's definition is highlighted by Gelber [27] as well—inequality is a core element of hate speech. In the subordinate–superior relationship, hate speech puts the creator in a position of authority and the target in the position of the subordinate, which encourages the target to be seen as inferior and legitimizes discriminatory behavior [27].

According to Gelber [27], it is also important to note that the emotion of hatred does not have to be evoked for hate speech to cause harm. Hate is not, and does not have to be, a key element of the definition (see also [22], the “myth of hate”). On the other hand, as Brown [9, 22] points out, if manifestations of hate speech did not include hate, then the term would be grossly misleading—which it is not. In summary, a lot of such speech evokes the strong emotion of hate, but not necessarily all of it.

Last but not least, it should be noted that hate speech need not relate only to a minority or disadvantaged social group, as was previously thought by classic scholars such as Jacobs and Potter (1998) or Walker (1994), as cited in [28]. In past centuries hate speech was specifically directed at minorities (e.g., Afro-Americans, the Roma community).

In the computer science research literature (see our previous work [11] for an overview of the computer-science perspective on hate speech), the most common focus related to hate speech is its (semi-)automatic detection. With the help of natural language processing (NLP) and machine learning (ML), models can classify a particular piece of content (usually textual) according to the binary presence of hate speech or a particular aspect thereof (e.g., whether it attacks an individual or a group); see the survey by Fortuna and Nunes [10] for more information. In computer science (and especially data science), two specific factors influence the employed definitions of hate speech [11]: (1) adoption of how the data are already annotated (either by other researchers or by social media platforms), and (2) the observable factors that can be identified from the content available in the dataset (i.e., without access to more intangible clues, such as the author's thinking, intent, or attitude, that can be studied in other disciplines, e.g., psychology).

Like many other disciplines, computer science lacks a generally accepted definition of hate speech. In addition, it is not clear how hate speech relates to other similar/related concepts (e.g., toxic language) or superior concepts (e.g., abusive/offensive language) [11]. The boundaries between these concepts are often blurred and there are few works that specifically tackle the distinctions. One example is Davidson et al. [29], who attempt to distinguish hate speech from other instances of offensive language in their work on the automated detection of hate speech.

  (4) Practical definitions can be found in the way online platforms (e.g., social media) and existing detection tools define hate speech. Facebook, Twitter, and YouTube are social media platforms that mediate online communication and have developed their own definitions of hate speech to ensure users stick to the rules and as part of their internal regulatory policies (e.g., terms of service or community standards). They have signed a Code of Conduct on the regulation of illegal hate speech with the European Commission [20]. Facebook’s Community Standards [30] define hate speech as “a direct attack against people on the basis of what they call protected characteristics: race, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity and serious disease”.

Practical tools for identifying and moderating hate speech, such as profanity filters, content moderation filters, or human-driven approaches, have been widely studied [5, 10, 11, 31, 32]. In the context of social media platforms, the findings on the utilization of such tools are somewhat puzzling, as they show that social platforms perform either too much or too little content moderation and lack transparent decision-making processes [33].

Capturing the complexity of hate speech in a unified, precise, and operable definition (either theoretical or more empirically oriented) could potentially lead to a better understanding of hate speech and help reduce its occurrence online. Fortuna and Nunes [10] reviewed all four types of hate speech definitions (legal, lexical, scientific, and practical). Following a content analysis, they proposed their own theoretical definition (probably one of the most fleshed-out ones) [10, p. 5]:

Hate speech is language that attacks or diminishes, that incites violence or hate against groups, based on specific characteristics such as physical appearance, religion, descent, national or ethnic origin, sexual orientation, gender identity or other, and it can occur with different linguistic styles, even in subtle forms or when humour is used.

There are other useful and inspiring attempts at a universal definition reached through a process of characterizing the toxicity of comments [34, 35], or utilizing context information [36, 37].

Nevertheless, when it comes to their practical application (e.g., during data annotation for machine learning), these theoretical definitions are still too vague and difficult to apply. In addition, many local factors (e.g., culture, platform, the task at hand) make it difficult or even impossible to produce a universal theoretical definition. Indeed, we argue that finding one seems unlikely at the moment.

Consequently, our aim in this paper is to define hate speech empirically by identifying specific, measurable, observable hate speech indicators.

Indicators of hate speech

An indicator is “a sign that shows you what something is like or how a situation is changing” (Oxford Learner's Dictionary). Previous studies have looked at indicators of cyberbullying [38], operational indicators of fake news [39], and behavioral indicators related to sexting [40]. Indicators are also frequently used in psychopathology as a means of conceptualizing and diagnosing mental health problems. We propose that operationalizing hate speech in the form of indicators could fulfil the same purpose—provide objective, measurable cues to help “diagnose” hate speech and find effective interventions.

Some existing research papers on hate speech deal with the content characteristics that describe hate speech and can be considered precursors to hate speech indicators. However, these papers are not primarily aimed at identifying hate speech indicators using an adequate methodology and so do not assess them satisfactorily.

First, Fortuna and Nunes [10] identified four dimensions (which can be considered analogous to indicators) that allowed them to compare definitions of hate speech: (1) hate speech has specific targets; (2) hate speech incites violence or hate; (3) hate speech attacks or diminishes; and (4) hate speech can be expressed through (and hidden in) humor.

Second, some computer science studies have gone further than producing a binary classification and have distinguished different aspects of abusive/offensive language (as a superior concept to hate speech). Waseem et al. [12] proposed a typology of abusive language based on two dimensions: (1) directed versus generalized, and (2) explicit versus implicit abusive language. Zampieri et al. [13] similarly annotated and automatically classified offensive language according to directness (targeted insult, untargeted) and target identification (individual, group, other). By emphasizing these critical aspects of offensive/abusive language, these studies reduce the ambiguity in definitions of hate speech and related concepts. Ousidhoum et al. [41] focused specifically on hate speech and manually annotated a multilingual dataset of Tweets in three languages, distinguishing five aspects: directness, hostility type (e.g., abusive, hateful, offensive), target attribute (i.e., what is being targeted, e.g., origin, gender, religion), target group (i.e., who is being targeted, e.g., individual, women), and annotator sentiment (i.e., the emotion felt by the annotator after reading the Tweet, e.g., disgust, fear, anger).

Other things resembling indicators can be identified in data annotation methodologies as well. The manual annotation of hate speech is a subjective process and annotators need extensive cultural and societal knowledge [42]. Human annotators must therefore be provided with detailed instructions on how to recognize and label hate speech consistently. For this purpose, Waseem and Hovy [43] proposed a list of 11 criteria to guide annotators; example items are: hate speech (1) uses sexist or racial slurs; (2) criticizes a minority and uses straw man arguments; and (3) defends xenophobia or sexism.

Finally, in NLP and ML, training detection models commonly involves a feature engineering step. Features used to train models (especially those with strong predictive capability) can be considered a kind of indicator as well. Fortuna and Nunes [10] analyzed a number of detection methods and identified two categories of features: general features commonly used in text classification approaches and specific hate speech detection features. The latter category, which is more closely related to the concept of indicators, was further divided into several subcategories of features that are intrinsically related to the characteristics of hate speech: (1) othering language, (2) perpetrator characteristics, (3) objectivity-subjectivity of the language, (4) declarations of superiority of the ingroup, (5) a focus on particular stereotypes, and (6) intersectionism of oppression.

To conclude, despite the increasing understanding in the social sciences of the factors that may underlie online hateful behavior [44, 45], we found no studies attempting to systematically develop a set of indicators covering the main aspects of hate speech. To fill this gap and conduct a systematic study of hate speech indicators, we aim to provide an empirical definition of hate speech, while taking into account both social science and computer science approaches (with the help of existing research and interdisciplinary experts).

Choosing the hate speech indicators

The primary goal of our research was to propose an empirical definition of hate speech. As stated by Chaffee [46], “an empirical definition of a concept enables us to conclude whether an event we encounter is an instance of that concept or not.” This social science insight is highly relevant for computer science and supervised machine learning—in practice, it could improve the accuracy and effectiveness of hate speech detection and removal.

The concept of hate speech is a social construct. It represents agreement between people about what speech is hateful and what is not. We therefore started our search for an empirical definition by holding focus group discussions with two groups: group 1, comprising 12 participants selected and assembled by the authors of this manuscript, and group 2, consisting of a team of researchers—psychologists with different levels of seniority—primarily the authors of this manuscript. The participants in group 1, who varied in gender, age, and educational background, discussed and commented on the concept of hate speech based on their personal experiences. Research ethics, in particular respect for the research participants, were observed throughout the study. The focus groups of researchers took the form of a structured informal discussion described below. Eight focus group sessions were conducted over two months, April–May 2020, with the team of researchers (group 2) in one-hour regular discussion sessions. All eight focus groups were conducted using a dual-moderator model [47, 48] until mutual agreement on the selected indicators was reached. At first the discussion focused on hate speech indicators in general, but as the sessions continued the focus shifted more towards hate speech indicators relating to migrants. The focus on migrants was selected on the basis of current reports by: (1) the European Commission [49], stating that xenophobia, anti-migrant hatred (15%), and anti-gypsyism (9.9%) are among the most frequently reported grounds for hate speech; (2) the UNHCR [50], about powerful narratives denigrating refugees and turning them into objects of fear; and (3) the ECRI report on Slovakia [51], pointing to an escalation in hate speech against Jews, Muslims, migrants, Roma, and black people.

The most significant outcome of the focus group discussions and subsequent non-systematic review of the hate speech literature was the creation of a list of 14 hate speech indicators relating to migration and migrants (see Fig. 1). The indicators cover the various features of hate speech in a form that is easily accessible to policy makers, educators, researchers, and human rights activists. Crucially, the occurrence of each indicator is easily quantifiable.

Fig. 1 Initial 14 hate speech indicators identified via the literature review and focus group discussions

The list of 14 indicators (including general information about the purpose of the study and a description of each indicator with a representative example) was then sent to experts who were asked to evaluate the content validity of the indicators and provide any additional comments. The experts were all based in Slovakia and specialized in asylum and migration law (Human Rights League), sociology (Centre for the Research of Ethnicity and Culture), leading media (SME), a non-profit organization (DigiQ), IT, social work and human rights protection, and psychology. Representatives of minorities in Slovakia (Islamic Foundation Slovakia) were also sent a copy. An online consultation was held with one of the experts in relation to her comments and recommendations. Altogether, 18 experts responded and evaluated the validity of the indicators. Their task was to indicate the intensity of each indicator on a 5-point scale (0 = does not indicate hate speech at all; 4 = fully indicates hate speech). The intraclass correlation coefficient (ICC), which indicates the level of agreement between raters, shows very good agreement between the experts’ evaluations, ICC = 0.80, p < 0.001. The experts concluded that denial of other rights is the most intensive hate speech indicator, whereas the weakest hate speech indicators were manipulation, slant, and the use of tendentious analogies (for more details see https://osf.io/mctkv/). After checking the descriptive statistics of the expert ratings, we further discussed and re-evaluated the content of the indicators and merged those we deemed very similar, reducing the list to the final 10 indicators. For an overview of the process see Fig. 2.
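For readers who wish to reproduce this kind of agreement check, the sketch below shows how an ICC can be computed from long-format expert ratings in Python. It is only an illustration under stated assumptions: the column names and toy values are hypothetical, and the paper does not specify which software was used for the original analysis.

```python
# Minimal sketch of the expert-agreement step, assuming long-format ratings
# (one row per expert x indicator). Column names and toy values are
# hypothetical; they are not the actual study data.
import pandas as pd
import pingouin as pg

indicators = ["sexist_language", "attacks_minority", "denies_rights",
              "promotes_violence", "slurs_vulgarisms"]
ratings = pd.DataFrame({
    "indicator": [ind for ind in indicators for _ in range(3)],
    "expert":    ["e1", "e2", "e3"] * len(indicators),
    "rating":    [2, 3, 2, 3, 3, 4, 4, 4, 3, 4, 3, 4, 3, 2, 3],  # 0-4 scale
})

icc = pg.intraclass_corr(data=ratings, targets="indicator",
                         raters="expert", ratings="rating")
print(icc[icc["Type"] == "ICC2"])  # two-way random effects, absolute agreement
```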

Fig. 2 Flowchart of the process of selecting the indicators

List of hate speech indicators and the rationale

The list of 10 indicators and the rationale behind them was arrived at through a process of explication, identifying the theoretical and empirical definitions of hate speech and then evaluating and modifying them.

Below we provide a description of each indicator. The purpose of the indicators is to provide a simple but comprehensive picture of the hate speech phenomenon, which can then be used to help identify, evaluate, or summarize the results achieved along the way in tackling hate speech, and may help in monitoring hate speech trends in different regions and cultures. The indicators can also be used for setting goals and benchmark criteria. The indicators are relevant, unambiguous, comprehensible, interpretable, and comparable over time.

  1. Sexist language: sexist hate speech relates to utterances which spread, incite, promote, or justify hatred based on sex, gender, or sexual preference, with the aim of humiliating and objectifying the target, destroying their reputation, and making them vulnerable and fearful. It also includes second sexism—discrimination against men and boys. The feminist aspect is even more prominent in relation to migration. Almost half of migrants are women and girls (Gender Statistics [52]). Negative depictions often appear in the media, where female migrants face double discrimination—as women and as migrants.

  2. Attacking minorities as a “traditionally disadvantaged group”: this indicator includes any sort of racial offense, xenophobia, antisemitism, or discrimination on the grounds of a person’s age, disability, ethnicity, or national or religious group, aimed at inciting intolerance or racial and ethnic hatred.

  3. Denial of fundamental human rights: this includes any calls for the exclusion or segregation of a person or group of people, the denial of the right to promote one’s language and traditions, the right to peaceful assembly, etc.

  4. Promoting violent behavior: comments aimed at inciting others to commit acts of violence or that can reasonably be expected to have that effect. This also includes excusing violence, promoting negationism, and condoning terrorism.

  5. Problematic hashtags, nicknames, symbols: this can also mean referring to an organization that has committed hate crimes.

  6. Ad hominem attacks: these are personal attacks on a person's character or motive, based on feelings of prejudice rather than facts or logic. They include accusations of lying, ignorance, stupidity, and condescension.

  7. Negative stereotypes of a minority: these are negative expectations about the out-group, and appear alongside negative emotions (e.g., fear, anger) towards the out-group. They promote the view that the targeted group is inferior or subordinate and are closely related to Indicator no. 2.

  8. Texts containing ambiguous statements, irony, sarcasm: making generalized negative statements about minority groups; ridicule, intimidation, hostility. This includes mocking the concept, events, or victims of hate crimes, even if no real person is discussed.

  9. Manipulative texts/misinterpretation of the truth: any misinterpretations made purposefully, with the intention of fooling or harming someone. For instance, denial of historical events.

  10. Slurs and vulgarisms: direct, explicit verbal or nonverbal attacks, using insulting labels to refer to an individual or group based on their race, ethnicity, national origin, religious affiliation, or other attribute.

Our proposed list of indicators is similar to the work of Waseem and Hovy [43], which is based on critical race theory. They provided a list of criteria for human annotators identifying and annotating hate speech in Twitter feeds. Their list differs from our indicators in that we relied on the views of the multidisciplinary experts involved (i.e., we did not base our list on theory alone) and, while some of the indicators are the same (e.g., hate speech uses a sexist or racial slur, attacks a minority, uses problematic hashtags, negatively stereotypes a minority), others are slightly different. In addition, following the preliminary exploratory examination of the structure of hate speech (cf. “Using indicators to examine the structure of hate speech”), it seems that the denial of human rights and the promotion of violent behavior (not present in the work of Waseem and Hovy [43]) play a central role in the network of indicators.

Using indicators to examine the structure of hate speech

To obtain better insights into hate speech, we examined the internal structure and dynamics of hate speech indicators via a network analysis. In recent years, the network approach to the conceptualization of social and behavioral phenomena has appeared in the behavioral sciences and has become increasingly popular, especially in psychopathology (e.g., [53]). The network approach shifts the focus away from latent constructs (i.e., unobservable entities that cause a set of observable behaviors) and toward complex systems for grouping variables. In this perspective, a phenomenon occurs owing to a causal system of mutually interacting variables. The four network theory principles for conceptualizing mental disorders have been summarized by Borsboom [54] and can be applied to hate speech contexts and the indicators we propose. That is, hate speech can be regarded as a specific combination of indicators of varying intensities that mutually interact and reinforce/weaken each other (the indicators theoretically correspond to the nature of the construct, i.e., the indicators proposed above).

The full procedure of hate speech structure examination was as follows:

  1. A sample of around 240 comments about migration/migrants was obtained via a qualitative analysis of public social media and local news (Facebook, SME, HNonline, Magazin1) or created by some of the members of the research team. The comments differed in the level of hate expressed: some slightly misinterpreted historical facts, whereas others contained vulgarisms or explicitly called for violent acts. To prevent range restriction, neutral comments were also selected to capture the whole spectrum of speech (ranging from neutral comments to very hateful comments).

  2. Members of the research team were then asked to rate each comment in terms of the indicators described above and to assess the hate speech in the comment overall. Each comment was independently rated by four members of the research team, consisting primarily of the authors of this manuscript.

  3. The process of selecting comments was paused once to check how well the comments represented the whole spectrum of hate speech severity (based on the ratings). Comments featuring sexism, irony, and manipulative texts appeared to be underrepresented across the spectrum of positive, neutral, and very hateful comments, so we added another 60 comments.

  4. Before proceeding to the analysis, we screened the comments one last time and excluded those (N = 13) not explicitly related to migration. The resulting dataset consisted of 283 representative comments. The final set of comments (in Slovak) is available at https://osf.io/452pb/.

  5. Once the selection/extraction of the comments had been finalized, the level of agreement between the raters for each indicator was calculated. Although all the ICCs were significant, the average level of agreement was ICC = 0.64, indicating moderate reliability according to common rules of thumb. However, given that some indicators, such as ad hominem attacks, manipulative statements, or ambiguous or ironic statements, naturally depend on the person interpreting them, rater agreement may in fact be better than it appears at first glance.

  6. The ratings for each indicator were then averaged across raters (individual ratings can be found in the dataset at https://osf.io/g3b87/).

  7. Furthermore, we checked how well the average perception of hate speech correlated with the general impression of hate speech, the average severity of the indicators, and their sum score, as well as how strongly the indicators were correlated with one another. Excluding sexism, there was a moderate to strong relationship between all the indicators and the general impression of hate speech. All the indicators correlated positively, with the magnitude of the correlations ranging from 0.02 to 0.72; the average correlation between the indicators was r = 0.32.

  8. A network analysis approach was used to examine the structure of the hate speech indicators. It tells us how the indicators (nodes) of a phenomenon interconnect (edges) and which of them play a more central role (strength and expected influence: how strongly an indicator is related to other indicators; closeness and betweenness: how well an indicator connects with other indicators). For the purposes of this demonstration, a network consisting of the average evaluation of each of the hate speech indicators was computed. Here we depict the structure of hate speech (only for the 283 comments which featured at least one hate speech indicator; the network of all the comments is available in the supplementary materials), show the estimates of the centrality measures, and very briefly discuss the stability of the network. For more details on the technical side, please see the supplementary materials at https://osf.io/g3b87/.
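To make this step more concrete, the sketch below illustrates one way a regularized partial-correlation network and simple centrality indices could be estimated in Python. This is not the authors' actual analysis (network psychometrics is typically carried out with dedicated R packages such as qgraph or bootnet); the data, threshold, and variable names are illustrative assumptions only.

```python
# Illustrative sketch: regularized partial-correlation network over indicator
# ratings, plus simple centrality indices. Data are synthetic placeholders.
import numpy as np
import networkx as nx
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
n_indicators = 10
# toy correlated data standing in for a 283 x 10 matrix of averaged ratings
cov = 0.3 * np.ones((n_indicators, n_indicators)) + 0.7 * np.eye(n_indicators)
ratings = rng.multivariate_normal(np.zeros(n_indicators), cov, size=283)

model = GraphicalLassoCV().fit(ratings)
prec = model.precision_
d = np.sqrt(np.diag(prec))
pcorr = -prec / np.outer(d, d)          # partial correlations = edge weights
np.fill_diagonal(pcorr, 0.0)

strength = np.abs(pcorr).sum(axis=0)            # strength centrality
expected_influence = pcorr.sum(axis=0)          # signed analogue of strength

G = nx.Graph()
G.add_nodes_from(range(n_indicators))
for i in range(n_indicators):
    for j in range(i + 1, n_indicators):
        if abs(pcorr[i, j]) > 1e-6:             # keep edges surviving regularization
            G.add_edge(i, j, distance=1.0 / abs(pcorr[i, j]))

closeness = nx.closeness_centrality(G, distance="distance")
betweenness = nx.betweenness_centrality(G, weight="distance")
print(strength.round(2), expected_influence.round(2))
```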

For descriptive purposes, the correlation matrix and the corresponding network of binary relationships are presented in Table 1 and Fig. 3a, respectively. A regularized network of the hate speech indicators is depicted in Fig. 3b. Please note that Fig. 3a, b are visualizations of the networks—although these are intuitive to comprehend, visualizations can be misleading and so interpretations should be based on the statistical estimates. Regarding the estimated centrality measures (Fig. 4), the core indicators of hate speech appear to be: (a) denial of fundamental human rights, accompanied by (b) promoting violent and aggressive behavior, and (c) the use of vulgarisms. In contrast, a rather peripheral role in the network is played by (a) the use of ambiguous statements, irony, or sarcasm and (b) the use of problematic hashtags or nicknames. Paradoxically, after controlling for the other variables in the network, some of the correlations between denial of human rights and the other indicators ended up negative (which is why the expected influence drops so substantially compared to the strength estimate).

Table 1 Correlation matrix of the hate speech indicators, general hate speech impression, and a sum score of the indicators
Fig. 3 Visualization of hate speech structure

Fig. 4 Centrality measures of a regularized network of the hate speech indicators (ordered by strength). Note: The x-axis represents the standardized score for each variable (y-axis) in the index. A basic (simplified) description of the indices is presented in the body of the paper above (“Using indicators to examine the structure of hate speech”, point no. 8). For more detailed information on strength, closeness, and betweenness please see the tutorial paper by Costantini et al. [77]; for more information on expected influence please see [78]

The results are, to our knowledge, the first attempt to represent the structure of hate speech and, as such, should be interpreted with caution. The attribute of promoting violent and aggressive behavior features in both legal and scientific definitions. This indicator seems to distinguish hate speech from distinct but related behaviors, such as dislike, lack of respect, or disapproval, since, as Parekh [25] states, it “calls for action”. Our analysis shows that insulting and abusive language, previously considered an important but not necessary component by scholars, seems to be another core indicator of hate speech. Since both indicators are easy to operationalize, they provide clearer guidance for the effective identification of hate speech. Some scholars [25, 27] consider inequality or the subordinate–superior relationship to be a defining component of hate speech. We suggest that this vaguely defined component manifests in specific types of speech as the third core indicator, denial of fundamental human rights. Denial of fundamental human rights places the person using hate speech in the position of a superior who determines who should and should not be considered a full-fledged human being. Again, focusing on the list of fundamental human rights and manifestations of their denial could facilitate the identification (in a public context) and measurement (in an academic context) of hate.

Practical implications

This paper presents a new approach to understanding hate speech online. Although there are automatic ways of detecting hate speech using advanced ML and NLP methods, much could still be done to aid social scientists interested in undertaking large-scale studies and in-depth research on hate speech. Our investigation of hate speech indicators and their use in combating online hate speech incorporates both a social science and a computer science perspective, while offering some practical implications.

Having a set of quantifiable indicators could benefit researchers, human rights activists, educators, analysts, and regulators by providing them with a pragmatic approach to hate speech assessment and detection. This pragmatic approach could benefit:

  1. Scholars and researchers seeking to improve the way they communicate about the phenomenon of online hate speech. Without a clear, shared understanding of the concept, how can scholars and researchers know what speech should be targeted?

  2. The non-profit sector, when conducting comparative international studies, measuring changes/assessments in the local community, or fostering debates, discussions, and the exchange of ideas on how to protect disadvantaged client groups.

  3. Regulators who need to identify the extent of hate speech on the local/national level, provide recommendations and mechanisms for fighting against and preventing hate speech, and facilitate further efforts and initiatives.

  4. Social media moderators and online portals, who could use the indicators as a basis for hate speech detection, moderation, and cultivating discussion.

  5. Legal proceedings. As Pejchal [55] stresses in a study analyzing legal instruments in post-communist Slovakia and the Czech Republic, there is a need to support and create a favorable environment for discussing democracy and the values it stands for—free speech versus hate speech.

While the proposed indicators may prove beneficial in the manual detection of hate speech in the above-mentioned scenarios, their greatest potential lies in automatic hate speech detection. First, many existing datasets (e.g., [13, 29, 56]) contain a label indicating that a post is an instance of hate speech or offensive language, but often no finer-grained description or explanation is given. Although the annotators had definitions or guidelines to follow (see, e.g., [43]), these are not captured in the data. Thus, the same label in two different datasets may have a different meaning, which can hamper the generalizability of the trained models and prevent the use of the datasets for transfer learning or even a simple comparison of the different models’ performances. Having datasets annotated with the proposed indicators would alleviate these issues.
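As a purely illustrative example, the record layout below sketches one possible shape of such indicator-level annotation in Python; the field names and example values are our assumptions, not a published annotation schema.

```python
# Hypothetical record layout for indicator-level annotation; the field names
# and example values are illustrative assumptions, not a published schema.
from dataclasses import dataclass, field

@dataclass
class AnnotatedComment:
    text: str
    indicators: dict = field(default_factory=dict)    # 0-4 rating per indicator
    overall_hate_speech: float = 0.0                   # general impression rating

example = AnnotatedComment(
    text="They do not deserve the same rights as we have.",
    indicators={"denies_rights": 3.5, "negative_stereotypes": 2.0},
    overall_hate_speech=3.0,
)
print(example)
```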

Moreover, the indicators can be used as input features or weak labels to train hate speech detection models. Such models would have the added benefit of explainability (e.g., the comment is labeled as hate speech because it promotes violent behavior and negatively stereotypes a minority), which is important for moderating social media content.
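One hedged illustration of this idea: if indicator scores were available (whether human-annotated or automatically estimated), even a simple linear model over them would yield per-comment explanations directly from its coefficients. The sketch below uses synthetic data and hypothetical feature names; it is not the authors' model, only a demonstration of the principle.

```python
# Hypothetical illustration: indicator scores (0-4) as input features for an
# interpretable classifier. Data are synthetic; feature names merely follow
# the proposed indicators.
import numpy as np
from sklearn.linear_model import LogisticRegression

indicator_names = [
    "sexist_language", "attacks_minority", "denies_rights", "promotes_violence",
    "problematic_symbols", "ad_hominem", "negative_stereotypes",
    "irony_sarcasm", "manipulation", "slurs_vulgarisms",
]

rng = np.random.default_rng(1)
X = rng.integers(0, 5, size=(283, len(indicator_names)))   # toy indicator ratings
y = (X[:, 2] + X[:, 3] + X[:, 9] > 5).astype(int)          # toy overall label

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Per-comment explanation: which indicators pushed the decision towards "hate speech"
comment = X[0]
contributions = clf.coef_[0] * comment
for name, value in sorted(zip(indicator_names, contributions), key=lambda t: -t[1])[:3]:
    print(f"{name}: {value:+.2f}")
```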

However, for the indicators to be used as weak labels or as explanations, they need to be automatically quantifiable from an input text (e.g., a social media post). In Table 2 we show how the proposed indicators occur in existing datasets or lexicons, which could be used to train models to automatically detect or quantify them. Given the available datasets and the current level of maturity of existing NLP approaches, the proposed indicators vary in their readiness for automatic detection. Problematic hashtags, nicknames, and symbols, as well as slurs and vulgarisms, can be detected with a high level of precision using simple word lexicons. However, the cultural sensitivity of certain word usages (words that are problematic in some cultures, but not in others) remains an open problem. Curated lexicons of problematic hashtags are still missing, although existing lists of hate speech words could be used for this purpose. Language aimed at the denial of fundamental human rights or promoting violent behavior can be detected with machine learning methods based on sufficiently large datasets, since it is often characterized by a relatively homogeneous set of features. While existing datasets often contain a label pertaining to violence and aggressiveness [57,58,59,60], the denial of fundamental human rights is rarely explicitly labeled, despite our analysis showing it is a strong indicator of hate speech.

Table 2 Map of the proposed indicators to existing NLP datasets on hate speech and related phenomena (such as offensive and abusive language), with an assessment of the current level of readiness for the automatic detection of each indicator given the availability of data and the capability of existing approaches
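As a minimal sketch of the lexicon-based end of this spectrum, the snippet below shows how the hashtag and slur indicators could be quantified by simple lexicon matching. The lexicon entries are placeholders; in practice, curated word lists would need to be plugged in.

```python
# Minimal lexicon-matching sketch for the "slurs and vulgarisms" and
# "problematic hashtags" indicators. Lexicon entries are placeholders.
import re

SLUR_LEXICON = {"slurword1", "slurword2"}        # placeholder entries
PROBLEMATIC_HASHTAGS = {"#banthemall"}           # placeholder entries

def quantify_lexicon_indicators(comment: str) -> dict:
    tokens = re.findall(r"#?\w+", comment.lower())
    return {
        "slurs_vulgarisms": sum(t in SLUR_LEXICON for t in tokens),
        "problematic_hashtags": sum(t in PROBLEMATIC_HASHTAGS for t in tokens),
    }

print(quantify_lexicon_indicators("They should go home #banthemall"))
```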

Sexist language and language attacking minorities as a “traditionally disadvantaged group” can also be detected with machine learning methods, and there are datasets available containing these labels (e.g., [41, 43, 57, 61]). However, certain types are still hard to detect, e.g., latent sexist language, ambiguous statements, or language drawing on a broader context rather than just individual phrases or sentences. Stereotypical language can be detected with machine learning methods as well, but it is explicitly labeled in only a few datasets [59, 60, 62]. Another issue is the sheer number of different stereotypes across cultures.

Certain types of ad hominem attacks are relatively easy to detect, especially attacks featuring other indicators, such as slurs and stereotypical language. The existing datasets that contain name calling or smears as a type of ad hominem reflect this [63, 64]. But there are context-sensitive ad hominem attacks whose detection requires a level of natural language understanding not yet achieved by current systems. For instance, “He spends all his time in a library” might be an ad hominem attack aimed at a person’s practical skills, but in other contexts it could in fact be praise.

Despite the existing research, the automatic detection of irony and sarcasm, or of a manipulative style and misinterpretations, is still far from a solved problem. Although there are labeled datasets on irony [65], humor [66], and manipulative (propaganda) techniques [63, 64], current systems do not possess a level of natural language understanding that would enable them to reliably detect such phenomena. Our analysis, however, shows that the use of ambiguous statements, irony, or sarcasm is only a peripheral indicator of hate speech, so this limitation is less critical.

Limitations of the present study and suggestions for future research

The present study has several limitations that can be divided into two main areas—robustness of the procedure and technicalities related to the analysis. The list of indicators is probably not complete, and one can expect it to change as more ideas or empirical evidence come to light. This is not surprising given that even relatively well-established concepts in diagnostic manuals (e.g., PTSD; compare DSM-IV and DSM-5) are continually evolving. It could be argued that the procedure employed here relied heavily on experts’ judgement (both in the selection of the indicators and in their evaluation). Although expert judgement is essential at all stages of the scientific process [67], there is some evidence (e.g., [68]) that it may be less accurate than intuitively expected. To prevent systematic mistakes in the conceptualization of a phenomenon, it would perhaps be better to use a pragmatic data-driven approach (e.g., [69]) combined with expert consensus. This could be a future area of research. Furthermore, the validity and hence generalizability of the set of indicators is likely to depend on the way the topic (e.g., LGBTI, gender, political issues) interacts with the cultural setting. To tackle this issue, replication studies on different topics in different cultures would be a very welcome addition. Applied to the present research, the results are probably limited by the specifics of Slovak culture/language and the polarizing topic of migration, which may have distorted the ratings.

From the technical perspective, there are two major limitations to this study. First, the input data for calculating the network are the averaged ratings of the hate speech indicators obtained from a fixed set of independent raters. Although this is a legitimate practice [70], a larger pool of expert raters would make the results more robust. Second, given the small sample (i.e., the number of comments), the stability of the observed estimates and the robustness of the whole network is moderate at best (for the results of simulations see https://osf.io/g3b87/). As noted above, the present network primarily serves as a probe into the topic, and much more extensive research and replication is needed in order to obtain credible information about the structure of hate speech.

From the computer science perspective, the labeled dataset of hate speech comments is too small to be used in training automated detection models. However, this was beyond the scope of this work, and the collection of a larger dataset is planned for the future. One limitation for future research and, more importantly, for practical deployment is that, as discussed above, not all of the proposed indicators are easily quantifiable (detectable) by automatic means using ML and NLP methods, given the current state of the art and available resources. However, the indicators identified in our analysis as most significant for the operationalization of hate speech can be detected with machine learning methods given large enough datasets, since they are characterized by a relatively homogeneous set of features. Lastly, one could object that our proposed indicators replace a simple binary classification of whether a comment is or is not hate speech with a set of classification/regression problems. But as argued above, the main advantage is that using indicators leads to unambiguous, interpretable, and comparable labels, which can then be used to detect and explain hate speech. Also, as shown above, most indicators can already be found in some form or other in existing datasets, which points to their relevance and feasibility.

Conclusion

Hate speech is prevalent in all societies, across various social media platforms, and continues to present a challenge to practitioners and regulators. The latest report from the Organization for Security and Co-operation in Europe (OSCE) [71] covers hate crimes and bias-motivated crimes but fails to provide information about discrimination or hate speech, owing to the lack of consensus on whether these acts should be criminalized. The more vague a theory is, the less it helps us describe, explain, predict, and control a phenomenon [72]. This applies to weak, narrative theories, which can be contrasted with strong theories that enable phenomena to be precisely formalized [72]. A good hate speech theory would provide us with a recipe for accurately identifying forms of hate speech. Our goal was to operationalize hate speech by proposing indicators and examining their structure. We defined hate speech in the context of migrants as any text that promotes violent behavior, denies human rights, contains slurs, vulgarisms, or ad hominem attacks, uses negative stereotypes, or purposefully manipulates the truth or historical facts. The indicators are interchangeable, and may not all be present, but we consider the promotion of violent behavior, the use of vulgarisms, and the denial of human rights to be the core aspects of hate speech. The proposed indicators can be used directly in practice (i.e., for a more precise assessment of hate speech severity by rating the various indicators) and further developed by other researchers. The grey zone deserves special attention and, from the pragmatic perspective, would benefit from a more nuanced conceptualization of the hate speech phenomenon.

For future research, if we are to fully grasp the phenomenon, resources must be shared between researchers, and studies should be conducted/replicated across different cultures and topics, as there is a pressing need for more evidence-based information. In this regard, we plan to use the proposed methodology and indicators to collect a larger sample of labeled comments, of such a volume that it can also be used to train automatic hate speech detection models. To the best of our knowledge, our work provides the first (albeit small) public dataset of labeled hate speech comments in the Slovak language, and the extended dataset will be a significant contribution to the limited NLP resources available in Slovak (and other Slavic languages for that matter). Lastly, we plan to examine the methods for the automatic detection (or quantification) of the proposed indicators using the available resources and state-of-the-art NLP approaches as discussed in this paper.