Classifying Hate Speech Using a Two-Layer Model, Statistics and Public Policy

Classifying Hate Speech Using a Two-Layer Model
Statistics and Public Policy (IF 1.5), Pub Date: 2019-01-01, DOI: 10.1080/2330443x.2019.1660285
Yiwen Tang, Nicole Dalzell

ABSTRACT: Social media and other online sites are being increasingly scrutinized as platforms for cyberbullying and hate speech. Many machine learning algorithms, such as support vector machines, have been adopted to create classification tools to identify and potentially filter patterns of negative speech. While effective for prediction, these methodologies yield models that are difficult to interpret. In addition, many studies focus on classifying comments as either negative or neutral, rather than further separating negative comments into subcategories. To address both of these concerns, we introduce a two-stage model for classifying text. With this model, we illustrate the use of internal lexicons, collections of words generated from a pre-classified training dataset of comments that are specific to several subcategories of negative comments. In the first stage, a machine learning algorithm classifies each comment as negative or neutral, or more generally target or nontarget. The second stage of model building leverages the internal lexicons (called L2CLs) to create features specific to each subcategory. These features, along with others, are then used in a random forest model to classify the comments into the subcategories of interest. We demonstrate our approach using two sets of data. Supplementary materials for this article are available online.
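
The second-stage idea of internal lexicons can be illustrated with a small sketch. The abstract does not say how the lexicons are built, so everything below is an assumption made for illustration: the function names, the stopword list, the toy comments, the subcategory labels ("insult", "threat"), and the rule that a word enters a subcategory's lexicon when it occurs more often there than in all other subcategories combined. The first-stage classifier and the random forest are omitted; the per-subcategory counts returned here stand in for the features that would be passed to such a model.

```python
from collections import Counter

# Hypothetical stopword list, only so that the toy lexicons are readable.
STOPWORDS = {"a", "and", "are", "i", "you", "what", "will", "the"}
PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION) for word in text.lower().split())
    return [token for token in tokens if token]


def build_lexicons(comments, labels, top_k=2):
    """Build one word set per subcategory from pre-classified comments.

    Assumed rule (not taken from the paper): keep a word for a subcategory
    if it is not a stopword and occurs more often in that subcategory than
    in all other subcategories combined, then keep the top_k most frequent.
    """
    counts = {}
    for text, label in zip(comments, labels):
        words = [t for t in tokenize(text) if t not in STOPWORDS]
        counts.setdefault(label, Counter()).update(words)

    total = Counter()
    for counter in counts.values():
        total.update(counter)

    lexicons = {}
    for label, counter in counts.items():
        distinctive = [(w, n) for w, n in counter.items() if n > total[w] - n]
        # Most frequent first; ties broken alphabetically for determinism.
        distinctive.sort(key=lambda pair: (-pair[1], pair[0]))
        lexicons[label] = {w for w, _ in distinctive[:top_k]}
    return lexicons


def lexicon_features(text, lexicons):
    """Count how many tokens of a comment fall in each subcategory lexicon."""
    tokens = tokenize(text)
    return {label: sum(t in words for t in tokens)
            for label, words in lexicons.items()}


# Toy pre-classified training comments (all already flagged as negative).
train_comments = [
    "you are a fool and a clown",
    "what a clown",
    "i will hurt you",
    "i will find you and hurt you",
]
train_labels = ["insult", "insult", "threat", "threat"]

lexicons = build_lexicons(train_comments, train_labels)
features = lexicon_features("you clown, i will hurt you", lexicons)
print(lexicons)
print(features)
```

Because each feature is a count against a named word list, it is possible to see which words drove a prediction, which is in the spirit of the interpretability concern raised in the abstract.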

Updated: 2019-01-01
