Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas
arXiv - CS - Computers and Society, Pub Date: 2019-10-08, DOI: arxiv-1910.03206
Shriphani Palakodety, Ashiqur R. KhudaBukhsh, Jaime G. Carbonell

The Rohingya refugee crisis is one of the biggest humanitarian crises of modern times, with more than 600,000 Rohingyas rendered homeless according to the United Nations High Commissioner for Refugees. While it has received sustained press attention globally, no comprehensive study of social media discourse on this large, evolving crisis has been performed. In this work, we construct a substantial corpus of YouTube video comments (263,482 comments from 113,250 users on 5,153 relevant videos) with the aim of analyzing the possible role of AI in helping a marginalized community. Using a novel combination of multiple Active Learning strategies and a novel active sampling strategy based on nearest neighbors in the comment-embedding space, we construct a classifier that can detect comments defending the Rohingyas among larger numbers of disparaging and neutral ones. We advocate that, beyond the burgeoning field of hate-speech detection, automatic detection of help-speech can lend a voice to voiceless people and make the internet safer for marginalized communities.
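The nearest-neighbor sampling idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the embedding model, the similarity measure, or the selection rule, so the cosine similarity, the function name `nearest_neighbor_candidates`, and the toy 2-dimensional vectors below are all assumptions made for illustration. The sketch ranks unlabeled comments by their similarity to the closest already-labeled positive comment and returns the top `k` as candidates for annotation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor_candidates(seed_vectors, pool_vectors, k):
    """Return indices of the k pool vectors closest to any seed vector.

    seed_vectors: embeddings of comments already labeled positive (hypothetical input).
    pool_vectors: embeddings of unlabeled comments (hypothetical input).
    """
    if not seed_vectors:
        raise ValueError("at least one labeled seed vector is required")
    scored = []
    for idx, vec in enumerate(pool_vectors):
        # Score each unlabeled comment by its similarity to the nearest seed.
        best = max(cosine(vec, seed) for seed in seed_vectors)
        scored.append((best, idx))
    # Highest similarity first; ties are broken by the lower pool index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


# Toy embeddings, invented for illustration only.
seeds = [[1.0, 0.0], [0.9, 0.1]]
pool = [[0.0, 1.0], [1.0, 0.05], [-1.0, 0.0], [0.7, 0.7]]
selected = nearest_neighbor_candidates(seeds, pool, k=2)
print(selected)
```

In an active learning loop of this kind, the selected comments would be sent to a human annotator, and any newly confirmed positives would be added to the seed set before the next round.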


