Machine learning techniques for hate speech classification of twitter data: State-of-the-art, future challenges and research directions
Computer Science Review (IF 12.9), Pub Date: 2020-10-13, DOI: 10.1016/j.cosrev.2020.100311
Femi Emmanuel Ayo, Olusegun Folorunso, Friday Thomas Ibharalu, Idowu Ademola Osinuga

Twitter is a microblogging tool that allows the creation of big data through short digital content. This study surveys machine learning techniques for hate speech classification in Twitter data streams. Hate speech classification in Twitter data streams has remained a vibrant research focus, but little research effort has been devoted to the design of a generic metadata architecture, threshold settings and fragmentation issues. The hate speech classification techniques presented in the literature address some of the challenges inherent in Twitter data streams but are limited with respect to these issues. This study presents a collection of hate speech benchmark datasets suitable for testing the efficiency of classification models. It also presents the pros and cons of single and hybrid machine learning methods for hate speech classification, together with a summary of the performance evaluation of the surveyed methods. The study further presents a generic metadata architecture for hate speech classification in Twitter to tackle issues with Twitter data streams. Compared to similar methods, the developed generic metadata architecture was observed to perform better across all evaluation metrics for hate speech detection, with 0.95, 0.93, 0.92 and 0.93 for accuracy, precision, recall and F1-score respectively. Similarly, for hate speech sentiment classification, the developed architecture performed better than related methods, with an F1-score of 91.5%. The developed architecture also indicates a more accurate test than similar methods, with an AUC of 0.97. The statistical validation of the results supports the efficiency of the developed system. Finally, the results also showed that the developed system is well suited to automatic topic detection and categorization.
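
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall and F1-score are conventionally derived from a binary confusion matrix. It is not the authors' evaluation code; the `Confusion` class, the `metrics` function and the counts used below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary classifier (hate speech = positive class)."""
    tp: int  # hateful tweets correctly flagged
    fp: int  # non-hateful tweets wrongly flagged
    fn: int  # hateful tweets missed
    tn: int  # non-hateful tweets correctly passed


def metrics(c: Confusion) -> dict:
    """Return the standard definitions of the four metrics."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, NOT taken from the paper.
example = metrics(Confusion(tp=90, fp=10, fn=10, tn=90))
print(example)
```

The zero-denominator guards are a design choice for the sketch: a classifier that flags nothing has undefined precision, which is reported here as 0.0 rather than raising an error.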



Updated: 2020-10-13