Looking for COVID-19 misinformation in multilingual social media texts
arXiv - CS - Databases. Pub Date: 2021-05-03, DOI: arxiv-2105.03313
Raj Ratn Pranesh, Mehrdad Farokhnejad, Ambesh Shekhar, Genoveva Vargas-Solar

This paper presents the Multilingual COVID-19 Analysis Method (CMTA) for detecting and observing the spread of misinformation about the disease in texts. CMTA proposes a data science (DS) pipeline that applies machine learning models to process, classify (Dense-CNN), and analyze (MBERT) multilingual (micro-)texts. The pipeline's data preparation tasks extract features from multilingual textual data and categorize the texts into specific information classes (i.e., 'false', 'partly false', 'misleading'). The CMTA pipeline was tested on multilingual micro-texts (tweets), showing how misinformation spreads across different languages. To assess CMTA's performance and put it in perspective, we compared it with eight monolingual models used for detecting misinformation. The comparison shows that CMTA surpassed various monolingual models and suggests that it can serve as a general method for detecting misinformation in multilingual micro-texts. The experimental results show misinformation trends about COVID-19 in different languages during the first months of the pandemic.

Updated: 2021-05-10