Multi-label classification and knowledge extraction from oncology-related content on online social networks
Artificial Intelligence Review (IF 10.7), Pub Date: 2020-04-17, DOI: 10.1007/s10462-020-09839-0
Mahdi Hashemi, Margeret Hall

This study aims to automatically process and extract knowledge from large volumes of oncology-related content on online social networks (OSN). A large number of OSN textual posts concerning major cancer types are automatically scraped and structured using natural language processing techniques. Classifiers are trained to assign multiple labels to these posts based on the type of knowledge they contain, if any, and the trained classifiers are then used to label textual posts at large scale. Statistical inferences are drawn from these predictions to extract general concepts and abstract knowledge. Different approaches for constructing document feature vectors showed no tangible effect on classification accuracy. Among the classifiers compared, logistic regression achieved the highest overall accuracy (96.4%) and $\overline{F1}$ (73.4) in a 13-way multi-label classification of textual posts. The most common topic was seeking or providing moral support for cancer patients, followed by providing technical information about cancer causes and treatments. The most common causes and treatments of different types of cancer mentioned on OSN are also automatically detected in this study. Seeking or providing moral support for cancer patients shared the largest overlap with other topics, i.e. moral support tends to be present even in OSN posts that focus on other topics. By contrast, providing technical information about cancer diagnosis and about cancer prevention were the most isolated topics: OSN posts on them tend not to allude to other topics. OSN posts that seek financial support overlap only with the moral support topic, if with any. Our methodology and results give public health professionals an opportunity to monitor which topics are being discussed on OSN and to what extent, which specific information and knowledge are being disseminated over OSN, and to assess their veracity in close to real time.
This helps them to develop policies that encourage, discourage, or modify the consumption of viral oncology-related information on OSN.
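
The abstract reports an overall accuracy, a mean F1 score, and pairwise overlap between topics, but does not spell out how they are computed. The sketch below shows one common way to obtain such figures from multi-label predictions. It assumes that overall accuracy is the fraction of correct (post, label) decisions and that the mean F1 is the unweighted average of per-label F1 scores; the label names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical label names and toy data, for illustration only.
LABELS = ["moral_support", "treatment_info", "financial_support"]

# One row per post, one 0/1 entry per label.
y_true = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
]
y_pred = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]


def overall_accuracy(true_rows, pred_rows):
    """Fraction of individual (post, label) decisions that are correct (assumed definition)."""
    total = sum(len(row) for row in true_rows)
    correct = sum(
        t == p
        for row_t, row_p in zip(true_rows, pred_rows)
        for t, p in zip(row_t, row_p)
    )
    return correct / total


def macro_f1(true_rows, pred_rows):
    """Unweighted mean of per-label F1 scores (assumed definition of the mean F1)."""
    n_labels = len(true_rows[0])
    scores = []
    for j in range(n_labels):
        pairs = [(row_t[j], row_p[j]) for row_t, row_p in zip(true_rows, pred_rows)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # A label with no true or predicted positives scores 0 here by convention.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


def label_overlap(rows, names):
    """Count how many posts carry each pair of labels at the same time."""
    counts = {}
    for row in rows:
        active = [names[j] for j, value in enumerate(row) if value == 1]
        for a, b in combinations(active, 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


if __name__ == "__main__":
    print("overall accuracy:", round(overall_accuracy(y_true, y_pred), 3))
    print("macro F1:", round(macro_f1(y_true, y_pred), 3))
    print("predicted label overlap:", label_overlap(y_pred, LABELS))
```

With 13 labels, most of which are absent from any given post, a measure of this kind can stay high even when per-label F1 is modest, which is one plausible reading of the gap between the two reported figures.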

Updated: 2020-04-17