Utilizing a multi-class classification approach to detect therapeutic and recreational misuse of opioids on Twitter
Computers in Biology and Medicine (IF 7.0), Pub Date: 2020-11-20, DOI: 10.1016/j.compbiomed.2020.104132
Samah Jamal Fodeh 1 , Mohammed Al-Garadi 2 , Osama Elsankary 3 , Jeanmarie Perrone 4 , William Becker 5 , Abeed Sarker 2

Background

Opioid misuse (OM) is a major health problem in the United States, and can lead to addiction and fatal overdose. We sought to employ natural language processing (NLP) and machine learning to categorize Twitter chatter based on the motive of OM.

Materials and methods

We collected data from Twitter using opioid-related keywords and manually annotated 6988 tweets into three classes: No-OM, Pain-related-OM, and Recreational-OM. The No-OM class represents tweets indicating no use or misuse, while the Pain-related-OM and Recreational-OM classes represent misuse for pain and misuse for recreation or addiction, respectively. We trained and evaluated multi-class classifiers, and performed term-level k-means clustering to assess whether any terms were closely associated with each of the three classes.
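To make the term-level clustering step concrete, here is a minimal sketch of k-means (Lloyd's algorithm) over term vectors. The abstract does not say how terms were represented, so the three-dimensional "class share" vectors, the example terms, and the deterministic farthest-first initialization below are all assumptions made for illustration, not the authors' setup.

```python
# Illustrative only: the terms and their vectors are invented toy values.
# Each vector is a hypothetical share of a term's occurrences across
# (No-OM, Pain-related-OM, Recreational-OM).
TERM_VECTORS = {
    "prescription": (0.80, 0.15, 0.05),
    "pharmacy":     (0.85, 0.10, 0.05),
    "backpain":     (0.10, 0.80, 0.10),
    "surgery":      (0.15, 0.75, 0.10),
    "high":         (0.05, 0.10, 0.85),
    "party":        (0.10, 0.05, 0.85),
}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the assignment stops changing."""
    centroids = farthest_first_init(points, k)
    assignment = None
    for _ in range(max_iters):
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


terms = list(TERM_VECTORS)
points = [TERM_VECTORS[t] for t in terms]
assignment, _ = kmeans(points, k=3)

# Group terms by cluster index for inspection.
clusters = {}
for term, cluster_id in zip(terms, assignment):
    clusters.setdefault(cluster_id, []).append(term)
print(clusters)
```

With real data, the vectors would more plausibly be word embeddings or TF-IDF profiles, and a library implementation with multiple random restarts would be a safer choice than this single deterministic run.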

Results

On a held-out test set of 1677 tweets, a transformer-based classifier (XLNet) achieved the best performance, with an F1-score of 0.71 for the Pain-related-OM class and 0.79 for the Recreational-OM class. Macro- and micro-averaged F1-scores over all classes were 0.82 and 0.92, respectively. Content analysis using clustering revealed distinct clusters of terms associated with each class.
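The gap between the macro- and micro-averaged scores is easier to read with the two definitions side by side. The sketch below computes per-class, macro-averaged, and micro-averaged F1 from scratch on a tiny invented label set; the toy labels and the function names are assumptions for illustration and do not reproduce the paper's data or numbers.

```python
from collections import Counter

LABELS = ["No-OM", "Pain-related-OM", "Recreational-OM"]


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred, labels=LABELS):
    """Return (per-class F1 dict, macro-averaged F1, micro-averaged F1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    per_class = {c: f1_from_counts(tp[c], fp[c], fn[c]) for c in labels}
    # Macro: unweighted mean of per-class scores, so every class counts equally.
    macro = sum(per_class.values()) / len(labels)
    # Micro: pool the counts first, so frequent classes dominate.
    micro = f1_from_counts(
        sum(tp[c] for c in labels),
        sum(fp[c] for c in labels),
        sum(fn[c] for c in labels),
    )
    return per_class, macro, micro


# Invented, imbalanced toy labels: the majority class is No-OM.
y_true = ["No-OM"] * 6 + ["Pain-related-OM"] * 2 + ["Recreational-OM"] * 2
y_pred = ["No-OM"] * 6 + ["No-OM", "Pain-related-OM"] + ["Recreational-OM"] * 2

per_class, macro, micro = macro_micro_f1(y_true, y_pred)
print(per_class, round(macro, 3), round(micro, 3))
```

In this toy example a single error on a minority class pulls the macro average below the micro average, because the micro average pools counts and is dominated by the majority class. The same mechanism is one plausible reading of the reported 0.82 versus 0.92, although the abstract does not give the per-class counts needed to confirm it.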

Discussion

While some past studies have attempted to automatically detect opioid misuse, none have further characterized the motive for misuse. Our multi-class classification approach using XLNet showed promising performance, including in detecting the subtle differences between pain-related and recreation-related misuse. The distinct clustering of class-specific keywords may help conduct targeted data collection, overcoming under-representation of minority classes.

Conclusion

Machine learning can help identify pain-related and recreation-related OM content on Twitter, potentially enabling the study of the characteristics of individuals who exhibit such behavior.




Updated: 2020-12-06