Drink2Vec: Improving the classification of alcohol-related tweets using distributional semantics and external contextual enrichment, Information Processing & Management


Information Processing & Management (IF 7.4), Pub Date: 2020-09-02, DOI: 10.1016/j.ipm.2020.102369
Marcos Grzeça, Karin Becker, Renata Galante

The hazardous and harmful use of alcohol has become a public health issue worldwide. Social media has emerged as a reliable source from which to extract information on alcohol consumption at low cost and with low latency. The automatic classification of tweets related to alcohol consumption can help to understand the factors related to alcohol abuse. In this paper, we propose Drink2Vec, a method aimed at improving the classification of alcohol-related tweets by exploring two forms of contextual information: distributional semantics and external contextual enrichment. The core of Drink2Vec is a convolutional neural network that learns domain-specific word embeddings that capture vocabulary related to alcohol consumption. Drink2Vec builds on Drink2Symbol, a method that finds relevant symbolic features in external sources (e.g., the Semantic Web) to provide meaning and generalization to the terms present in tweets. Based on five datasets and three classification algorithms, our experiments show that external enrichment improves recall by adding generalization features, while distributional semantics improves precision mainly by characterizing terms according to their usage. A stacking ensemble of these classifiers balances the advantages of each contextual enrichment technique. Our experiments also suggest that the task-specific embeddings produced by Drink2Vec capture more nuances of the informal vocabulary related to alcohol consumption (e.g., slang, events, misspelled words) and yield better results than other strategies (e.g., pre-trained embeddings and generic algorithms).
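The abstract contrasts a recall gain (external enrichment) with a precision gain (distributional semantics). As a reminder of what those two measures capture, here is a minimal Python sketch. The function name `precision_recall` and the label lists are hypothetical toy values invented for illustration only; they are not data or results from the paper, and the code does not reproduce Drink2Vec or Drink2Symbol.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 = alcohol-related."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy gold labels for eight tweets (assumed, illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# A hypothetical classifier that generalizes broadly: it finds every
# positive tweet but also flags two negatives.
recall_oriented = [1, 1, 1, 1, 1, 1, 0, 0]

# A hypothetical classifier that is strict: it never flags a negative
# tweet but misses two positives.
precision_oriented = [1, 1, 0, 0, 0, 0, 0, 0]

for name, preds in [("recall-oriented", recall_oriented),
                    ("precision-oriented", precision_oriented)]:
    p, r = precision_recall(y_true, preds)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy setting the first classifier trades precision for recall and the second does the opposite, which is the kind of complementary behavior a stacking ensemble is meant to balance.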


Updated: 2020-09-02
