Convolutional neural encoding of online reviews for the identification of travel group type topics on TripAdvisor
Information Processing & Management ( IF 7.4 ) Pub Date : 2021-06-03 , DOI: 10.1016/j.ipm.2021.102645
Francisco Jose Arenas-Márquez , Rocio Martinez-Torres , Sergio Toral

Previous studies have concluded that travelers’ preferences differ significantly depending on the trip type. The problem of extracting users’ preferences from a text corpus can be solved with traditional clustering algorithms, which work quite well when there is no predefined data structure. In this paper, however, we consider the problem of extracting users’ preferences when they belong to a finite number of classes represented by the trip type. We propose an encoding method based on a Convolutional Neural Network (CNN) trained as a classifier for the classes that predefine the data structure. The intuition behind convolutional neural encoding is its ability to maximize the distance between documents belonging to different classes in the new, derived feature space. Findings reveal that CNN encoding has better discriminative properties than alternative encoding methods such as Latent Dirichlet Allocation or average word2vec encoding. Moreover, we demonstrate that CNN encoding can be used to identify the unique topics associated with the predefined data structure, determined in this case by the four trip types.
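
The "discriminative properties" of an encoding can be made concrete with a simple separation score. The sketch below is an illustration only, not the evaluation used in the paper: the function name `separation_score`, the ratio it computes (mean distance between class centroids divided by mean distance of documents to their own centroid), the toy 2-D vectors, and the class labels `"family"` and `"couple"` are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def separation_score(encodings, labels):
    """Mean between-class centroid distance divided by mean within-class distance.

    Hypothetical measure: higher values mean that documents of different
    classes lie farther apart relative to the spread inside each class.
    Assumes at least two classes and a non-zero within-class spread.
    """
    by_class = defaultdict(list)
    for vec, label in zip(encodings, labels):
        by_class[label].append(vec)

    centroids = {label: centroid(vecs) for label, vecs in by_class.items()}

    # Mean distance from each document to the centroid of its own class.
    within = sum(
        dist(vec, centroids[label]) for vec, label in zip(encodings, labels)
    ) / len(encodings)

    # Mean distance over all pairs of class centroids.
    pairs = list(combinations(centroids.values(), 2))
    between = sum(dist(a, b) for a, b in pairs) / len(pairs)

    return between / within


# Toy document encodings (made up for illustration, not real review data).
labels = ["family", "family", "couple", "couple"]
separated = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
overlapping = [[0.0, 0.0], [10.0, 2.0], [10.0, 0.0], [0.0, 2.0]]

score_separated = separation_score(separated, labels)
score_overlapping = separation_score(overlapping, labels)
```

Under this assumed measure, an encoding that keeps the classes apart (`separated`) scores well above one in which the class centroids coincide (`overlapping`). The same comparison could in principle be run on CNN, LDA, and averaged word2vec encodings of the same labeled reviews.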




Updated: 2021-06-03