Integrating learned and explicit document features for reputation monitoring in social media, Knowledge and Information Systems


Integrating learned and explicit document features for reputation monitoring in social media
Knowledge and Information Systems (IF 2.7), Pub Date: 2019-07-19, DOI: 10.1007/s10115-019-01383-w
Fernando Giner, Enrique Amigó, Felisa Verdejo

Currently, monitoring reputation in social media is probably one of the most lucrative applications of information retrieval methods. However, this task poses new challenges due to the dynamicity of contents and the need for early detection of topics that affect the reputations of companies. Addressing this problem with learning mechanisms that are based on training data sets is challenging, given that unseen features play a crucial role. However, learning processes are necessary to capture domain features and dependency phenomena. In this work, based on observational information theory, we define a document representation framework that enables the combination of explicit text features and supervised and unsupervised signals into a single representation model. Our theoretical analysis demonstrates that the observation information quantity (OIQ) generalizes the most popular representation methods, in addition to capturing quantitative values, which is required for integrating signals from learning processes. In other words, the OIQ allows us to give the same treatment to features that are currently managed separately. Empirically, our experiments on the reputation-monitoring scenario demonstrated that adding features progressively from supervised (in particular, Bayesian inference over annotated data) and unsupervised learning methods (in particular, proximity to clusters) increases the similarity estimation performance. This result is verified under various similarity criteria (pointwise mutual information, Jaccard and Lin’s distances and the information contrast model). According to our formal analysis, the OIQ is the first representation model that captures the informativeness (specificity) of quantitative features in the document representation.
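
The abstract names several similarity criteria but does not give the OIQ definition, so the following is not the authors' method. It is only a minimal sketch of two of the named criteria, Jaccard similarity and pointwise mutual information, applied to documents represented as feature sets. The feature names (including the `cluster:3` label standing in for a learned signal) and the toy corpus are hypothetical, chosen to show explicit terms and learned signals sitting in one representation.

```python
import math
from typing import Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pmi(x: str, y: str, docs: Sequence[Set[str]]) -> float:
    """Pointwise mutual information log2(P(x, y) / (P(x) P(y))).

    Probabilities are estimated as document frequencies over `docs`.
    Returns -inf when the two features never co-occur.
    """
    if not docs:
        raise ValueError("docs must be non-empty")
    n = len(docs)
    p_x = sum(x in d for d in docs) / n
    p_y = sum(y in d for d in docs) / n
    p_xy = sum((x in d) and (y in d) for d in docs) / n
    if p_xy == 0:
        return float("-inf")
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical toy corpus: explicit terms plus one learned signal ("cluster:3").
docs = [
    {"bank", "fraud", "cluster:3"},
    {"bank", "loan"},
    {"fraud", "cluster:3", "lawsuit"},
    {"loan", "rates"},
]

print(jaccard(docs[0], docs[2]))        # 0.5
print(pmi("fraud", "cluster:3", docs))  # 1.0
```

Treating the cluster label as just another set member is the simplest way to mix feature types, but it discards the quantitative value of the signal (for example, the degree of proximity to the cluster), which is the limitation the abstract says the OIQ is designed to address.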


Updated: 2019-07-19
