Contextual bandits with hidden contexts: a focused data capture from social media streams
Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2019-08-10, DOI: 10.1007/s10618-019-00648-w
Sylvain Lamprier, Thibault Gisselbrecht, Patrick Gallinari

This paper addresses the problem of real-time data capture from social media. Because of various limitations, it is not possible to collect all the data produced by social networks such as Twitter. To gather enough relevant information for a predefined need, it is therefore necessary to focus on a subset of the information sources. In this work, we focus on user-centered data capture and treat each account of a social network as a source that can be followed at each iteration of a data capture process. This process, whose aim is to maximize the cumulative utility of the captured information for the specified need, is constrained at each time step by the number of users that can be monitored simultaneously. Selecting a subset of accounts to listen to over time is a sequential decision problem under constraints, which we formalize as a bandit problem with multiple selections. We propose a contextual UCB-like approach that uses the activity of any user during the current step to predict their future behavior. Besides capturing variations in usefulness, considering contexts also improves the efficiency of the process by leveraging structure in the search space. However, existing contextual bandit approaches do not fit our setting, where most of the contexts are hidden from the agent. We therefore propose a new algorithm, called HiddenLinUCB, which handles such missing information via variational inference. Experiments show that this approach performs very well compared to existing methods on data capture tasks from social networks.
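To make the setting concrete, here is a minimal sketch of a LinUCB-style bandit with multiple selections: at each step it follows the k accounts with the highest upper confidence bound under a linear reward model shared by all accounts, which is one way to "leverage structure in the search space". It is not HiddenLinUCB. Where the paper infers hidden contexts via variational inference, this sketch simply reuses each account's most recently observed activity vector (a naive imputation chosen only for illustration), and it forces every account to be followed once before it is scored. The class name, method names and the `alpha` exploration parameter are hypothetical.

```python
import numpy as np


class MultiSelectLinUCB:
    """Hypothetical sketch: shared-parameter LinUCB that follows k accounts per step.

    NOT the paper's HiddenLinUCB: hidden contexts are naively imputed with the
    last activity vector observed for each account, instead of being inferred
    via variational inference.
    """

    def __init__(self, n_users, dim, k, alpha=1.0):
        self.k = k
        self.alpha = alpha
        self.A = np.eye(dim)       # ridge-regularised Gram matrix (identity prior)
        self.b = np.zeros(dim)     # sum of reward-weighted contexts
        # Last observed context of each account; zeros until it is followed once.
        self.last_ctx = np.zeros((n_users, dim))
        self.seen = np.zeros(n_users, dtype=bool)

    def select(self):
        """Return the indices of the k accounts to follow at this step."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                      # ridge estimate of the shared parameter
        X = self.last_ctx                           # naive imputation of hidden contexts
        mean = X @ theta
        # Exploration width sqrt(x^T A^{-1} x), computed for all accounts at once.
        width = np.sqrt(np.maximum(np.einsum("ij,jk,ik->i", X, A_inv, X), 0.0))
        scores = mean + self.alpha * width
        scores[~self.seen] = np.inf                 # follow every account once first
        order = np.argsort(-scores, kind="stable")  # ties broken by account index
        return [int(i) for i in order[: self.k]]

    def update(self, chosen, rewards, new_contexts):
        """Learn from the followed accounts, then store the activity they revealed."""
        for i, r in zip(chosen, rewards):
            x = self.last_ctx[i]                    # context that was used for scoring
            self.A += np.outer(x, x)
            self.b += r * x
        # Only followed accounts reveal their current activity; others stay hidden.
        for i, x_new in zip(chosen, new_contexts):
            self.last_ctx[i] = x_new
            self.seen[i] = True
```

A capture loop would alternate `chosen = agent.select()` with `agent.update(chosen, rewards, contexts)`, where `rewards` scores how useful each followed account's content was for the predefined need. The sketch's main weakness is exactly the problem the paper targets: for accounts that are not followed, the imputed context can be arbitrarily stale.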

Updated: 2019-08-10