Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations. Computational Social Networks

Computational Social Networks. Pub Date: 2018-03-19. DOI: 10.1186/s40649-018-0051-0
Konstantin Avrachenkov, Vivek S Borkar, Arun Kadavankandy, Jithin K Sreedharan

In network sampling, random walk (RW) based estimation techniques offer many practical solutions while uncovering as little of the unknown network as possible. Despite several theoretical advances in this area, RW based sampling techniques usually make the strong assumption that the samples are drawn in the stationary regime, and are therefore forced to discard the samples collected during the burn-in period. This work proposes two sampling schemes without a burn-in time constraint to estimate the average of an arbitrary function defined on the network nodes, for example, the average age of users in a social network. The central idea of the algorithms lies in exploiting the regeneration of RWs at revisits to an aggregated super-node or to a set of nodes, and in strategies to increase the frequency of such regenerations, either by contracting the graph or by enlarging the hitting set. Our first algorithm, which is based on reinforcement learning (RL), uses stochastic approximation to derive an estimator. This method can be seen as intermediate between purely stochastic Markov chain Monte Carlo iterations and deterministic relative value iterations. The second algorithm, which we call the Ratio with Tours (RT)-estimator, is a modified form of respondent-driven sampling (RDS) that accommodates the idea of regeneration. We study the methods via simulations on real networks. We observe that the trajectories of the RL-estimator are much more stable than those of standard random walk based estimation procedures, and that its error performance is comparable to that of RDS, which has a smaller asymptotic variance than many other estimators. Simulation studies also show that the mean squared error of the RT-estimator decays much faster over time than that of RDS. The newly developed RW based estimators (RL- and RT-estimators) make it possible to avoid the burn-in period, provide better control of stability along the sample path, and reduce the overall estimation time. Our estimators can be applied in social and complex networks.
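
To make the regeneration idea concrete, the following is a minimal Python sketch of a generic tour-based ratio estimator. It is not the paper's RT-estimator or RL-estimator. It assumes a connected, undirected graph given as adjacency lists and a single anchor node instead of an aggregated super-node. The names `tour_ratio_estimate`, `EXAMPLE_ADJ`, and `EXAMPLE_AGE` are hypothetical and chosen only for illustration. Each tour starts at the anchor and ends on the first return to it, so tours are independent and no burn-in samples need to be discarded. Because a simple random walk visits a node with long-run frequency proportional to its degree, each sample is reweighted by the inverse of the node's degree.

```python
import random


def tour_ratio_estimate(adj, f, anchor, num_tours, rng):
    """Estimate the average of f over all nodes using random walk tours.

    adj:       dict mapping each node to a list of its neighbours
               (assumed undirected and connected)
    f:         function from node to a number
    anchor:    node at which every tour starts and ends (regeneration point)
    num_tours: number of independent tours to run
    rng:       a random.Random instance
    """
    numerator = 0.0    # accumulates f(x) / degree(x) over visited nodes
    denominator = 0.0  # accumulates 1 / degree(x) over visited nodes
    for _ in range(num_tours):
        node = anchor
        while True:
            degree = len(adj[node])
            numerator += f(node) / degree
            denominator += 1.0 / degree
            node = rng.choice(adj[node])  # one step of the simple random walk
            if node == anchor:            # the walk regenerates: tour is over
                break
    return numerator / denominator


# Hypothetical toy network: 5 users with made-up ages (true average is 40).
EXAMPLE_ADJ = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1, 4],
    3: [1, 4],
    4: [2, 3],
}
EXAMPLE_AGE = {0: 20, 1: 30, 2: 40, 3: 50, 4: 60}

if __name__ == "__main__":
    estimate = tour_ratio_estimate(
        EXAMPLE_ADJ, EXAMPLE_AGE.get, anchor=0, num_tours=20000,
        rng=random.Random(0),
    )
    print(estimate)
```

In this sketch the expected tour length is inversely proportional to the anchor's degree, which hints at why contracting several nodes into a high-degree super-node, as the abstract describes, makes regenerations more frequent.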

Updated: 2018-03-19
