Selective network discovery via deep reinforcement learning on embedded spaces
Applied Network Science, Pub Date: 2021-03-20, DOI: 10.1007/s41109-021-00365-8
Peter Morales, Rajmonda Sulo Caceres, Tina Eliassi-Rad

Complex networks are often too large for full exploration, only partially accessible, or only partially observed. Downstream learning tasks on these incomplete networks can produce low-quality results, and reducing a network's incompleteness can be costly and nontrivial. Consequently, network discovery algorithms optimized for specific downstream learning tasks under resource-collection constraints are of great interest. In this paper, we formulate task-specific network discovery as a sequential decision-making problem. Our downstream task is selective harvesting: the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and a notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm uses a task-specific network embedding to reduce state-space complexity. We present a detailed comparative analysis of popular network embeddings with respect to their role in supporting offline planning, as well as a quantitative study on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance compared with competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical for addressing sparse and changing reward signals.
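To make the formulation concrete, below is a minimal, illustrative sketch in Python (using networkx and PyTorch) of the sequential decision problem the abstract describes: the agent's state is an embedding of the subgraph discovered so far, actions probe vertices on the discovery frontier, and the reward signals whether a probed vertex carries the target attribute. All identifiers here (embed_state, ActorCritic, discover), the toy structural embedding, and the one-step online update are assumptions made for illustration; the paper's NAC framework instead learns its policy and reward model offline with a task-specific embedding.

import networkx as nx
import torch
import torch.nn as nn

EMBED_DIM = 16  # dimensionality of the toy state embedding (assumed)

def embed_state(g, discovered, v):
    """Toy stand-in for a task-specific network embedding: cheap structural
    features of candidate vertex v within the observed subgraph."""
    sub = g.subgraph(discovered)
    feats = [float(sub.degree(v)), nx.clustering(sub, v), float(len(discovered))]
    feats += [0.0] * (EMBED_DIM - len(feats))  # zero-pad to a fixed width
    return torch.tensor(feats, dtype=torch.float32)

class ActorCritic(nn.Module):
    """Shared trunk with a policy head (scores frontier vertices) and a
    value head (estimates future reward), as in standard actor-critic RL."""
    def __init__(self, dim=EMBED_DIM):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(dim, 32), nn.ReLU())
        self.policy = nn.Linear(32, 1)  # per-candidate logit
        self.value = nn.Linear(32, 1)   # baseline value estimate

    def forward(self, x):
        h = self.trunk(x)
        return self.policy(h).squeeze(-1), self.value(h).squeeze(-1)

def discover(g, seed, target_attr, budget=50):
    """Probe up to `budget` vertices, learning which frontier vertices are
    worth collecting for the selective-harvesting task. Here g stands in
    for the hidden network; probing a vertex reveals its neighbors."""
    net = ActorCritic()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    discovered, frontier = {seed}, set(g.neighbors(seed))
    harvested = []
    for _ in range(budget):
        if not frontier:
            break
        cand = list(frontier)
        feats = torch.stack([embed_state(g, discovered | {v}, v) for v in cand])
        logits, values = net(feats)
        probs = torch.softmax(logits, dim=0)
        idx = torch.multinomial(probs, 1).item()  # sample an action
        v = cand[idx]
        reward = 1.0 if g.nodes[v].get(target_attr, False) else 0.0
        # Bandit-style advantage against the critic's baseline; a full
        # implementation would bootstrap the critic across the trajectory.
        advantage = reward - values[idx]
        loss = -torch.log(probs[idx] + 1e-8) * advantage.detach() + advantage.pow(2)
        opt.zero_grad()
        loss.backward()
        opt.step()
        discovered.add(v)
        if reward > 0:
            harvested.append(v)
        frontier |= set(g.neighbors(v)) - discovered
        frontier.discard(v)
    return harvested

# Toy usage on a built-in graph with a hypothetical boolean attribute.
g = nx.karate_club_graph()
nx.set_node_attributes(
    g, {n: d["club"] == "Mr. Hi" for n, d in g.nodes(data=True)}, "target")
print(discover(g, seed=0, target_attr="target"))

Sampling from the policy rather than greedily picking the highest-scoring vertex keeps exploration alive, which matters under the sparse and changing reward signals the abstract highlights.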




Updated: 2021-03-21