Distributed Constrained Online Learning
IEEE Transactions on Signal Processing (IF 4.6), Pub Date: 2020-01-01, DOI: 10.1109/tsp.2020.2999671
Santiago Paternain, Soomin Lee, Michael M. Zavlanos, Alejandro Ribeiro

In this article, we consider groups of agents in a network that select actions in order to satisfy a set of constraints that vary arbitrarily over time and to minimize a time-varying function of which they have only local observations. The selection of actions, also called a strategy, is causal and decentralized, i.e., the dynamical system that determines the actions of a given agent depends only on the constraints at the current time and on its own actions and those of its neighbors. To determine such a strategy, we propose a decentralized saddle point algorithm and show that the corresponding global fit and regret are bounded by functions of the order of $\sqrt{T}$, i.e., functions whose limit is a constant when divided by $\sqrt{T}$. Specifically, we define the global fit of a strategy as a vector that integrates over time the global constraint violations as seen by a given node. The fit is a performance loss associated with online operation, as opposed to offline clairvoyant operation, which can always select an action, if one exists, that satisfies the constraints at all times. If the fit grows sublinearly with the time horizon, the strategy approaches the feasible set of actions. Likewise, we define the regret of a strategy as the difference between its accumulated cost and that of the best fixed action that one could select knowing beforehand the time evolution of the objective function. Numerical examples support the theoretical conclusions.
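To make the two performance measures concrete, they can be written as follows (the symbols $f_t$ for the time-varying objective, $g_t$ for the constraint functions, $x_t$ for the selected action, and $X$ for the action set are illustrative notation, not necessarily the paper's):

$$\mathcal{F}_T = \sum_{t=1}^{T} g_t(x_t), \qquad \mathcal{R}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in X} \sum_{t=1}^{T} f_t(x),$$

where the constraints at time $t$ read $g_t(x) \leq 0$. The main result bounds $\|\mathcal{F}_T\|$ and $\mathcal{R}_T$ by functions of order $\sqrt{T}$, so both grow sublinearly in the horizon $T$.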
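The abstract does not spell out the decentralized saddle point algorithm itself; the sketch below shows one standard online primal-dual round consistent with that description, assuming a synchronous network whose topology is encoded in a doubly stochastic mixing matrix W. It is a minimal illustration, not the paper's implementation, and the names saddle_point_step, grad_f, g, and grad_g are hypothetical.

import numpy as np

def saddle_point_step(x, lam, grad_f, g, grad_g, W, eta):
    # x:   (n, d) array of local primal iterates, one row per agent
    # lam: (n, m) array of local dual iterates, kept elementwise nonnegative
    # W:   (n, n) doubly stochastic mixing matrix; W[i, j] > 0 only for neighbors
    # eta: step size; eta ~ 1/sqrt(T) is the usual choice behind sqrt(T) bounds
    x_mix = W @ x                        # consensus step: average with neighbors
    x_new = np.empty_like(x)
    lam_new = np.empty_like(lam)
    for i in range(x.shape[0]):
        # primal descent on the local Lagrangian f_i(x) + lam_i . g_i(x)
        grad_L = grad_f(i, x_mix[i]) + grad_g(i, x_mix[i]).T @ lam[i]
        x_new[i] = x_mix[i] - eta * grad_L
        # dual ascent on the constraint violation, projected onto lam >= 0
        lam_new[i] = np.maximum(0.0, lam[i] + eta * g(i, x_new[i]))
    return x_new, lam_new

Because each agent mixes only with its neighbors and evaluates only its local objective and constraints at the current time, the update is causal and decentralized in the sense used above.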

Updated: 2020-01-01