Online Optimization in Games via Control Theory: Connecting Regret, Passivity and Poincaré Recurrence
arXiv - CS - Computer Science and Game Theory. Pub Date: 2021-06-09, DOI: arXiv:2106.04748
Yun Kuen Cheung, Georgios Piliouras

We present a novel control-theoretic understanding of online optimization and learning in games, via the notion of passivity. Passivity is a fundamental concept in control theory, which abstracts energy conservation and dissipation in physical systems. It has become a standard tool in the analysis of general feedback systems, to which game dynamics belong. Our starting point is to show that all continuous-time Follow-the-Regularized-Leader (FTRL) dynamics, which include the well-known Replicator Dynamics, are lossless, i.e. they are passive with no energy dissipation. Interestingly, we prove that passivity implies bounded regret, connecting two fundamental primitives of control theory and online optimization. The observation of energy conservation in FTRL inspires us to present a family of lossless learning dynamics, each of which has an underlying energy function with a simple gradient structure. This family is closed under convex combination; as an immediate corollary, any convex combination of FTRL dynamics is lossless and thus has bounded regret. This allows us to extend the framework of Fox and Shamma (Games, 2013) to prove not just global asymptotic stability results for game dynamics, but Poincaré recurrence results as well. Intuitively, when a lossless game (e.g. a graphical constant-sum game) is coupled with a lossless learning dynamic, their interconnection is also lossless, which results in a pendulum-like, energy-preserving recurrent behavior, generalizing the results of Piliouras and Shamma (SODA, 2014) and Mertikopoulos, Papadimitriou and Piliouras (SODA, 2018).
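The "lossless" behavior the abstract describes can be seen numerically. The following sketch (our own illustration, not code from the paper) simulates continuous-time replicator dynamics in Matching Pennies, a zero-sum game, and checks the standard fact that the KL-divergence "energy" measured from the interior equilibrium is conserved along trajectories, so play cycles around the equilibrium rather than converging:

```python
# Numerical sketch: replicator dynamics in Matching Pennies conserve a
# KL-divergence "energy", illustrating the lossless / Poincare-recurrent
# behavior described above. Game, step sizes, and tolerances are our choices.
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # player 1's payoffs; player 2 gets -A

def replicator_field(x, y):
    """Continuous-time replicator dynamics for the bimatrix game (A, -A)."""
    u = A @ y             # payoffs to player 1's pure strategies
    v = -A.T @ x          # payoffs to player 2's pure strategies
    dx = x * (u - x @ u)  # grow strategies that beat the current average
    dy = y * (v - y @ v)
    return dx, dy

def rk4_step(x, y, h):
    """One fourth-order Runge-Kutta step of the coupled dynamics."""
    k1 = replicator_field(x, y)
    k2 = replicator_field(x + h / 2 * k1[0], y + h / 2 * k1[1])
    k3 = replicator_field(x + h / 2 * k2[0], y + h / 2 * k2[1])
    k4 = replicator_field(x + h * k3[0], y + h * k3[1])
    x = x + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    y = y + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, y

def energy(x, y):
    """KL divergence from the interior equilibrium (1/2, 1/2) to (x, y)."""
    eq = np.array([0.5, 0.5])
    return np.sum(eq * np.log(eq / x)) + np.sum(eq * np.log(eq / y))

x = np.array([0.8, 0.2])  # arbitrary interior starting mixed strategies
y = np.array([0.3, 0.7])
E0 = energy(x, y)
for _ in range(20000):    # integrate up to time T = 100
    x, y = rk4_step(x, y, h=0.005)
E1 = energy(x, y)
print(f"energy drift after T=100: {abs(E1 - E0):.2e}")
```

The drift printed at the end is only numerical integration error; the continuous-time dynamic keeps the energy exactly constant, which is the pendulum-like recurrence the paper formalizes via passivity.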

Updated: 2021-06-10