Model-Free Learning of Optimal Ergodic Policies in Wireless Systems, IEEE Transactions on Signal Processing


Model-Free Learning of Optimal Ergodic Policies in Wireless Systems
IEEE Transactions on Signal Processing (IF 4.6), Pub Date: 2020-01-01, DOI: 10.1109/tsp.2020.3030073
Dionysios S. Kalogerias, Mark Eisen, George J. Pappas, Alejandro Ribeiro

Learning optimal resource allocation policies in wireless systems can be effectively achieved by formulating finite dimensional constrained programs which depend on system configuration, as well as the adopted learning parameterization. The interest here is in cases where system models are unavailable, prompting methods that probe the wireless system with candidate policies, and then use observed performance to determine better policies. This generic procedure is difficult because of the need to cull accurate gradient estimates out of these limited system queries. This article constructs and exploits smoothed surrogates of constrained ergodic resource allocation problems, the gradients of the former being representable exactly as averages of finite differences that can be obtained through limited system probing. Leveraging this unique property, we develop a new model-free primal-dual algorithm for learning optimal ergodic resource allocations, while we rigorously analyze the relationships between original policy search problems and their surrogates, in both primal and dual domains. First, we show that both primal and dual domain surrogates are uniformly consistent approximations of their corresponding original finite dimensional counterparts. Upon further assuming the use of near-universal policy parameterizations, we also develop explicit bounds on the gap between optimal values of initial, infinite dimensional resource allocation problems, and dual values of their parameterized smoothed surrogates. In fact, we show that this duality gap decreases at a linear rate relative to smoothing and universality parameters. Thus, it can be made arbitrarily small at will, also justifying our proposed primal-dual algorithmic recipe. Numerical simulations confirm the effectiveness of our approach.
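As a rough illustration of the idea that the gradient of a smoothed surrogate can be written exactly as an average of finite differences, the sketch below assumes Gaussian smoothing, f_mu(theta) = E[f(theta + mu*u)] with u ~ N(0, I), whose gradient equals E[(f(theta + mu*u) - f(theta)) / mu * u]. The abstract does not specify the smoothing distribution, so this choice, the function name smoothed_grad, and the parameters mu and num_probes are assumptions made for illustration, not the authors' implementation. The black-box f stands in for a probed system performance metric.

```python
import numpy as np

def smoothed_grad(f, theta, mu=0.1, num_probes=2000, rng=None):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed
    surrogate f_mu(theta) = E[f(theta + mu * u)], u ~ N(0, I).

    Only evaluations of f are used (no model of f), which mimics
    probing a system with perturbed candidate policies.
    Hypothetical helper for illustration; not from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    base = f(theta)                      # one probe at the current point
    grad = np.zeros_like(theta)
    for _ in range(num_probes):
        u = rng.standard_normal(theta.shape)   # random probing direction
        # Finite difference along u, scaled back onto the direction u.
        grad += (f(theta + mu * u) - base) / mu * u
    return grad / num_probes

if __name__ == "__main__":
    # Toy black-box objective (an assumption; any scalar function works).
    f = lambda t: float(np.sum(t ** 2))
    est = smoothed_grad(f, [1.0, -2.0], mu=0.1, num_probes=20000,
                        rng=np.random.default_rng(0))
    print(est)
```

For this quadratic toy objective the smoothed surrogate differs from f only by a constant, so the estimate should land near 2*theta up to Monte Carlo noise. In a primal-dual scheme of the kind the abstract describes, an estimate like this would presumably replace the model-based gradient in the primal update.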


Updated: 2020-01-01
