Follow the Neurally-Perturbed Leader for Adversarial Training,arXiv - CS - Computer Science and Game Theory


Follow the Neurally-Perturbed Leader for Adversarial Training
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-02-16, DOI: arxiv-2002.06476
Ari Azarafrooz

Game-theoretic models of learning are a powerful set of models that optimize multi-objective architectures. Among these models are zero-sum architectures, which have inspired adversarial learning frameworks. An important shortcoming of these zero-sum architectures is that gradient-based training leads to weak convergence and cyclic dynamics. We propose a novel follow-the-leader training algorithm for zero-sum architectures that guarantees convergence to a mixed Nash equilibrium without cyclic behavior. It is a special type of follow-the-perturbed-leader algorithm in which the perturbations are produced by a neural mediating agent. We validate our theoretical results by applying this training algorithm to games with convex and non-convex losses as well as to generative adversarial architectures. Moreover, we customize the implementation of this algorithm for adversarial imitation learning applications. At every step of training, the mediator agent perturbs the observations with generated codes. As a result of these mediating codes, the proposed algorithm is also efficient for learning in environments with multiple factors of variation. We validate this claim using a procedurally generated game environment as well as synthetic data. A GitHub implementation is available.
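
For readers unfamiliar with the base technique, the following is a minimal sketch of classic follow the perturbed leader in self-play on matching pennies, a two-action zero-sum game whose only Nash equilibrium is the mixed strategy (0.5, 0.5). It is not the algorithm proposed in the abstract: the perturbations here are plain uniform random noise rather than the output of a neural mediating agent, and the function name `ftpl_matching_pennies`, the payoff matrix, and the square-root noise scale are assumptions chosen for illustration.

```python
import random

# Matching pennies: payoff to the row player. The column player receives
# the negative of this value, so the game is zero-sum.
PAYOFF = [[1.0, -1.0],
          [-1.0, 1.0]]


def ftpl_matching_pennies(rounds=20000, noise_scale=None, seed=0):
    """Classic FTPL self-play (hypothetical helper, not the paper's method).

    Returns the empirical frequency with which each player chose action 0.
    """
    rng = random.Random(seed)
    if noise_scale is None:
        # Noise on the order of sqrt(rounds) is a common choice (assumed here).
        noise_scale = rounds ** 0.5

    row_cum = [0.0, 0.0]  # cumulative payoff of each row action so far
    col_cum = [0.0, 0.0]  # cumulative payoff of each column action so far
    row_zero = 0
    col_zero = 0

    for _ in range(rounds):
        # Add fresh random noise to the cumulative payoffs, then pick
        # the action that leads after perturbation.
        row_scores = [row_cum[i] + rng.uniform(0.0, noise_scale) for i in range(2)]
        col_scores = [col_cum[j] + rng.uniform(0.0, noise_scale) for j in range(2)]
        a = 0 if row_scores[0] >= row_scores[1] else 1
        b = 0 if col_scores[0] >= col_scores[1] else 1

        # Each player records what every one of its actions would have
        # earned against the opponent's realized action.
        for k in range(2):
            row_cum[k] += PAYOFF[k][b]
            col_cum[k] -= PAYOFF[a][k]

        row_zero += (a == 0)
        col_zero += (b == 0)

    return row_zero / rounds, col_zero / rounds


if __name__ == "__main__":
    p, q = ftpl_matching_pennies()
    print(f"row plays action 0 with frequency {p:.3f}")
    print(f"column plays action 0 with frequency {q:.3f}")
```

In this sketch only the time-averaged play is expected to approach the mixed equilibrium; individual rounds still oscillate. The abstract's contribution, as described, is to replace the random noise with learned perturbations from a mediator.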


Updated: 2020-06-09
