Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2020-09-23. DOI: arXiv:2009.11243. Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein
Much as replacing hand-designed features with learned functions has
revolutionized how we solve perceptual tasks, we believe learned algorithms
will transform how we train models. In this work we focus on general-purpose
learned optimizers capable of training a wide variety of problems with no
user-specified hyperparameters. We introduce a new neural-network-parameterized,
hierarchical optimizer with access to additional features, such as validation
loss, to enable automatic regularization. Most learned optimizers have been
trained on only a single task, or a small number of tasks. We train our
optimizers on thousands of tasks, making use of orders of magnitude more
compute, resulting in optimizers that generalize better to unseen tasks. The
learned optimizers not only perform well, but also learn behaviors that are
distinct from those of existing first-order optimizers. For instance, they
generate update steps that have implicit regularization and adapt as the
problem hyperparameters (e.g. batch size) or architecture (e.g. neural network
width) change. Finally, these learned optimizers show evidence of being useful
for out-of-distribution tasks, such as training themselves from scratch.
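To make the core idea concrete, here is a minimal sketch (not the paper's actual architecture) of what a neural-network-parameterized optimizer looks like: a small MLP maps per-parameter features (e.g. gradient and momentum) to an update step. The MLP weights `W1`/`W2` are randomly initialized for illustration; in the paper they would be meta-trained across thousands of tasks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Optimizer-network weights. These would be meta-learned; here they are
# random, purely to illustrate the data flow.
W1 = rng.normal(scale=0.1, size=(2, 8))
W2 = rng.normal(scale=0.1, size=(8, 1))

def learned_update(grad, momentum):
    """Map per-parameter features to an update via a tiny 2-layer MLP."""
    feats = np.stack([grad, momentum], axis=-1)  # (n_params, 2)
    h = np.tanh(feats @ W1)                      # (n_params, 8)
    return (h @ W2).squeeze(-1)                  # (n_params,)

# Toy inner task: f(theta) = ||theta||^2, so grad = 2 * theta.
theta = np.ones(4)
momentum = np.zeros(4)
for _ in range(10):
    grad = 2.0 * theta
    momentum = 0.9 * momentum + grad
    # Apply the learned optimizer's per-parameter update.
    theta = theta - 0.1 * learned_update(grad, momentum)
```

A real learned optimizer would condition on many more features (the paper mentions validation loss, among others) and would itself be trained by an outer meta-optimization loop; this sketch only shows the inner-loop update structure.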
Updated: 2020-09-24