Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2020-09-23. DOI: arXiv:2009.11243. Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein
Much as replacing hand-designed features with learned functions has
revolutionized how we solve perceptual tasks, we believe learned algorithms
will transform how we train models. In this work we focus on general-purpose
learned optimizers capable of training a wide variety of problems with no
user-specified hyperparameters. We introduce a new neural-network-parameterized,
hierarchical optimizer with access to additional features, such as validation
loss, to enable automatic regularization. Most learned optimizers have been
trained on only a single task, or a small number of tasks. We train our
optimizers on thousands of tasks, making use of orders of magnitude more
compute, resulting in optimizers that generalize better to unseen tasks. The
learned optimizers not only perform well, but also learn behaviors that are
distinct from those of existing first-order optimizers. For instance, they
generate update steps that have implicit regularization and adapt as the
problem hyperparameters (e.g. batch size) or architecture (e.g. neural network
width) change. Finally, these learned optimizers show evidence of being useful
for out-of-distribution tasks, such as training themselves from scratch.
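To make the core idea concrete, here is a minimal sketch (not the paper's actual architecture) of what a neural-network-parameterized optimizer looks like: a small MLP maps per-parameter features (e.g. gradient and momentum) to an update step. The MLP weights `W1`/`W2` are randomly initialized for illustration; in the paper they would be meta-trained across thousands of tasks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Optimizer-network weights. These would be meta-learned; here they are
# random, purely to illustrate the data flow.
W1 = rng.normal(scale=0.1, size=(2, 8))
W2 = rng.normal(scale=0.1, size=(8, 1))

def learned_update(grad, momentum):
    """Map per-parameter features to an update via a tiny 2-layer MLP."""
    feats = np.stack([grad, momentum], axis=-1)  # (n_params, 2)
    h = np.tanh(feats @ W1)                      # (n_params, 8)
    return (h @ W2).squeeze(-1)                  # (n_params,)

# Toy inner task: f(theta) = ||theta||^2, so grad = 2 * theta.
theta = np.ones(4)
momentum = np.zeros(4)
for _ in range(10):
    grad = 2.0 * theta
    momentum = 0.9 * momentum + grad
    # Apply the learned optimizer's per-parameter update.
    theta = theta - 0.1 * learned_update(grad, momentum)
```

A real learned optimizer would condition on many more features (the paper mentions validation loss, among others) and would itself be trained by an outer meta-optimization loop; this sketch only shows the inner-loop update structure.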
Updated: 2020-09-24