A Novel DNN Training Framework via Data Sampling and Multi-Task Optimization
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2020-07-02. DOI: arxiv-2007.01016. Boyu Zhang, A. K. Qin, Hong Pan, Timos Sellis
Conventional DNN training paradigms typically rely on one training set and
one validation set, obtained by partitioning the annotated dataset used for
training, referred to as the gross training set, in a certain way. The training
set is used to train the model, while the validation set is used to estimate
the generalization performance of the trained model as training proceeds, in
order to avoid over-fitting. There are two major issues with this paradigm.
Firstly, the validation set can hardly guarantee an unbiased estimate of
generalization performance due to potential mismatch with the test data.
Secondly, training a DNN corresponds to solving a complex optimization problem,
which is prone to getting trapped in inferior local optima, leading to
undesired training results. To address these issues, we propose a novel DNN
training framework. It generates multiple pairs of training and validation sets
from the gross training set via random splitting, trains a DNN model of a
pre-specified structure on each pair while transferring useful knowledge (e.g.,
promising network parameters) obtained in one model's training process to the
other training processes via multi-task optimization, and outputs the model
with the best overall performance across the validation sets of all pairs. The
knowledge transfer mechanism featured in this framework not only enhances
training effectiveness by helping a model's training process escape from local
optima, but also improves generalization performance via the implicit
regularization that the other training processes impose on each model. We
implement the proposed framework, parallelize the implementation on a GPU
cluster, and apply it to train several widely used DNN models. Experimental
results demonstrate the superiority of the proposed framework over the
conventional training paradigm.
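The overall procedure described above (random splitting into multiple train/validation pairs, per-pair training, periodic knowledge transfer among the training processes, and selection of the model that performs best across all validation sets) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: a logistic-regression learner stands in for the DNN, and the "knowledge transfer" step is a simple hypothetical averaging toward the currently best task, standing in for the paper's multi-task optimization. All function names and hyperparameters here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_splits(X, y, n_pairs, val_frac=0.2):
    """Generate multiple (train, val) pairs from the gross training set
    via independent random splits."""
    n = len(X)
    n_val = int(n * val_frac)
    pairs = []
    for _ in range(n_pairs):
        idx = rng.permutation(n)
        val, tr = idx[:n_val], idx[n_val:]
        pairs.append(((X[tr], y[tr]), (X[val], y[val])))
    return pairs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_step(w, X, y, lr=0.5):
    # One gradient step of logistic regression (stand-in for DNN training).
    p = sigmoid(X @ w)
    return w - lr * X.T @ (p - y) / len(y)

def val_loss(w, X, y):
    p = np.clip(sigmoid(X @ w), 1e-9, 1 - 1e-9)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_multitask(X, y, n_pairs=3, epochs=60, transfer_every=10):
    pairs = random_splits(X, y, n_pairs)
    # One independently initialized model per (train, val) pair.
    ws = [rng.normal(0.0, 0.1, X.shape[1]) for _ in range(n_pairs)]
    for t in range(1, epochs + 1):
        ws = [grad_step(w, Xtr, ytr)
              for w, ((Xtr, ytr), _) in zip(ws, pairs)]
        if t % transfer_every == 0:
            # Hypothetical knowledge-transfer step: nudge every task's
            # parameters toward the task with the lowest validation loss.
            losses = [val_loss(w, Xv, yv)
                      for w, (_, (Xv, yv)) in zip(ws, pairs)]
            best = ws[int(np.argmin(losses))].copy()
            ws = [0.5 * (w + best) for w in ws]
    # Output the model with the best average loss across ALL validation sets.
    avg = [np.mean([val_loss(w, Xv, yv) for _, (Xv, yv) in pairs])
           for w in ws]
    return ws[int(np.argmin(avg))]
```

Note the final selection step: each candidate is scored on every pair's validation set, not just its own, which matches the paper's criterion of "overall best performance across the validation sets from all pairs."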
Updated: 2020-07-03