A Novel DNN Training Framework via Data Sampling and Multi-Task Optimization
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2020-07-02. DOI: arxiv-2007.01016. Boyu Zhang, A. K. Qin, Hong Pan, Timos Sellis
Conventional DNN training paradigms typically rely on one training set and
one validation set, obtained by partitioning the annotated dataset used for
training, referred to as the gross training set, in a certain way. The training
set is used to train the model, while the validation set is used to estimate
the generalization performance of the trained model as training proceeds, in
order to avoid over-fitting. There are two major issues with this paradigm.
Firstly, the validation set can hardly guarantee an unbiased estimate of
generalization performance due to potential mismatch with the test data.
Secondly, training a DNN corresponds to solving a complex optimization problem,
which is prone to getting trapped in inferior local optima, leading to
undesired training results. To address these issues, we propose a novel DNN
training framework. It generates multiple pairs of training and validation sets
from the gross training set via random splitting, trains a DNN model of a
pre-specified structure on each pair while transferring useful knowledge (e.g.,
promising network parameters) obtained in one model's training process to the
other training processes via multi-task optimization, and outputs the model
with the best overall performance across the validation sets of all pairs. The
knowledge transfer mechanism featured in this framework not only enhances
training effectiveness by helping a model's training process escape from local
optima, but also improves generalization performance via the implicit
regularization that the other training processes impose on each model. We
implement the proposed framework, parallelize the implementation on a GPU
cluster, and apply it to train several widely used DNN models. Experimental
results demonstrate the superiority of the proposed framework over the
conventional training paradigm.
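The overall procedure described above (random splitting into multiple train/validation pairs, per-pair training, periodic knowledge transfer among the training processes, and selection of the model that performs best across all validation sets) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: a logistic-regression learner stands in for the DNN, and the "knowledge transfer" step is a simple hypothetical averaging toward the currently best task, standing in for the paper's multi-task optimization. All function names and hyperparameters here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_splits(X, y, n_pairs, val_frac=0.2):
    """Generate multiple (train, val) pairs from the gross training set
    via independent random splits."""
    n = len(X)
    n_val = int(n * val_frac)
    pairs = []
    for _ in range(n_pairs):
        idx = rng.permutation(n)
        val, tr = idx[:n_val], idx[n_val:]
        pairs.append(((X[tr], y[tr]), (X[val], y[val])))
    return pairs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_step(w, X, y, lr=0.5):
    # One gradient step of logistic regression (stand-in for DNN training).
    p = sigmoid(X @ w)
    return w - lr * X.T @ (p - y) / len(y)

def val_loss(w, X, y):
    p = np.clip(sigmoid(X @ w), 1e-9, 1 - 1e-9)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_multitask(X, y, n_pairs=3, epochs=60, transfer_every=10):
    pairs = random_splits(X, y, n_pairs)
    # One independently initialized model per (train, val) pair.
    ws = [rng.normal(0.0, 0.1, X.shape[1]) for _ in range(n_pairs)]
    for t in range(1, epochs + 1):
        ws = [grad_step(w, Xtr, ytr)
              for w, ((Xtr, ytr), _) in zip(ws, pairs)]
        if t % transfer_every == 0:
            # Hypothetical knowledge-transfer step: nudge every task's
            # parameters toward the task with the lowest validation loss.
            losses = [val_loss(w, Xv, yv)
                      for w, (_, (Xv, yv)) in zip(ws, pairs)]
            best = ws[int(np.argmin(losses))].copy()
            ws = [0.5 * (w + best) for w in ws]
    # Output the model with the best average loss across ALL validation sets.
    avg = [np.mean([val_loss(w, Xv, yv) for _, (Xv, yv) in pairs])
           for w in ws]
    return ws[int(np.argmin(avg))]
```

Note the final selection step: each candidate is scored on every pair's validation set, not just its own, which matches the paper's criterion of "overall best performance across the validation sets from all pairs."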
Updated: 2020-07-03