Data Life Aware Model Updating Strategy for Stream-Based Online Deep Learning
IEEE Transactions on Parallel and Distributed Systems (IF 5.6) · Pub Date: 2021-04-08 · DOI: 10.1109/tpds.2021.3071939
Wei Rang, Donglin Yang, Dazhao Cheng, Yu Wang

Many deep learning applications are deployed in dynamic environments that change over time, where training models must be continuously updated with streaming data to keep tracking data trends. However, most state-of-the-art learning frameworks support offline training well while lacking online model updating strategies. In this work, we propose and implement iDlaLayer, a thin middleware layer on top of existing training frameworks that streamlines the support and implementation of online deep learning applications. In pursuit of good model quality and fast data incorporation, we design a Data Life Aware model updating strategy (DLA), which builds training data samples according to the contributions of data from different life stages and accounts for the training cost consumed by model updates. We evaluate iDlaLayer's performance through simulations and experiments based on TensorflowOnSpark with three representative online learning workloads. Our experimental results demonstrate that iDlaLayer reduces the overall elapsed time of ResNet, DeepFM, and PageRank by 11.3, 28.2, and 15.2 percent, respectively, compared to a periodic update strategy. It further achieves an average 20 percent decrease in training cost and about a 5 percent improvement in model quality over the traditional continuous training method.
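The abstract suggests two mechanisms at the heart of DLA: weighting streamed samples by their life stage when assembling a training set, and triggering a model update only when the expected quality gain justifies the training cost. Below is a minimal Python sketch of that idea under stated assumptions; the names (life_stage_weight, build_training_set, should_update, half_life) and the exponential-decay contribution model are hypothetical illustrations, not the paper's actual implementation.

```python
import random
import time
from dataclasses import dataclass, field

@dataclass
class Sample:
    features: list
    label: int
    arrival: float = field(default_factory=time.time)

def life_stage_weight(sample, now, half_life=300.0):
    # Assumed contribution model: newer data contributes more, with
    # weight decaying exponentially as the sample ages. The paper's
    # actual per-life-stage contribution function may differ.
    age = now - sample.arrival
    return 0.5 ** (age / half_life)

def build_training_set(buffer, batch_size):
    # Draw a training batch so each life stage contributes in
    # proportion to its (assumed) weight.
    now = time.time()
    weights = [life_stage_weight(s, now) for s in buffer]
    return random.choices(buffer, weights=weights, k=batch_size)

def should_update(expected_quality_gain, training_cost, cost_budget):
    # Cost-aware trigger: update only when the estimated quality gain
    # is positive and the training cost fits the budget -- a stand-in
    # for DLA's trade-off between model quality and update cost.
    return expected_quality_gain > 0 and training_cost <= cost_budget
```

A middleware layer like iDlaLayer would sit between the stream source and the training framework, invoking logic of this shape before handing batches to the underlying (e.g., TensorflowOnSpark) training job.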

Updated: 2021-04-08