Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2021-06-15. DOI: arxiv-2106.08318
Mateusz Malinowski, Dimitrios Vytiniotis, Grzegorz Swirszcz, Viorica Patraucean, Joao Carreira

How can neural networks be trained on large-volume temporal data efficiently? To compute the gradients required to update parameters, backpropagation blocks computations until the forward and backward passes are completed. For temporal signals, this introduces high latency and hinders real-time learning. It also creates a coupling between consecutive layers, which limits model parallelism and increases memory consumption. In this paper, we build upon Sideways, which avoids blocking by propagating approximate gradients forward in time, and we propose mechanisms for temporal integration of information based on different variants of skip connections. We also show how to decouple computation and delegate individual neural modules to different devices, allowing distributed and parallel training. The proposed Skip-Sideways achieves low-latency training and model parallelism and, importantly, is capable of extracting temporal features, leading to more stable training and improved performance on real-world action recognition video datasets such as HMDB51, UCF101, and the large-scale Kinetics-600. Finally, we also show that models trained with Skip-Sideways generate better future frames than Sideways models, and hence they can better utilize motion cues.
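Since only the abstract is available here, the mechanics can be hard to picture. The sketch below is a toy illustration of the general idea, not the authors' implementation: a pipeline of stages in which activations hop one stage upward per clock tick, approximate gradient messages hop one stage downward per tick (so gradients travel forward in time, never blocking on a full forward+backward pass), and cross-stage skip connections mix features computed from different frames. The `Stage`/`tick` names, the linear+ReLU stages, the squared-error loss, and the synthetic "video" are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

class Stage:
    """One pipeline stage (linear + ReLU). Each stage touches only its
    local buffers, so stages could run on separate devices in parallel,
    mirroring the decoupling described in the abstract."""

    def __init__(self, dim, lr=1e-3):
        self.W = rng.normal(0.0, 0.1, (dim, dim))
        self.lr = lr
        self.x = np.zeros((1, dim))     # most recent input (may be stale)
        self.mask = np.zeros((1, dim))  # ReLU mask of the latest forward

    def forward(self, x):
        self.x = x
        h = x @ self.W
        self.mask = (h > 0).astype(h.dtype)
        return h * self.mask

    def backward_update(self, g_out):
        # Sideways-style approximation: pair a delayed gradient message
        # with the freshest cached activation. Smooth video makes
        # consecutive activations similar, so this is close to exact.
        g_h = g_out * self.mask
        g_in = g_h @ self.W.T
        self.W -= self.lr * self.x.T @ g_h
        return g_in

def tick(stages, frame, target, acts, grads):
    """One clock tick: activations move up one stage, gradient messages
    move down one stage; no stage ever waits for a full pass."""
    L = len(stages)
    acts = [frame] + acts[1:]   # the newest frame enters the bottom stage
    outs = [s.forward(a) if a is not None else None
            for s, a in zip(stages, acts)]

    # Next inputs: main path from stage l-1, plus a skip from stage l-2.
    # The skip path is one stage shorter, so it carries features of a
    # *newer* frame -- skip connections double as temporal integration.
    new_acts = [None] + [outs[l - 1] for l in range(1, L)]
    for l in range(2, L):
        if new_acts[l] is not None and outs[l - 2] is not None:
            new_acts[l] = new_acts[l] + outs[l - 2]

    # Gradient messages: the loss gradient enters at the top; each
    # pending message is consumed, and the resulting input-gradient is
    # routed down both the main path and the skip path.
    new_grads = [None] * L
    def deposit(l, g):
        if 0 <= l < L:
            new_grads[l] = g if new_grads[l] is None else new_grads[l] + g

    if outs[-1] is not None:
        deposit(L - 1, outs[-1] - target)   # grad of 0.5*||y - target||^2
    for l in range(L - 1, -1, -1):
        if grads[l] is not None:
            g_in = stages[l].backward_update(grads[l])
            deposit(l - 1, g_in)            # main path
            deposit(l - 2, g_in)            # skip path
    return new_acts, new_grads

dim, L, T = 8, 4, 50
stages = [Stage(dim) for _ in range(L)]
acts, grads = [None] * L, [None] * L
for t in range(T):
    frame = np.sin(0.1 * t + np.arange(dim))[None, :]   # smooth toy "video"
    target = np.roll(frame, 1, axis=1)                  # toy per-frame target
    acts, grads = tick(stages, frame, target, acts, grads)
```

Note how the two claimed benefits show up structurally: every stage fires on every tick (low latency, no blocking), and the skip path both feeds newer frames forward and delivers gradient messages one tick earlier than the main path, so features and gradients from adjacent frames are blended rather than strictly aligned.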

Updated: 2021-06-16